<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom"><title>Posts on mht.wtf</title><id>urn:uuid:da7bf17d-b153-4d31-b566-8da79efa6b88</id><updated>2026-03-15T15:09:41.583841164+01:00</updated><link href="https://mht.wtf/" rel="alternate"/><link href="https://mht.wtf/post/feed.xml" rel="self"/><entry><title>The Problem with OOP is &quot;Oriented&quot;</title><id>https://mht.wtf/post/oop-oriented/</id><updated>2020-05-16T17:33:35+02:00</updated><author><name>Martin Hafskjold Thoresen</name><email>m@mht.wtf</email></author><link href="https://mht.wtf/post/oop-oriented/" rel=""/><link href="https://mht.wtf/post/oop-oriented/index.html" rel="alternate"/><published>2020-05-16T17:33:35+02:00</published><content type="text/html">&lt;p&gt;The problem with OOP isn&apos;t the &amp;quot;object&amp;quot; part, it&apos;s really the &amp;quot;oriented&amp;quot; part.&lt;/p&gt;
&lt;p&gt;My problem with OOP isn&apos;t really that I despise design patterns, nor that I like all of my data to be in global scope, which OOP at least discourages,
nor that I&apos;m convinced that all functions should be pure and any state is the root of all evil.
My problem is simply that it does little to solve the problems I &lt;em&gt;do&lt;/em&gt; have when I&apos;m programming,
while it imposes limitations and shepherds me into problems that I &lt;em&gt;didn&apos;t have before&lt;/em&gt;.
Even calling this a trade-off is generous, since a trade would imply that I&apos;m actually getting something in return.&lt;/p&gt;
&lt;h2&gt;How I Program&lt;/h2&gt;
&lt;p&gt;It is very rare that I know up front exactly what I&apos;m making and how it will look in the end.
Most of the time I have some high-level idea of what should happen in my program, and I need to figure out how to integrate this into the current codebase.
This means understanding what data I need to transform and how to organize that data.
Sometimes this is straightforward, and sometimes it reveals that my high-level idea of the problem was wrong, which means I have to go back and adjust it,
this time with more information in mind.
This process goes back and forth, and in the end my program might look nothing like my first attempt.&lt;/p&gt;
&lt;p&gt;I think most programmers experience this often.&lt;/p&gt;
&lt;p&gt;Since I go back and forth a lot when developing, most of my code will not make it far.
This is a crucial counterpoint to the meme that &amp;quot;code is written once but read a thousand times&amp;quot;.
Most of the code I write is probably read just a couple of times (and only by me!) before it&apos;s replaced by new code that better solves the problem at hand.
Time spent polishing such code is therefore very often wasted.&lt;/p&gt;
&lt;h2&gt;A Lack of Upsides&lt;/h2&gt;
&lt;p&gt;Central to OOP is building class hierarchies with methods that subclasses override, with the callers of these methods agnostic to the actual class of the objects they operate on.
Having had some experience in languages in which this is either discouraged or simply not possible, I&apos;ve come to the conclusion
that having a superclass that defines methods, and a number of different subclasses to which those methods are dynamically dispatched, is really not something I need that often.
Given some time I could probably come up with an example where this is actually a good solution,
but this comes back to the second O of OOP: &amp;quot;Oriented&amp;quot;.&lt;/p&gt;
&lt;p&gt;The case where I actually want dynamic dispatch is &lt;em&gt;very&lt;/em&gt; rare, and so having the entire programming language (or worse, your codebase) be &lt;em&gt;oriented&lt;/em&gt; around this concept does not make any sense.
Similarly, most of my types are different enough that it does not make sense to have code agnostic to the types it&apos;s operating on.
It just doesn&apos;t happen that often.&lt;/p&gt;
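&lt;p&gt;As a sketch of the alternative (illustrative Rust, not from any particular codebase): a closed set of cases can be an enum with a &lt;code&gt;match&lt;/code&gt;, rather than a superclass with overridable methods:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;// A closed set of variants instead of a class hierarchy.
enum Shape {
    Circle { r: f64 },
    Rect { w: f64, h: f64 },
}

impl Shape {
    // One match expression instead of a virtual method on a base class.
    fn area(&amp;amp;self) -&amp;gt; f64 {
        match self {
            Shape::Circle { r } =&amp;gt; std::f64::consts::PI * r * r,
            Shape::Rect { w, h } =&amp;gt; w * h,
        }
    }
}

fn main() {
    let shapes = [Shape::Circle { r: 1.0 }, Shape::Rect { w: 2.0, h: 3.0 }];
    let total: f64 = shapes.iter().map(|s| s.area()).sum();
    println!(&amp;quot;{total}&amp;quot;);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Adding a new shape here means updating every &lt;code&gt;match&lt;/code&gt;, so the trade-off runs exactly opposite to subclassing; that suits me, because the open-hierarchy case is the rare one.&lt;/p&gt;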
&lt;p&gt;Similarly, having intentionally been very generous with visibility qualifiers in my code,
I cannot think of a single time where a member being visible has caused any problems.
On the other hand, I vividly remember a case where a 3rd party library had messed up the qualifiers on a tuple-like type
containing the coordinates of a mouse click, which made it impossible to get at them.
Try to think back: when was the last time you tried to access a field or method, only to be told that it&apos;s private,
and then were forced through a setter/getter whose extra logic saved you from a bug?&lt;/p&gt;
&lt;h2&gt;It&apos;s &amp;quot;Pay up front&amp;quot;&lt;/h2&gt;
&lt;p&gt;OOP is about structuring your program according to certain principles.
That is, in and of itself it doesn&apos;t do any work.
Thus, the time you spend making your codebase adhere to OO principles is time spent not making your program do what it needs to do,
and is thus wasted. That is, unless it pays back later.&lt;/p&gt;
&lt;p&gt;I think what often brings people to OOP is the idea that the work you spend on having a good class structure, with subclassing and getters/setters and proper encapsulation,
will pay dividends over the lifetime of the project: avoiding code duplication, maintaining invariants on a class&apos;s private state, helping with debugging, making your code more flexible, and a bunch of other things.&lt;/p&gt;
&lt;p&gt;So by spending time designing class structures, figuring out which fields are properly encapsulated, and making sure methods are
abstracted sufficiently high up in the hierarchy, you&apos;re spending time in the hope that this will make the codebase easier to work
with down the line.&lt;/p&gt;
&lt;p&gt;You are, effectively, paying the price to fight a problem you &lt;em&gt;think&lt;/em&gt; you &lt;em&gt;might&lt;/em&gt; get further down the line,
and you &lt;em&gt;hope&lt;/em&gt; that the price, which you definitely paid, was sufficient for these problems to not get out of hand.&lt;/p&gt;
&lt;h2&gt;Encapsulation by default&lt;/h2&gt;
&lt;p&gt;I&apos;m also becoming increasingly wary of complete encapsulation.
At its essence, encapsulation is about only exposing the minimal subset of the members (data or methods) of a class that the caller needs.
This sounds rather reasonable and sometimes it &lt;em&gt;is&lt;/em&gt; a good idea.
However, it also opens up an extremely difficult problem for the programmer,
because they now have to decide exactly which subset of their members are sufficient for &lt;strong&gt;any possible&lt;/strong&gt; use-case of that class.
When marking a field &lt;code&gt;private&lt;/code&gt;, the programmer is really saying that there is no possible valid program which
needs to access this field, whatsoever.&lt;/p&gt;
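&lt;p&gt;In Rust terms (a made-up example, echoing the mouse-click type from earlier), a non-&lt;code&gt;pub&lt;/code&gt; field is exactly this claim:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;mod clicks {
    // Hypothetical tuple-like click type: `x` is exposed, `y` is not.
    pub struct Click {
        pub x: i32,
        y: i32,
    }

    pub fn last_click() -&amp;gt; Click {
        Click { x: 10, y: 20 }
    }
}

fn main() {
    let c = clicks::last_click();
    println!(&amp;quot;{}&amp;quot;, c.x); // fine
    // println!(&amp;quot;{}&amp;quot;, c.y); // compile error: field `y` is private
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Whoever wrote &lt;code&gt;clicks&lt;/code&gt; has decided, on behalf of every downstream program, that no valid use of a &lt;code&gt;Click&lt;/code&gt; will ever need &lt;code&gt;y&lt;/code&gt;.&lt;/p&gt;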
&lt;p&gt;I genuinely think hidden-by-default is the wrong choice.
You shouldn&apos;t have to make the case for having a function or data member visible, because the programmer
would then have to imagine every valid use-case in which having that member visible is necessary, and there will
always be use-cases that they miss.&lt;/p&gt;
&lt;h2&gt;It&apos;s Object &lt;em&gt;Oriented&lt;/em&gt;&lt;/h2&gt;
&lt;p&gt;The mere presence of dynamically dispatched method calls doesn&apos;t make a whole codebase OO.
Batching together data into something that looks like a class certainly does not make the codebase OO.
Having subclasses does not make the codebase OO.
For a codebase to be Object Oriented, it really has to be &lt;em&gt;oriented&lt;/em&gt; around objects.&lt;/p&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;I don&apos;t think the OO mindset is without merits, but I think the gains are sufficiently few and far between that having your codebase be oriented around objects is a mistake.
Rather, I think the codebase should be oriented around the data that your program is operating on;
after all, everything a program does is transform input data to output data, and the codebase should reflect this.&lt;/p&gt;
&lt;p&gt;Lastly, don&apos;t start quoting Alan Kay; I know his objections to what we now call OOP, and if you&apos;ve made it this far I&apos;m sure you understand that I&apos;m not talking about his idea of it.&lt;/p&gt;
&lt;p&gt;Thanks for reading.&lt;/p&gt;
</content></entry><entry><title>IAM, shortcuts, and hot-reload</title><id>https://mht.wtf/post/iam/</id><updated>2025-03-30T17:44:35+02:00</updated><author><name>Martin Hafskjold Thoresen</name><email>m@mht.wtf</email></author><link href="https://mht.wtf/post/iam/" rel=""/><link href="https://mht.wtf/post/iam/index.html" rel="alternate"/><published>2025-03-30T17:44:35+02:00</published><content type="text/html">&lt;p&gt;After deploying &lt;a href=&quot;/post/ppl&quot;&gt;ppl&lt;/a&gt; a few weeks ago I&apos;ve been continuously adding small features.
Some of these turned into larger efforts and all of a sudden I had things to write about.
Here&apos;s a devlog-style post, without any expectation that there will be more coming.&lt;/p&gt;
&lt;h2&gt;Keyboard Shortcuts&lt;/h2&gt;
&lt;p&gt;&lt;a href=&quot;/post/ppl&quot;&gt;ppl&lt;/a&gt; is built on &lt;a href=&quot;https://htmx.org&quot;&gt;htmx&lt;/a&gt;, which has features for &lt;a href=&quot;https://htmx.org/docs/#trigger-modifiers&quot;&gt;triggering&lt;/a&gt; events from the keyboard.
This works great for links or buttons, but I couldn&apos;t find a way of handling &lt;em&gt;focus&lt;/em&gt;.
I wanted &lt;code&gt;jk&lt;/code&gt; for navigating the list of people, as well as &lt;code&gt;/&lt;/code&gt; for focusing the search field,
and so I had to build something myself.&lt;/p&gt;
&lt;p&gt;At first I inserted inline &lt;code&gt;&amp;lt;script&amp;gt;&lt;/code&gt; tags into the &lt;a href=&quot;https://maud.lambda.xyz/&quot;&gt;maud&lt;/a&gt; macro, like so:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;input #search type=&amp;quot;text&amp;quot; placeholder=&amp;quot;search&amp;quot; {} 
script { (PreEscaped(r#&amp;quot;
function keydown(evt) {
    // ...
}
document.getElementById(&amp;quot;search&amp;quot;).addEventListener(&apos;keydown&apos;, keydown);
&amp;quot;#))}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This was fine for focusing the search field, but navigating the search results
got somewhat complicated, and a little too much JS to have inline as a &lt;code&gt;String&lt;/code&gt;.
In addition, I
had some state-related issues when htmx swapped in the &lt;code&gt;script&lt;/code&gt; tags, as well
as with my own hot-reloading system, where I either ended up with no event
listener, or multiple copies of the same listener.&lt;/p&gt;
&lt;p&gt;I pulled out the logic into its own file &lt;code&gt;shortcut.js&lt;/code&gt; and made the API htmx-like.
Now I annotate the HTML with &lt;code&gt;sx-&lt;/code&gt; attributes instead, and the 300 lines of JS in &lt;code&gt;shortcut.js&lt;/code&gt; handles the rest.&lt;/p&gt;
&lt;p&gt;Here&apos;s the markup for the search box:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;form #search {
    label for=&amp;quot;search-box&amp;quot; { &amp;quot;Search&amp;quot; }
    input #search-box type=&amp;quot;search&amp;quot; name=&amp;quot;search&amp;quot; placeholder=&amp;quot;/ to search&amp;quot;
        hx-trigger=&amp;quot;keyup changed delay:500ms&amp;quot;
        hx-target=&amp;quot;#ppl-list&amp;quot;
        hx-swap=&amp;quot;innerHTML&amp;quot;
        hx-post=&amp;quot;/search&amp;quot;
        sx-focus=&amp;quot;/&amp;quot;      // &amp;lt;--- here
        sx-blur=&amp;quot;Escape&amp;quot;  // &amp;lt;--- here
        autocomplete=&amp;quot;off&amp;quot;
        {}
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;and here it is for the people list:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;html! {
    ul #ppl-list
        hx-boost=&amp;quot;true&amp;quot;
        sx-listnext=&amp;quot;j or j[ctrl]&amp;quot;
        sx-listprev=&amp;quot;k or k[ctrl]&amp;quot;
        sx-blur=&amp;quot;Escape&amp;quot; {
        @for persona in &amp;amp;personas {
            li {
                a href=(format!(&amp;quot;/persona/{}&amp;quot;, persona.id)) {
                    (persona.name)
                }
            }
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Annotating a node with &lt;code&gt;sx-focus&lt;/code&gt; registers a &lt;code&gt;keydown&lt;/code&gt; listener that assigns focus to that node when the key is pressed.
&lt;code&gt;sx-blur&lt;/code&gt; works similarly; if the current focus is in the sub-tree and you press the key, &lt;code&gt;.blur&lt;/code&gt; it.
Modifier syntax is &lt;code&gt;[ctrl,alt]&lt;/code&gt; (both &lt;code&gt;ctrl&lt;/code&gt; and &lt;code&gt;alt&lt;/code&gt;), and you can separate key options with &lt;code&gt;&amp;quot; or &amp;quot;&lt;/code&gt;.
I made special events for lists because it was easier to handle the previous and next logic at that node level,
as well as the &amp;quot;no former focus and up means that we should select the last element&amp;quot; logic.&lt;/p&gt;
&lt;aside class=&quot;right&quot;&gt;
    There&apos;s also some coarse filtering to see if &lt;code&gt;document.activeElement&lt;/code&gt; is a text &lt;code&gt;input&lt;/code&gt; 
    to avoid stealing focus when you write &lt;code&gt;/&lt;/code&gt; there.
&lt;/aside&gt;
&lt;p&gt;In terms of readability, I don&apos;t think I can beat this.
In terms of hidden bugs and future maintenance cost, there are a few.
For instance, I&apos;m not sure what happens if you register two nodes with the same shortcut.
Maybe they both fire? Probably. So I don&apos;t do that.&lt;/p&gt;
&lt;h2&gt;Authentication&lt;/h2&gt;
&lt;p&gt;When I first wrote &lt;code&gt;ppl&lt;/code&gt; I used the &lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/WWW-Authenticate#basic_authentication&quot;&gt;HTTP basic authentication scheme&lt;/a&gt;, which is really simple:
it sends username and password as plain text in a header.
This is probably not Secure with a capital S, but it&apos;s trivial to implement,
and it does block out bots and crawlers and the occasional human stumbling
around. The good thing about it is that your browser will prompt you with a
built-in login form without you having to do anything. Cool! Great for v0.&lt;/p&gt;
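&lt;p&gt;For reference, the entire scheme is one header, &lt;code&gt;Authorization: Basic base64(user:pass)&lt;/code&gt;. Here&apos;s a sketch of what the middleware compares the header against (with a hand-rolled encoder only to keep the example dependency-free; real code would use a base64 crate):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;// Minimal standard-alphabet base64 encoder, just enough for basic auth.
fn b64(data: &amp;amp;[u8]) -&amp;gt; String {
    const ALPHA: &amp;amp;[u8] = b&amp;quot;ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/&amp;quot;;
    let mut out = String::new();
    for chunk in data.chunks(3) {
        // Pack up to three bytes into a 24-bit number.
        let n = (chunk[0] as u32) &amp;lt;&amp;lt; 16
            | (*chunk.get(1).unwrap_or(&amp;amp;0) as u32) &amp;lt;&amp;lt; 8
            | *chunk.get(2).unwrap_or(&amp;amp;0) as u32;
        for i in 0..4 {
            if i &amp;lt;= chunk.len() {
                out.push(ALPHA[(n &amp;gt;&amp;gt; (18 - 6 * i)) as usize &amp;amp; 63] as char);
            } else {
                out.push(&apos;=&apos;); // pad short final chunks
            }
        }
    }
    out
}

fn main() {
    // The middleware just compares the Authorization header to this.
    let expected = format!(&amp;quot;Basic {}&amp;quot;, b64(b&amp;quot;user:pass&amp;quot;));
    println!(&amp;quot;{expected}&amp;quot;);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Base64 is an encoding, not encryption, which is why this is only as Secure as the TLS around it.&lt;/p&gt;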
&lt;p&gt;The main problem was that on Safari mobile I had to log in every time I visited the page.
I was hard-coding credentials straight into my custom axum middleware, but
mainly, having to log in again and again was the catalyst for change.&lt;/p&gt;
&lt;p&gt;I&apos;m not very excited by auth, so I wanted to solve this once and hopefully not have to think about it again.
I decided to try to make a more general auth system, which I now call &lt;code&gt;iam&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;iam&lt;/code&gt; uses passkeys for auth, with the &lt;a href=&quot;https://github.com/kanidm/webauthn-rs&quot;&gt;webauthn-rs&lt;/a&gt; crate doing basically all of the heavy lifting.
It is multi-user, and you can have multiple passkeys per user.
Now I have a URL I can redirect to for login, and after successfully authenticating I set a cookie for the whole domain
and redirect back.
There&apos;s also an endpoint for the JWKS used for signing the token so that I could easily verify that I didn&apos;t mess anything up.
This was &lt;em&gt;actually&lt;/em&gt; useful when developing, since I did end up accidentally creating tokens with a different keyset than
the one I tried to verify with.&lt;/p&gt;
&lt;aside class=&quot;right&quot;&gt;
    Multi-user in the sense that nothing stops me from creating more users,
    not in the sense that there are any other users.
&lt;/aside&gt;
&lt;p&gt;It also made it possible to use &lt;a href=&quot;https://jwt.io&quot;&gt;jwt.io&lt;/a&gt; to check that I didn&apos;t mess anything up.&lt;/p&gt;
&lt;p&gt;On the JS side I had to include some of &lt;a href=&quot;https://github.com/github/webauthn-json&quot;&gt;webauthn-json&lt;/a&gt; to deal with transforming the &lt;code&gt;json&lt;/code&gt; payloads I got from webauthn-rs into the APIs in
&lt;code&gt;navigator.credentials&lt;/code&gt;; apparently the spec says to accept raw buffers (&lt;code&gt;ArrayBuffer&lt;/code&gt;s), so one cannot simply pass &lt;code&gt;json&lt;/code&gt; through
from &lt;code&gt;fetch&lt;/code&gt; to the credentials APIs.
&lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/API/PublicKeyCredential/parseCreationOptionsFromJSON_static#browser_compatibility&quot;&gt;There are APIs&lt;/a&gt; for doing the conversion, but they are not yet supported in Safari on iOS.
The very browser I was wrangling.
This was a source of much confusion, as I accidentally double-base64-encoded strings and tried to authenticate
with &lt;code&gt;kid&lt;/code&gt;s that were either base64-encoded or -decoded one too many times.&lt;/p&gt;
&lt;h2&gt;Hot-reloading&lt;/h2&gt;
&lt;p&gt;&lt;code&gt;ppl&lt;/code&gt; was the second axum-based web server I made that I wanted hot-reloading
of static assets (&lt;code&gt;css&lt;/code&gt; and any &lt;code&gt;js&lt;/code&gt; files) for, and &lt;code&gt;iam&lt;/code&gt; was the third. Time
to finally split my bespoke hot-reloading system out into a crate, to avoid
copying the same files over and over and then forgetting to backport fixes.&lt;/p&gt;
&lt;p&gt;It&apos;s very simple: use &lt;a href=&quot;https://github.com/notify-rs/notify&quot;&gt;notify&lt;/a&gt; to listen to all changes in a directory, send messages over websockets when files change,
and have the recipient swap out the &lt;code&gt;&amp;lt;script&amp;gt;&lt;/code&gt; or &lt;code&gt;&amp;lt;link&amp;gt;&lt;/code&gt; node in &lt;code&gt;&amp;lt;head&amp;gt;&lt;/code&gt; and insert a new one with a dummy query parameter in the &lt;code&gt;href&lt;/code&gt;.
Finally, have a convention that JS modules define a &lt;code&gt;__cleanup&lt;/code&gt; function that is run before the module is removed, to tear down any DOM state.
For CSS, if you set a timeout before removing the old node you avoid a white flash while the new style sheet is being fetched.&lt;/p&gt;
&lt;p&gt;Usage code in &lt;code&gt;ppl&lt;/code&gt; is now pretty simple. First, conditionally insert &lt;code&gt;&amp;lt;script&amp;gt;&lt;/code&gt; to handle the client-end of the WS connection (&lt;a href=&quot;https://github.com/lambda-fairy/maud/issues/446&quot;&gt;maud#446&lt;/a&gt;):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;head {
    // ...
    @if let Some(n) = hot_reload::script() { (n) }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;add the server side of the WS connection:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;    .nest_service(&amp;quot;/dev-hr&amp;quot;, hot_reload::router(&amp;quot;static&amp;quot;))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;and strip it all away when compiling with &lt;code&gt;--release&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;#[cfg(not(debug_assertions))]
mod hot_reload {
    use axum::Router;
    use maud::Markup;
    pub fn router(_: &amp;amp;str) -&amp;gt; Router&amp;lt;()&amp;gt; {
        Router::new()
    }
    pub fn script() -&amp;gt; Option&amp;lt;Markup&amp;gt; {
        None
    }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I also tend to use &lt;a href=&quot;https://github.com/watchexec/cargo-watch&quot;&gt;cargo-watch&lt;/a&gt; (which apparently is on life support),
so to avoid triggering a recompile when an asset changes I use &lt;code&gt;-i&lt;/code&gt; to ignore them.
My &lt;a href=&quot;https://just.systems/man/en/&quot;&gt;justfile&lt;/a&gt; recipe looks like:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-just&quot;&gt;dev:
    cargo watch -i &apos;static/*&apos; -x &apos;run&apos;
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Operations&lt;/h2&gt;
&lt;p&gt;Two services is a crowd, and so now they both live in the same git repo, they are both built using Docker,
and I have a &lt;code&gt;docker-compose&lt;/code&gt; to deploy them both.
Getting this set up properly was a bit of work since &lt;code&gt;sqlx&lt;/code&gt; doesn&apos;t work all that well in a Cargo workspace, and especially with multiple different databases.
I&apos;ve ended up kinda making everything work with some combination of&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;SQLX_OFFLINE=true
SQLX_OFFLINE_DIR=$PWD/.sqlx
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;in &lt;code&gt;.env&lt;/code&gt;, opening each crate separately in &lt;code&gt;zed&lt;/code&gt;, and not changing anything while things still kinda work.&lt;/p&gt;
&lt;p&gt;Kinda, because at some point &lt;code&gt;rust-analyzer&lt;/code&gt; couldn&apos;t find any tables in the database.
In addition, &lt;code&gt;tower-sessions-sqlx-store&lt;/code&gt;, which &lt;code&gt;iam&lt;/code&gt; uses, depends on &lt;code&gt;sqlx&lt;/code&gt; with the &lt;code&gt;time&lt;/code&gt; feature, so now
all queries that deal with dates and times default to &lt;code&gt;time&lt;/code&gt; types, whereas I&apos;m using &lt;code&gt;chrono&lt;/code&gt;.
This is &lt;a href=&quot;https://github.com/launchbadge/sqlx/issues/3412&quot;&gt;sqlx#3412&lt;/a&gt; and
&lt;a href=&quot;https://github.com/maxcountryman/tower-sessions-stores/issues/42&quot;&gt;tower-sessions-stores#42&lt;/a&gt;.
My workaround is to explicitly name types in queries:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;let q = sqlx::query!(
    r#&amp;quot;select name,
    birthdate as &amp;quot;birthdate: NaiveDate&amp;quot;,
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Ugly, but not too bad.&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;That&apos;s it for now.
Next up, I&apos;d like to&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Set up automatic backup of all databases&lt;/li&gt;
&lt;li&gt;Create an RSS service so that I can get off &lt;a href=&quot;https://github.com/rss2email/rss2email&quot;&gt;r2e&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Create a frontend for my local-gym-scraper&lt;/li&gt;
&lt;li&gt;Figure out how to write nicer-looking handlers with reasonable error handling&lt;/li&gt;
&lt;li&gt;Pull out shared CSS to make creating new things that look okay even easier&lt;/li&gt;
&lt;/ul&gt;
&lt;aside class=&quot;right&quot;&gt;
    The gym I go to has a website with the number of people currently in the gym.
    I have a script that fetches this number every minute and stores it in a sqlite database.
    I currently have over half a million data points.
&lt;/aside&gt;
</content></entry><entry><title>Hello World</title><id>https://mht.wtf/post/hello-world/</id><updated>2016-01-25T00:23:46+01:00</updated><author><name>Martin Hafskjold Thoresen</name><email>m@mht.wtf</email></author><link href="https://mht.wtf/post/hello-world/" rel=""/><link href="https://mht.wtf/post/hello-world/index.html" rel="alternate"/><published>2016-01-25T00:23:46+01:00</published><content type="text/html">&lt;p&gt;My last attempt at writing a blog didn&apos;t go so well. It turns out that finding the time to write a post that is even slightly interesting can be a challenge. But as we&apos;ve just entered 2016, I&apos;m trying once again.
This first post will be a super fast rundown of how this blog was made.&lt;/p&gt;
&lt;p&gt;I didn&apos;t want to spend too much time on creating this web page, as web really isn&apos;t my thing.
Therefore, I went with the static site generator &lt;a href=&quot;https://gohugo.io/&quot;&gt;Hugo&lt;/a&gt;, which is written in &lt;a href=&quot;http://golang.org&quot;&gt;golang&lt;/a&gt;.
The only reason I chose Hugo over, say, Jekyll or Hexo, is that it was the easiest to set up.
I tried to use Jekyll for a while, as my previous blog attempt used Jekyll, but after messing around with Ruby for 30 minutes trying to get something like &lt;code&gt;virtualenv&lt;/code&gt; to work, I gave up.&lt;/p&gt;
&lt;p&gt;The site&apos;s design is very simple, and somewhat inspired by &lt;a href=&quot;http://bettermotherfuckingwebsite.com/&quot;&gt;this&lt;/a&gt;, although my &lt;code&gt;css&lt;/code&gt; file ended up around 50 lines. It is also worth noting that there is no JavaScript here --- at least on the pages that don&apos;t have any math. (I did consider figuring out how to generate images from latex, but I think using &lt;code&gt;MathJax&lt;/code&gt; is a better alternative.)
Of course, in the year of &lt;a href=&quot;https://letsencrypt.org&quot;&gt;Let&apos;s Encrypt&lt;/a&gt;, the site is https only.&lt;/p&gt;
&lt;p&gt;Lastly, my posts get from my laptop to the web server via git. I have set up a remote repository on the server, with a &lt;code&gt;post-receive&lt;/code&gt; hook which runs &lt;code&gt;Hugo&lt;/code&gt; and moves some files around.
The script is really short, and probably has faults.&lt;/p&gt;
&lt;p&gt;That&apos;s about it. Nothing more, nothing less. Hopefully, I&apos;ll get around to actually writing posts this time --- I&apos;m targeting at least one new post each month.&lt;/p&gt;
&lt;p&gt;mht&lt;/p&gt;
</content></entry><entry><title>Writing a JPEG decoder in Rust - Part 2: Implementation I</title><id>https://mht.wtf/post/jpeg-rust-2/</id><updated>2016-08-19T13:44:00+02:00</updated><author><name>Martin Hafskjold Thoresen</name><email>m@mht.wtf</email></author><link href="https://mht.wtf/post/jpeg-rust-2/" rel=""/><link href="https://mht.wtf/post/jpeg-rust-2/index.html" rel="alternate"/><published>2016-08-19T13:44:00+02:00</published><content type="text/html">&lt;p&gt;&lt;em&gt;This is a blog series. Read part 1 &lt;a href=&quot;../jpeg-rust-1&quot;&gt;here&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Last time we got a basic understanding of the different steps in decoding a JPEG image,
as well as how the file is structured.
What we did not get was any code or hints at an implementation.
Finally we get to see an attempt at writing a JPEG decoder, in Rust.&lt;/p&gt;
&lt;p&gt;Additionally, I have now open sourced the project &lt;a href=&quot;https://github.com/martinhath/jpeg-rust&quot;&gt;on GitHub&lt;/a&gt;.
If you are interested in seeing the full thing, or want to know what I will try to cover
in Part 3 of this series, take a look!
Again, feedback is very welcome, be it typos, questions, or suggestions for improvements.
As I mentioned in Part 1, the project is very much ongoing, and I have cut quite a few corners
here and there.
If you are testing the decoder, and find an image that is not decoded properly,
I will be very interested in hearing from you&lt;sup&gt;&lt;a href=&quot;#user-content-fn-broken-sampling&quot; id=&quot;user-content-fnref-broken-sampling&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;!&lt;/p&gt;
&lt;p&gt;Lastly, this project is purely educational, for me and hopefully also for you.
I do not actually need a new JPEG decoder, and chances are you do not either :)&lt;/p&gt;
&lt;h1&gt;Implementation&lt;/h1&gt;
&lt;p&gt;Now that we have a high level understanding of how JPEG works, we can write a simple decoder for a test image of Lena&lt;sup&gt;&lt;a href=&quot;#user-content-fn-lena-dev&quot; id=&quot;user-content-fnref-lena-dev&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;lena.jpeg&quot; alt=&quot;lena&quot; /&gt;&lt;/p&gt;
&lt;h2&gt;Our Program&lt;/h2&gt;
&lt;p&gt;For testing and validation purposes, we will create a binary program, which we will run with&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;cargo run &amp;lt;input.jpeg&amp;gt; &amp;lt;output.ppm&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The output file is a &lt;code&gt;ppm&lt;/code&gt; file, which was the simplest way I found to see the image data we decode.
The format is very simple: see &lt;a href=&quot;https://www.cs.swarthmore.edu/~soni/cs35/f13/Labs/extras/01/ppm_info.html&quot;&gt;here&lt;/a&gt; for how it works.
Common image viewers &lt;em&gt;should&lt;/em&gt; open &lt;code&gt;ppm&lt;/code&gt; files. Personally I have used &lt;a href=&quot;https://wiki.gnome.org/Apps/EyeOfGnome&quot;&gt;eog&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;JFIF&lt;/h2&gt;
&lt;p&gt;What does the &lt;a href=&quot;https://www.w3.org/Graphics/JPEG/jfif3.pdf&quot;&gt;JFIF specification&lt;/a&gt; (pdf) enforce?
The file has to start with the &lt;code&gt;SOI&lt;/code&gt; (Start of Image) marker, followed by the &lt;code&gt;APP0&lt;/code&gt; (Application Segment 0) marker&lt;sup&gt;&lt;a href=&quot;#user-content-fn-jpeg-markers&quot; id=&quot;user-content-fnref-jpeg-markers&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;.
According to the JPEG spec, there are 16 application markers, &lt;code&gt;0xffe0-0xffef&lt;/code&gt;, which are &amp;quot;reserved for application use&amp;quot;.
JFIF uses the segment to hold an identifier (the string &amp;quot;JFIF&amp;quot;), and thumbnail data, among a few other things.
The images I have tried to decode did not contain a thumbnail, so this is seemingly rare.&lt;/p&gt;
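&lt;p&gt;A minimal sanity check of those first bytes might look like this (a sketch; the marker values come from the specs, the function itself is made up for illustration):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;/// Check that a byte slice starts like a JFIF file:
/// SOI (0xffd8), then APP0 (0xffe0), then two length bytes,
/// then the identifier string &amp;quot;JFIF\0&amp;quot;.
fn looks_like_jfif(bytes: &amp;amp;[u8]) -&amp;gt; bool {
    bytes.len() &amp;gt;= 11
        &amp;amp;&amp;amp; bytes[0..2] == [0xff, 0xd8] // SOI
        &amp;amp;&amp;amp; bytes[2..4] == [0xff, 0xe0] // APP0
        &amp;amp;&amp;amp; &amp;amp;bytes[6..11] == b&amp;quot;JFIF\0&amp;quot; // skip the 2 length bytes
}

fn main() {
    let header = [0xff, 0xd8, 0xff, 0xe0, 0x00, 0x10,
                  b&apos;J&apos;, b&apos;F&apos;, b&apos;I&apos;, b&apos;F&apos;, 0x00];
    assert!(looks_like_jfif(&amp;amp;header));
}
&lt;/code&gt;&lt;/pre&gt;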
&lt;h2&gt;Reading the file&lt;/h2&gt;
&lt;p&gt;Initially we will simply read the whole image file.
There might be some memory and/or speed optimization potential using some kind of streaming approach,
but for now, let&apos;s stick with a &lt;code&gt;Vec&amp;lt;u8&amp;gt;&lt;/code&gt;&lt;sup&gt;&lt;a href=&quot;#user-content-fn-file-reading&quot; id=&quot;user-content-fnref-file-reading&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;fn file_to_bytes(path: &amp;amp;Path) -&amp;gt; Result&amp;lt;Vec&amp;lt;u8&amp;gt;, std::io::Error&amp;gt; {
    File::open(path).and_then(|mut file| {
        let mut bytes = Vec::new();
        try!(file.read_to_end(&amp;amp;mut bytes));
        Ok(bytes)
    })
}
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;The Marker Segment Loop&lt;/h2&gt;
&lt;p&gt;This will be the main loop of the decoder.
We keep track of where we are in the file, and read segment by segment.
Since most segments say exactly how long they are, this works out pretty well.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;let mut i = 0;
while i &amp;lt; vec.len() {
    if let Some(marker) = bytes_to_marker(&amp;amp;vec[i..]) {
        if marker == Marker::EndOfImage || marker == Marker::StartOfImage {
            // These markers don&apos;t have length bytes, so they must be
            // handled separately, in order to avoid out-of-bounds indexes
            // or reading nonsense lengths.
            i += 2;
            continue;
        }

        let data_length = (u8s_to_u16(&amp;amp;vec[i + 2..]) - 2) as usize;
        i += 4;

        match marker {
            Marker::Comment =&amp;gt; { /* Read comment data */ }
            Marker::QuantizationTable =&amp;gt; { /* Read table data */ }
            // Handle the rest of the markers
        }
        i += data_length;
    } else {
        panic!(&amp;quot;Unhandled byte marker: {:02x} {:02x}&amp;quot;, vec[i], vec[i + 1]);
    }
}
&lt;/code&gt;&lt;/pre&gt;
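&lt;p&gt;The &lt;code&gt;u8s_to_u16&lt;/code&gt; helper is not shown here; JPEG stores segment lengths most significant byte first, so it is essentially this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;/// Read a big-endian u16 from the first two bytes of a slice.
fn u8s_to_u16(bytes: &amp;amp;[u8]) -&amp;gt; u16 {
    ((bytes[0] as u16) &amp;lt;&amp;lt; 8) | bytes[1] as u16
}

fn main() {
    // A length field of 0x00 0x10 means a 16 byte long segment.
    assert_eq!(u8s_to_u16(&amp;amp;[0x00, 0x10]), 16);
}
&lt;/code&gt;&lt;/pre&gt;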
&lt;p&gt;There are quite a few different segments, but not all are very interesting.
The segments we will look at in this post are &lt;code&gt;Comment&lt;/code&gt;, &lt;code&gt;DefineHuffmanTable&lt;/code&gt;, &lt;code&gt;QuantizationTable&lt;/code&gt;, and &lt;code&gt;StartOfScan&lt;/code&gt;.&lt;/p&gt;
&lt;h2&gt;Reading a segment&lt;/h2&gt;
&lt;p&gt;As a simple example to get us started reading data in a segment, consider the &lt;code&gt;Comment&lt;/code&gt; marker,
which is of the following form&lt;sup&gt;&lt;a href=&quot;#user-content-fn-unreadable-form&quot; id=&quot;user-content-fnref-unreadable-form&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;5&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt; 2 bytes  2 bytes   length-2 bytes
| marker | length | comment       |
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;marker&lt;/code&gt; bytes are already read, and so are the &lt;code&gt;length&lt;/code&gt; bytes, from which we also subtracted 2,
making &lt;code&gt;data_length&lt;/code&gt; the actual length of the data part of the segment (&lt;code&gt;comment&lt;/code&gt; in this case).
Reading the comment into a &lt;code&gt;String&lt;/code&gt; is pretty straightforward:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;Marker::Comment =&amp;gt; {
    let comment = str::from_utf8(&amp;amp;vec[i..i + data_length])
        .map(|s| s.to_string())
        .ok();
    image.comment = comment;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;&lt;code&gt;QuantizationTable&lt;/code&gt;&lt;/h2&gt;
&lt;p&gt;This segment contains one or more quantization tables (or matrices).
Each table is 65 bytes, where the first byte contains two fields: &lt;em&gt;Element Precision&lt;/em&gt; and &lt;em&gt;Table Destination&lt;/em&gt;, each 4 bits large.
Element precision specifies how large each value in the matrix is; &lt;code&gt;0&lt;/code&gt; for 8-bits, &lt;code&gt;1&lt;/code&gt; for 16-bits.
For baseline sequential DCT (which is the only mode we will support), this has to be &lt;code&gt;0&lt;/code&gt;.
Table destination specifies one of four possible &amp;quot;slots&amp;quot; for the matrix to be saved in;
that is, it is possible to have four quantization matrices and use different ones in different scans.&lt;/p&gt;
&lt;p&gt;The remaining 64 bytes are the values in the matrix.
If there are multiple tables in the segment they are located right after one another.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;Marker::QuantizationTable =&amp;gt; {
    // JPEG B.2.4.1
    let mut index = i;
    while index &amp;lt; i + data_length {
        let precision = (vec[index] &amp;amp; 0xf0) &amp;gt;&amp;gt; 4;
        assert!(precision == 0);
        let identifier = vec[index] &amp;amp; 0x0f;
        let table: Vec&amp;lt;u8&amp;gt; = vec[index + 1..index + 65]
            .iter()
            .cloned()
            .collect();

        image.quantization_tables[identifier as usize] = Some(table);
        // 64 entries + one header byte
        index += 65;
    }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;&lt;code&gt;DefineHuffmanTable&lt;/code&gt;&lt;/h2&gt;
&lt;p&gt;Reading the Huffman tables takes some work.
Each table consists of a &lt;em&gt;Table Class&lt;/em&gt; and a table destination (sharing one byte),
16 bytes specifying the number of codes of length 1 through 16, and a one-byte value for each code.
The table class specifies whether the table is used for DC or AC coefficients, but we will get back to this in Part 3.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;Marker::DefineHuffmanTable =&amp;gt; {
    // JPEG B.2.4.2

    // Head of data for each table
    let mut huffman_index = i;
    // End of segment
    let segment_end = i + data_length;

    while huffman_index &amp;lt; segment_end {
        let table_class = (vec[huffman_index] &amp;amp; 0xf0) &amp;gt;&amp;gt; 4;
        let table_dest_id = vec[huffman_index] &amp;amp; 0x0f;
        huffman_index += 1;

        // There are `size_area[i]` number of codes of length `i + 1`.
        let size_area: &amp;amp;[u8] = &amp;amp;vec[huffman_index..huffman_index + 16];
        huffman_index += 16;

        let number_of_codes: usize = size_area.iter()
            .map(|&amp;amp;b| b as usize)
            .sum();

        // Code `i` has value `data_area[i]`
        let data_area: &amp;amp;[u8] = &amp;amp;vec[huffman_index..huffman_index +
                                                   number_of_codes];
        huffman_index += number_of_codes;

        let huffman_table =
            huffman::HuffmanTable::from_size_data_tables(size_area, data_area);
        // DC = 0, AC = 1
        if table_class == 0 {
            image.huffman_dc_tables[table_dest_id as usize] =
                Some(huffman_table);
        } else {
            image.huffman_ac_tables[table_dest_id as usize] =
                Some(huffman_table);
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Processing the table data we read from the file takes a little work: this happens in &lt;code&gt;huffman::HuffmanTable::from_size_data_tables&lt;/code&gt;,
to which we pass a &amp;quot;size table&amp;quot; (how many codes are of length &lt;code&gt;i&lt;/code&gt;?),
and a &amp;quot;data table&amp;quot; (which value is code &lt;code&gt;i&lt;/code&gt; mapped to?)&lt;sup&gt;&lt;a href=&quot;#user-content-fn-huffman-naming&quot; id=&quot;user-content-fnref-huffman-naming&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;6&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;The Huffman module defines two structs:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;#[derive(Debug, Clone)]
pub struct HuffmanCode {
    /// How many bits are used in the code
    length: u8,
    /// The bit code. If the number of bits needed to represent the code is
    /// less than `length`, the code is padded with `0`s in front.
    code: u16,
    /// The value the code is mapped to.
    value: u8,
}

#[derive(Debug)]
pub struct HuffmanTable {
    /// A list of all codes in the table, sorted on code length
    codes: Vec&amp;lt;HuffmanCode&amp;gt;,
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;For simplicity, the actual table is just a &lt;code&gt;Vec&lt;/code&gt; of &lt;code&gt;HuffmanCode&lt;/code&gt;s&lt;sup&gt;&lt;a href=&quot;#user-content-fn-huffman-approach&quot; id=&quot;user-content-fnref-huffman-approach&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;7&lt;/a&gt;&lt;/sup&gt;;
we may see in a later post how to improve the performance here.&lt;/p&gt;
&lt;p&gt;Creating the table is done in two steps.
First we create a &lt;code&gt;Vec&lt;/code&gt; with the length of each code, such that code &lt;code&gt;i&lt;/code&gt; has length &lt;code&gt;code_lengths[i]&lt;/code&gt;;
remember that the &lt;code&gt;size_data&lt;/code&gt; read from the file is the number of codes of each length, and now
we want the length of each code.
Then we create a &lt;code&gt;Vec&lt;/code&gt; with the codes, by
merging the three parts: code length, code bit string, and code value.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;impl HuffmanTable {
    pub fn from_size_data_tables(size_data: &amp;amp;[u8], data_table: &amp;amp;[u8]) -&amp;gt; HuffmanTable {
        let code_lengths: Vec&amp;lt;u8&amp;gt; = (0..16)
            .flat_map(|i| repeat(i as u8 + 1).take(size_data[i] as usize))
            .collect();

        let code_table: Vec&amp;lt;u16&amp;gt; = HuffmanTable::make_code_table(&amp;amp;code_lengths);

        let codes: Vec&amp;lt;HuffmanCode&amp;gt; = data_table.iter()
            .zip(code_lengths.iter())
            .zip(code_table.iter())
            .map(|((&amp;amp;value, &amp;amp;length), &amp;amp;code)| {
                HuffmanCode {
                    length: length,
                    code: code,
                    value: value,
                }
            })
            .collect();

        HuffmanTable { codes: codes }
    }

    fn make_code_table(sizes: &amp;amp;[u8]) -&amp;gt; Vec&amp;lt;u16&amp;gt; {
        // This is more or less just an implementation of a
        // flowchart (Figure C.2) in the standard.
        let mut vec = Vec::new();
        let mut code: u16 = 0;
        let mut current_size = sizes[0];
        for &amp;amp;size in sizes {
            while size &amp;gt; current_size {
                code &amp;lt;&amp;lt;= 1;
                current_size += 1;
            }
            vec.push(code);
            if current_size &amp;gt; 16 || code == 0xffff {
                break;
            }
            code += 1;
        }
        vec
    }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Beware: &lt;code&gt;make_code_table&lt;/code&gt; was previously purely an implementation of the flow chart mentioned in the comment, but this was &lt;a href=&quot;https://github.com/martinhath/jpeg-rust/blob/8077656fb26be6d5108cb715c76218e42882a36e/src/jpeg/huffman.rs#L90&quot;&gt;so ugly&lt;/a&gt; that I decided to implement it from scratch.
It has &lt;em&gt;not&lt;/em&gt; been thoroughly tested, but it &lt;em&gt;seems&lt;/em&gt; to work as intended.&lt;/p&gt;
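&lt;p&gt;As a quick sanity check of the canonical code assignment: for the code lengths &lt;code&gt;[2, 2, 3]&lt;/code&gt; we should get the codes &lt;code&gt;00&lt;/code&gt;, &lt;code&gt;01&lt;/code&gt;, and &lt;code&gt;100&lt;/code&gt;. A stripped-down, standalone version of the routine (hypothetical name, and without the overflow guard) behaves as expected:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;fn assign_codes(lengths: &amp;amp;[u8]) -&amp;gt; Vec&amp;lt;u16&amp;gt; {
    // Same idea as `make_code_table`: each code is the previous code
    // plus one, shifted left once for every extra bit of code length.
    let mut codes = Vec::new();
    let mut code: u16 = 0;
    let mut current_size = lengths[0];
    for &amp;amp;len in lengths {
        while len &amp;gt; current_size {
            code &amp;lt;&amp;lt;= 1;
            current_size += 1;
        }
        codes.push(code);
        code += 1;
    }
    codes
}

// assign_codes(&amp;amp;[2, 2, 3]) == [0b00, 0b01, 0b100]
&lt;/code&gt;&lt;/pre&gt;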
&lt;p&gt;Now the table is read, processed, and put into its place.
Next up is actually decoding image data.&lt;/p&gt;
&lt;h2&gt;&lt;code&gt;StartOfScan&lt;/code&gt;&lt;/h2&gt;
&lt;p&gt;First we read in fields for the current scan.
This includes e.g. the number of components, and which tables they use.
There is nothing fancy just yet; the data format is, as usual, listed in the JPEG specification.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;Marker::StartOfScan =&amp;gt; {
    // JPEG B.2.3
    let num_components = vec[i];
    let mut scan_components = Vec::new();
    for _ in 0..num_components {
        scan_components.push(ScanComponentHeader {
            component_id: vec[i + 1],
            dc_table_selector: (vec[i + 2] &amp;amp; 0xf0) &amp;gt;&amp;gt; 4,
            ac_table_selector: vec[i + 2] &amp;amp; 0x0f,
        });
        i += 2;
    }

    let scan_header = ScanHeader {
        num_components: num_components,
        scan_components: scan_components,
        start_spectral_selection: vec[i + 1],
        end_spectral_selection: vec[i + 2],
        successive_approximation_bit_pos_high: (vec[i + 3] &amp;amp; 0xf0) &amp;gt;&amp;gt; 4,
        successive_approximation_bit_pos_low: vec[i + 3] &amp;amp; 0x0f,
    };
    // Register read data
    i += 4;

    if image.scan_headers.is_none() {
        image.scan_headers = Some(Vec::new());
    }
    image.scan_headers
        .as_mut()
        .map(|v| v.push(scan_header.clone()));
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As we have seen, markers are of the format &lt;code&gt;0xff__&lt;/code&gt;.
So what if the image data contains &lt;code&gt;0xff&lt;/code&gt;?
The solution is to encode &lt;code&gt;0xff&lt;/code&gt; as &lt;code&gt;0xff00&lt;/code&gt;, which is not a marker code.
Therefore we need to replace &lt;code&gt;ff00&lt;/code&gt; with &lt;code&gt;ff&lt;/code&gt; before sending it to the Huffman decoder,
where we decode actual image data.
Additionally we need to keep track of how many bytes we skip, in order to increment &lt;code&gt;i&lt;/code&gt;
correctly when we are done with this scan.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;    // Copy data, and replace 0xff00 with 0xff.
    let mut bytes_skipped = 0;
    let mut encoded_data = Vec::new();
    {
        let mut i = i;
        while i &amp;lt; vec.len() {
            encoded_data.push(vec[i]);
            if vec[i] == 0xff &amp;amp;&amp;amp; i + 1 &amp;lt; vec.len() &amp;amp;&amp;amp; vec[i + 1] == 0x00 {
                // Skip the 0x00 part here.
                i += 1;
                bytes_skipped += 1;
            }
            i += 1;
        }
    }
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Note that we are copying the entire image data here.
Is this necessary? Not really. Is it slow? Uuh, maybe? Is it simple? Hell yes.&lt;/p&gt;
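&lt;p&gt;For reference, the unstuffing loop above can also be written as a standalone helper (hypothetical name &lt;code&gt;unstuff&lt;/code&gt;), returning both the cleaned data and the number of bytes skipped:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;fn unstuff(data: &amp;amp;[u8]) -&amp;gt; (Vec&amp;lt;u8&amp;gt;, usize) {
    // Copy the scan data, dropping the stuffed 0x00 after each literal 0xff.
    let mut out = Vec::new();
    let mut skipped = 0;
    let mut i = 0;
    while i &amp;lt; data.len() {
        out.push(data[i]);
        if data[i] == 0xff &amp;amp;&amp;amp; i + 1 &amp;lt; data.len() &amp;amp;&amp;amp; data[i + 1] == 0x00 {
            i += 1;
            skipped += 1;
        }
        i += 1;
    }
    (out, skipped)
}

// unstuff(&amp;amp;[0x12, 0xff, 0x00, 0x34]) == (vec![0x12, 0xff, 0x34], 1)
&lt;/code&gt;&lt;/pre&gt;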
&lt;p&gt;After this we pass in the Huffman tables and quantization matrices which we hopefully have read in an earlier segment.
The code for this is nothing special, and somewhat ugly, so it has been omitted.
At last, we decode the image, and advance the index appropriately.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;    let (image_data, bytes_read) = jpeg_decoder.decode();
    image.image_data = Some(image_data);

    // Since we are calculating how much data there is in this segment,
    // we update `i` manually, and `continue` the `while` loop.
    i += bytes_read + bytes_skipped;
    continue;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Note that we assume the image contains only one scan.
As I mentioned in Part 1, all images I have tested contain just one scan,
so this is an assumption we will continue to make.
If we are afraid of forgetting this, we could always add an &lt;code&gt;assert!(image.image_data.is_none())&lt;/code&gt;
at the top of the &lt;code&gt;StartOfScan&lt;/code&gt; block.&lt;/p&gt;
&lt;p&gt;So what happens in &lt;code&gt;JpegDecoder::decode&lt;/code&gt;?
That will have to wait for the next part.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://www.reddit.com/r/programming/comments/4yinbq/writing_a_jpeg_decoder_in_rust_part_2/&quot;&gt;/r/programming thread&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.reddit.com/r/rust/comments/4yinbt/writing_a_jpeg_decoder_in_rust_part_2/&quot;&gt;/r/rust thread&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://news.ycombinator.com/item?id=12320755&quot;&gt;HN thread&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;section data-footnotes=&quot;&quot; class=&quot;footnotes&quot;&gt;&lt;h2 id=&quot;footnote-label&quot; class=&quot;sr-only&quot;&gt;Footnotes&lt;/h2&gt;
&lt;ol&gt;
&lt;li id=&quot;user-content-fn-broken-sampling&quot;&gt;
&lt;p&gt;I know images with certain sampling factors are messed up. If possible, check &lt;code&gt;identify -verbose &amp;lt;image.jpeg&amp;gt; | grep sampling&lt;/code&gt; (requires ImageMagick). Only &lt;code&gt;1x1&lt;/code&gt; and &lt;code&gt;2x1&lt;/code&gt; are supported. &lt;a href=&quot;#user-content-fnref-broken-sampling&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-lena-dev&quot;&gt;
&lt;p&gt;When developing, I did not use this image, because of its size (decoding this actually takes almost two seconds on my laptop!), and its complexity (multiple channels, present scaling factors, etc.).  Rather, I tested the decoder with a 16x8 grayscale version of the same image, and advanced to a full size grayscale image of Lena, when the small image showed correctly. &lt;a href=&quot;#user-content-fnref-lena-dev&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-jpeg-markers&quot;&gt;
&lt;p&gt;Note that while this is the JFIF spec, the markers used are defined in the JPEG spec. &lt;a href=&quot;#user-content-fnref-jpeg-markers&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-file-reading&quot;&gt;
&lt;p&gt;&lt;a href=&quot;https://www.reddit.com/user/DroidLogician&quot;&gt;/u/DroidLogician&lt;/a&gt; pointed out a &lt;em&gt;huge&lt;/em&gt; performance flaw in the original file reading code. Check &lt;a href=&quot;https://www.reddit.com/r/rust/comments/4yinbt/writing_a_jpeg_decoder_in_rust_part_2/d6obhd8&quot;&gt;the reddit thread&lt;/a&gt; out! &lt;a href=&quot;#user-content-fnref-file-reading&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-unreadable-form&quot;&gt;
&lt;p&gt;If you think my ASCII skills are nonexistent, check page 47 (marked 43) in the JPEG spec. &lt;a href=&quot;#user-content-fnref-unreadable-form&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-huffman-naming&quot;&gt;
&lt;p&gt;The naming here is the same as what is used in the spec. I think there is room for improvement, but I have yet to come up with better names. &lt;a href=&quot;#user-content-fnref-huffman-naming&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-huffman-approach&quot;&gt;
&lt;p&gt;By now, you might see how I plan to use the Huffman table to decode data. In retrospect, constructing an actual tree and traversing it with a bit stream iterator of sorts would perhaps have been a better idea, although it would have been more work up front. &lt;a href=&quot;#user-content-fnref-huffman-approach&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content></entry><entry><title>A Ghost Story</title><id>https://mht.wtf/post/ghost-stories/</id><updated>2019-10-15T21:29:19+02:00</updated><author><name>Martin Hafskjold Thoresen</name><email>m@mht.wtf</email></author><link href="https://mht.wtf/post/ghost-stories/" rel=""/><link href="https://mht.wtf/post/ghost-stories/index.html" rel="alternate"/><published>2019-10-15T21:29:19+02:00</published><content type="text/html">&lt;p&gt;Spooktober is upon us and it&apos;s been a while since I&apos;ve written anything here,
so here&apos;s a short computing ghost story that I experienced a few years back.&lt;/p&gt;
&lt;p&gt;I was living in Zürich at the time with a bunch of people from all over
the world, including &amp;quot;Bee&amp;quot; from South Korea,
and as in many other ghost stories, we were enjoying ourselves with beer.
I cannot remember exactly what we were talking about, but my phone,
at the time a Samsung, was lying on top of Bee&apos;s phone, whose brand or model I do not remember.&lt;/p&gt;
&lt;p&gt;For some reason, I ended up checking &lt;em&gt;that maps app&lt;/em&gt; but to my surprise, the GPS didn&apos;t indicate
that I was drinking beer in Zürich, but that I was in Busan. Busan, South Korea.&lt;/p&gt;
&lt;p&gt;Humoured by this, I give my phone to Bee and ask &lt;em&gt;&amp;quot;Hey Bee, do you know where this is?&amp;quot;&lt;/em&gt;.
He laughs, and looks at me in a funny way: &lt;em&gt;&amp;quot;Yes, this is my parents house?&amp;quot;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;doot.jpg&quot; alt=&quot;Spooky skeletal doot&quot; /&gt;&lt;/p&gt;
&lt;p&gt;I want to clarify: my phone didn&apos;t magically open up &lt;em&gt;that maps app&lt;/em&gt; on the position of Bee&apos;s parents&apos;
home address in Busan. That would have been strange, but I could probably explain it.
Maybe there was some NFC thing going on, or maybe he (or even I) had picked up my phone
and, for some reason, found his parents house.
No.
My phone decided that &lt;em&gt;it was in Busan&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;I have absolutely no explanation for what happened here, apart from a visit by the Android ghosts.
There is absolutely no reason whatsoever that I can come up with for my phone&apos;s GPS position
to be changed to, not anything arbitrary, but to the house of the &lt;em&gt;parents&lt;/em&gt; of another person in the room.&lt;/p&gt;
&lt;p&gt;It is unclear whether Bee&apos;s phone had saved that address as his home address, but I simply cannot
believe that it hadn&apos;t. Thus, this is really all that I have for unraveling this mystery.&lt;/p&gt;
&lt;h3&gt;TL;DR&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;Phone &lt;code&gt;A&lt;/code&gt; is on top of phone &lt;code&gt;B&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Phone &lt;code&gt;B&lt;/code&gt; has saved address.&lt;/li&gt;
&lt;li&gt;???&lt;/li&gt;
&lt;li&gt;Phone &lt;code&gt;A&lt;/code&gt;&apos;s GPS thinks its position is that address.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Please fill in step 3.&lt;/p&gt;
&lt;p&gt;If anyone has suggestions or ideas for what happened, please do send a mail to &lt;a href=&quot;https://lists.sr.ht/~mht/public-inbox&quot;&gt;my public inbox&lt;/a&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;https://lists.sr.ht/~mht/public-inbox
&lt;/code&gt;&lt;/pre&gt;
&lt;hr /&gt;
</content></entry><entry><title>Building Zig structs at Compile Time</title><id>https://mht.wtf/post/comptime-struct/</id><updated>2022-06-11T22:05:08+02:00</updated><author><name>Martin Hafskjold Thoresen</name><email>m@mht.wtf</email></author><link href="https://mht.wtf/post/comptime-struct/" rel=""/><link href="https://mht.wtf/post/comptime-struct/index.html" rel="alternate"/><published>2022-06-11T22:05:08+02:00</published><content type="text/html">&lt;p&gt;Let&apos;s talk about &lt;code&gt;comptime&lt;/code&gt; in &lt;a href=&quot;https://ziglang.org&quot;&gt;Zig&lt;/a&gt;. &lt;code&gt;comptime&lt;/code&gt; is the feature that allows
you to run code at compile time, and is maybe Zig&apos;s biggest differentiator
from other languages in the same space. Combined with having types as values,
we get type specialization, generics, reflection, and even code generation.&lt;/p&gt;
&lt;p&gt;For readers who are not familiar with Zig, here&apos;s a small example.  We can
make a &lt;code&gt;Range&lt;/code&gt; type that is generic over the element type by writing
a function called &lt;code&gt;Range&lt;/code&gt; that takes a type (which is required to be
compile time known), and produces a &lt;code&gt;struct&lt;/code&gt; with two fields of that type.
Returning the &lt;code&gt;struct&lt;/code&gt; from the function is no problem; types are values
after all. It looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-zig&quot;&gt;fn Range(comptime t: type) type {
    return struct {
        from: t,
        to: t,
    };
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We can use this new function, and the type it returns, like this&lt;sup&gt;&lt;a href=&quot;#user-content-fn-typeexpr&quot; id=&quot;user-content-fnref-typeexpr&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-zig&quot;&gt;test &amp;quot;range-create&amp;quot; {
    var a = Range(i32){ .from = 0, .to = 10 };
    std.debug.print(&amp;quot;\n[{}, {})\n&amp;quot;, .{ a.from, a.to });
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;which, when run, prints out the numbers we gave in a math-like format.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ zig test comptime-struct/cs.zig --test-filter range
Test [0/1] test &amp;quot;range-create&amp;quot;...
[0, 10)
All 1 tests passed.
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We can also add a method to the newly created type, for instance for checking
whether a value is in the range or not. The type of this parameter &lt;code&gt;other&lt;/code&gt;
in the &lt;code&gt;contains&lt;/code&gt; method is the type &lt;code&gt;t&lt;/code&gt; that we&apos;re given as argument in
&lt;code&gt;Range&lt;/code&gt;, and it works just as expected.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-zig&quot;&gt;fn Range(comptime t: type) type {
    return struct {
        from: t,
        to: t,

        pub fn contains(this: @This(), other: t) bool {
            return this.from &amp;lt;= other and other &amp;lt; this.to;
        }
    };
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here we&apos;re using the &lt;code&gt;@This()&lt;/code&gt; builtin which gives us the type in which we
currently are. We need this here since we don&apos;t have a name for the type yet,
as we&apos;re still defining it. There&apos;s nothing special about the name &lt;code&gt;this&lt;/code&gt;,
but it is familiar from many other languages, and since the builtin is called
&lt;code&gt;@This&lt;/code&gt; it&apos;s a convenient name to give. The new method can be tested like so:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-zig&quot;&gt;test &amp;quot;range-contains&amp;quot; {
    var r = Range(i32){ .from = 0, .to = 10 };
    try std.testing.expect(r.contains(5));
    try std.testing.expect(r.contains(0));
    try std.testing.expect(r.contains(9));
    try std.testing.expect(!r.contains(10));
    try std.testing.expect(!r.contains(-1));
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;which works:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ zig test comptime-struct/cs.zig --test-filter range-contains
All 1 tests passed.
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Building Structs&lt;/h2&gt;
&lt;p&gt;Usually in Zig, the way you define a &lt;code&gt;struct&lt;/code&gt; is by assigning the value of a
&lt;code&gt;struct { .. }&lt;/code&gt; expression to a name, like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-zig&quot;&gt;const MyString = struct {
    someNumber: i32,
    aBool: bool,
    yourString: []const u8,
};
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We have just seen how to control the &lt;em&gt;types&lt;/em&gt; of the struct fields
programmatically (and, I stress, with completely regular Zig code!).
What about the &lt;em&gt;names&lt;/em&gt;? Or both? Is it possible to construct, at compile time, a new
&lt;code&gt;struct&lt;/code&gt; in which the names and types of all of the fields come from some other data?&lt;/p&gt;
&lt;p&gt;The answer is yes! The key is the &lt;code&gt;@Type&lt;/code&gt; builtin, which takes a
&lt;code&gt;std.builtin.TypeInfo&lt;/code&gt;&lt;sup&gt;&lt;a href=&quot;#user-content-fn-typeinfo&quot; id=&quot;user-content-fnref-typeinfo&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; and &lt;em&gt;reifies&lt;/em&gt;&lt;sup&gt;&lt;a href=&quot;#user-content-fn-reify&quot; id=&quot;user-content-fnref-reify&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;3&lt;/a&gt;&lt;/sup&gt; the description of the
type into a real &lt;code&gt;type&lt;/code&gt;. Here&apos;s how it looks:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-zig&quot;&gt;test &amp;quot;reify-empty&amp;quot; {
    const Type = @Type(.{
        .Struct = .{
            .layout = .Auto,
            .fields = &amp;amp;[_]std.builtin.TypeInfo.StructField{},
            .decls = &amp;amp;[_]std.builtin.TypeInfo.Declaration{},
            .is_tuple = false,
        },
    });
    try std.testing.expect(@sizeOf(Type) == 0);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This will create an empty &lt;code&gt;struct&lt;/code&gt;, since we&apos;re instantiating the &lt;code&gt;.Struct&lt;/code&gt;
field of the &lt;code&gt;TypeInfo&lt;/code&gt; &lt;code&gt;enum&lt;/code&gt; with both &lt;code&gt;.fields&lt;/code&gt; and &lt;code&gt;.decls&lt;/code&gt; empty. So
far this only seems to be a difficult way of writing &lt;code&gt;const Type = struct {};&lt;/code&gt;, but this is just regular Zig code, and while we require that the
value passed to &lt;code&gt;@Type&lt;/code&gt; is compile time known, we don&apos;t require it to
be one big literal like it is now. It can very well be the result of a
complex computation, as long as it is compile time known.&lt;/p&gt;
&lt;p&gt;We can for instance write a function that takes an anonymous struct literal
with names and types that should be the fields of a &lt;code&gt;struct&lt;/code&gt;, and if the
name starts with a &lt;code&gt;?&lt;/code&gt; it automatically makes the field optional.  In code,
calling our function&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-zig&quot;&gt;const Foo = MakeStruct(.{
    .{ &amp;quot;someNumber&amp;quot;, i32 },
    .{ &amp;quot;?aBool&amp;quot;, bool },
    .{ &amp;quot;?yourString&amp;quot;, []const u8 },
});
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;should be the same as writing&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-zig&quot;&gt;const Foo = struct {
    someNumber: i32,
    aBool: ?bool,
    yourString: ?[]const u8,
};
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;One way of doing this is by building up a list of &lt;code&gt;StructField&lt;/code&gt;s with the
right names and types, making a &lt;code&gt;TypeInfo&lt;/code&gt; struct with those fields, and
passing it to &lt;code&gt;@Type&lt;/code&gt;.  The only thing we must do is to branch on whether the
variable name starts with a &lt;code&gt;?&lt;/code&gt;, and if so, remove the &lt;code&gt;?&lt;/code&gt; from the name and
turn the given type into an optional type, &lt;code&gt;T&lt;/code&gt; to &lt;code&gt;?T&lt;/code&gt;.  Here is an example:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-zig&quot;&gt;fn MakeStruct(comptime in: anytype) type {
    var fields: [in.len]std.builtin.TypeInfo.StructField = undefined;
    for (in) |t, i| {
        var fieldType: type = t[1];
        var fieldName: []const u8 = t[0][0..];
        if (fieldName[0] == &apos;?&apos;) {
            fieldType = @Type(.{ .Optional = .{ .child = fieldType } });
            fieldName = fieldName[1..];
        }
        fields[i] = .{
            .name = fieldName,
            .field_type = fieldType,
            .default_value = null,
            .is_comptime = false,
            .alignment = 0,
        };
    }
    return @Type(.{
        .Struct = .{
            .layout = .Auto,
            .fields = fields[0..],
            .decls = &amp;amp;[_]std.builtin.TypeInfo.Declaration{},
            .is_tuple = false,
        },
    });
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;There&apos;s another thing to highlight here. We are declaring &lt;code&gt;fields&lt;/code&gt; to
be an array of length &lt;code&gt;in.len&lt;/code&gt;, even though &lt;code&gt;in&lt;/code&gt; is the argument of the
function. This is fine since &lt;code&gt;in&lt;/code&gt; is declared to be &lt;code&gt;comptime&lt;/code&gt; known, and
so of course we should be able to declare statically sized arrays of that
length, and indeed, in Zig we can.&lt;/p&gt;
&lt;p&gt;We can see that we&apos;re getting what we expect by using the &amp;quot;inverse&amp;quot; builtin
of &lt;code&gt;@Type&lt;/code&gt; which is &lt;code&gt;@typeInfo&lt;/code&gt;. &lt;code&gt;@typeInfo&lt;/code&gt; takes a &lt;code&gt;type&lt;/code&gt; and returns its
&lt;code&gt;std.builtin.TypeInfo&lt;/code&gt;, which we can operate on.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-zig&quot;&gt;test &amp;quot;make-struct&amp;quot; {
    const Type = MakeStruct(.{
        .{ &amp;quot;someNumber&amp;quot;, i32 },
        .{ &amp;quot;?aBool&amp;quot;, bool },
        .{ &amp;quot;?yourString&amp;quot;, []const u8 }, 
    });
    
    std.debug.print(&amp;quot;\n&amp;quot;, .{});
    inline for (@typeInfo(Type).Struct.fields) |f, i| {
        std.debug.print(&amp;quot;field {} is {s} type is {s}\n&amp;quot;, .{ i, f.name, f.field_type });
    }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here we are just looping over the fields of the struct and printing out
the names and types in order.  The result is this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ zig test comptime-struct/cs.zig --test-filter make
Test [0/1] test &amp;quot;make-struct&amp;quot;...
field 0 is someNumber type is i32
field 1 is aBool type is ?bool
field 2 is yourString type is ?[]const u8
All 1 tests passed.
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We have successfully moved the &lt;code&gt;?&lt;/code&gt; from the field names and over to the
field types.  Granted, this new way of making &lt;code&gt;struct&lt;/code&gt;s does not offer
very much in terms of readability or functionality. Putting the &lt;code&gt;?&lt;/code&gt; in
the name isn&apos;t any easier than having it in the type.&lt;/p&gt;
&lt;h2&gt;So What?&lt;/h2&gt;
&lt;p&gt;Even though we are effectively generating code at compile time, there&apos;s
no magic here: we&apos;re just writing regular Zig code. The data types we&apos;re
making are from &lt;code&gt;std.builtin&lt;/code&gt;, so they&apos;re tightly bound to the language,
but there&apos;s no special syntax, and no second language to learn and remember.
By simply filling in a &lt;code&gt;std.builtin.TypeInfo&lt;/code&gt; we can construct new types
at compile time.&lt;/p&gt;
&lt;p&gt;Also, the input to our function was an anonymous struct literal, but
this doesn&apos;t have to be the case.  We could have taken a &lt;code&gt;[]const u8&lt;/code&gt;
with source code of a &lt;code&gt;struct&lt;/code&gt; definition from another language like C++
or Rust, parsed it, and constructed the corresponding Zig type for the
given definition. Parsing the other language would be the vast majority of
the work, because as we&apos;ve just seen, making the Zig &lt;code&gt;struct&lt;/code&gt; is really easy.&lt;/p&gt;
&lt;p&gt;Another idea is to have a compile-time readable configuration &lt;code&gt;.ini&lt;/code&gt; file
embedded in the source with &lt;code&gt;@embedFile&lt;/code&gt;, and a function that reads in the
file, finds the names and types of the values in the file, and collects it
all into a &lt;code&gt;struct&lt;/code&gt;.  This struct would always be in perfect correspondence
with the &lt;code&gt;.ini&lt;/code&gt; file, and so there is no danger of the configuration
file and the code diverging.  There would be one definite source of truth for
the configuration values.&lt;/p&gt;
&lt;p&gt;In most other compiled languages, this is very difficult to do without
any external tools.  One would most likely try to go the other way, and
have the &lt;code&gt;struct&lt;/code&gt; definition be the single source of truth, and output
the default config file from that, either through a function that has to
be kept up-to-date as fields are added and changed, or by a macro system,
which is likely to be written in some DSL.  If you would want to have the
configuration file as a plain text file, you would need to ensure that the
file on disk is always consistent with the code; maybe you would want this
to be a distinct step in the build process of the program.&lt;/p&gt;
&lt;p&gt;Either way, the value proposition of Zig is clear: by simply allowing Zig
code to be run at compile time&lt;sup&gt;&lt;a href=&quot;#user-content-fn-lim&quot; id=&quot;user-content-fnref-lim&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;4&lt;/a&gt;&lt;/sup&gt; we get a powerful and easy-to-use
metaprogramming system without having to learn a second language or
use external tools.&lt;/p&gt;
&lt;p&gt;Pointers, complaints, suggestions, and other feedback can be sent to &lt;a href=&quot;https://lists.sr.ht/~mht/public-inbox&quot;&gt;my public inbox&lt;/a&gt; (plain text emails only).&lt;/p&gt;
&lt;p&gt;Thanks for reading.&lt;/p&gt;
&lt;section data-footnotes=&quot;&quot; class=&quot;footnotes&quot;&gt;&lt;h2 id=&quot;footnote-label&quot; class=&quot;sr-only&quot;&gt;Footnotes&lt;/h2&gt;
&lt;ol&gt;
&lt;li id=&quot;user-content-fn-typeexpr&quot;&gt;
&lt;p&gt;We could also have written &lt;code&gt;var a: Range(i32) = .{ .from = 0, .to = 10 };&lt;/code&gt; even though it might look funny that we have put a function call in the type specifier position of the expression, as this is usually reserved for type literals in other languages. Not so in Zig! &lt;a href=&quot;#user-content-fnref-typeexpr&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-typeinfo&quot;&gt;
&lt;p&gt;This type is about to be renamed to just &lt;code&gt;Type&lt;/code&gt;, but I&apos;m running my code samples with the 0.9.1 compiler which is still using the old name. &lt;a href=&quot;#user-content-fnref-typeinfo&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-reify&quot;&gt;
&lt;p&gt;From &lt;a href=&quot;https://www.merriam-webster.com/dictionary/reify&quot;&gt;Merriam-Webster&lt;/a&gt;: &lt;em&gt;&amp;quot;Reify: to consider or represent (something abstract) as a material or concrete thing : to give definite content and form to (a concept or idea)&amp;quot;&lt;/em&gt; &lt;a href=&quot;#user-content-fnref-reify&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-lim&quot;&gt;
&lt;p&gt;At &lt;code&gt;comptime&lt;/code&gt;, the full Zig language is available, but there are some limitations. For instance, I/O is not allowed. &lt;a href=&quot;#user-content-fnref-lim&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content></entry><entry><title>Stable Timings</title><id>https://mht.wtf/post/timing/</id><updated>2018-02-05T10:27:54+01:00</updated><author><name>Martin Hafskjold Thoresen</name><email>m@mht.wtf</email></author><link href="https://mht.wtf/post/timing/" rel=""/><link href="https://mht.wtf/post/timing/index.html" rel="alternate"/><published>2018-02-05T10:27:54+01:00</published><content type="text/html">&lt;p&gt;It was Friday afternoon and I was working on a proof-of-concept garbage
collector for Rust. The general idea of the collector is to have threads
register their &lt;em&gt;roots&lt;/em&gt; --- pointers to memory that the collector cares about ---
and every once in a while a thread collects these roots, forks off a new process,
finds memory that is no longer reachable from any root, and returns these pointers
back to the parent process using &lt;code&gt;mmap&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The initial implementation was obviously not well tuned for performance.
In order to get a rough picture of where my system spent its cycles, I
inserted calls to &lt;code&gt;time::precise_time_ns&lt;/code&gt; before and after certain blocks, and
wrote out the difference at the end. The reported timings looked like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;consolidate wait for signals 0ms
               collect roots 0ms
                        fork 0ms
              wait for child 67.108864ms
                   read ptrs 0ms
                   call free 201.3266ms

consolidate wait for signals 0ms
               collect roots 0ms
                        fork 0ms
              wait for child 67.108864ms
                   read ptrs 0ms
                   call free 134.21773ms

consolidate wait for signals 0ms
               collect roots 0ms
                        fork 0ms
              wait for child 67.108864ms
                   read ptrs 0ms
                   call free 134.21773ms
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Strange. Don&apos;t mind the fact that several of the timings are 0 --- zero --- ms,
but &lt;code&gt;wait for child&lt;/code&gt; is the &lt;em&gt;exact&lt;/em&gt; same number across the three iterations
shown. And &lt;code&gt;call free&lt;/code&gt; is the &lt;em&gt;exact&lt;/em&gt; same the last two iterations.  And what
about the fact that &lt;code&gt;67.1 + 134.2 = 201.3&lt;/code&gt;?? Something is obviously wrong here.
Maybe there is some weird OS synchronization going on with &lt;code&gt;fork&lt;/code&gt; and/or
&lt;code&gt;mmap&lt;/code&gt;&lt;sup&gt;&lt;a href=&quot;#user-content-fn-not-my-fault&quot; id=&quot;user-content-fnref-not-my-fault&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;? Or maybe the numbers just are very stable since I&apos;m using
a test program that does the exact same thing a billion times? (This could be
the case, as the number of pointers returned was in fact the &lt;em&gt;exact&lt;/em&gt; same
number on all iterations.)&lt;/p&gt;
&lt;p&gt;It was Friday afternoon, so I committed and went home.&lt;/p&gt;
&lt;p&gt;The day after, I found myself programming on a side project, inserting the very
same timing calls. Again, I got back very strange numbers. It was then I
realized what was happening.&lt;/p&gt;
&lt;p&gt;My code looked roughly like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;let t0 = time::precise_time_ns();
do_some_work();
let t1 = time::precise_time_ns();
....

println!(&amp;quot;do_some_work: {}ms&amp;quot;, (t1 as f32 - t0 as f32) / 1_000_000.0);
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;time::precise_time_ns&lt;/code&gt; returns a &lt;code&gt;u64&lt;/code&gt;. As I&apos;m writing this, it returns
&lt;code&gt;659_950_597_875_582&lt;/code&gt;. That&apos;s a &lt;em&gt;large&lt;/em&gt; number. That number is way larger than the
largest number a &lt;code&gt;f32&lt;/code&gt; can properly represent. By just casting back and forth,
we clearly see this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;extern crate time;
fn main() {
    let t = time::precise_time_ns();
    println!(&amp;quot;{}&amp;quot;, t);                // prints 660_149_119_845_010
    println!(&amp;quot;{}&amp;quot;, t as f32 as u64);  // prints 660_149_089_861_632
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;So since I cast to &lt;code&gt;f32&lt;/code&gt; &lt;em&gt;before&lt;/em&gt; doing the subtraction, time differences on
the order of &lt;code&gt;29_983_378ns&lt;/code&gt;&lt;sup&gt;&lt;a href=&quot;#user-content-fn-bound&quot; id=&quot;user-content-fnref-bound&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;, &lt;code&gt;29ms&lt;/code&gt;, would simply disappear, causing the
reported difference to be zero. And the larger differences would be rounded to
a multiple of ~&lt;code&gt;67ms&lt;/code&gt; (one &lt;code&gt;f32&lt;/code&gt; step, or &lt;code&gt;2^26&lt;/code&gt;ns, at this magnitude). This explains it all!&lt;/p&gt;
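&lt;p&gt;The same pitfall is easy to reproduce in a few lines of C. Here is a small
sketch (the timestamps are made up, but of the same magnitude as above):
casting each reading to a 32-bit float &lt;em&gt;before&lt;/em&gt; subtracting rounds both
readings to a multiple of &lt;code&gt;2^26&lt;/code&gt;ns, so a ~30ms difference collapses
to either &lt;code&gt;0&lt;/code&gt; or exactly &lt;code&gt;67.108864ms&lt;/code&gt;, while subtracting
first keeps it intact:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;#include &amp;lt;stdint.h&amp;gt;
#include &amp;lt;stdio.h&amp;gt;

int main(void) {
  // Made-up nanosecond timestamps; t1 is read ~30ms after t0.
  uint64_t t0 = 659950597875582ULL;
  uint64_t t1 = t0 + 29983378ULL;

  // Cast first: both values round to a multiple of 2^26 ns,
  // so the difference is either 0 or 67.108864ms.
  printf(&amp;quot;cast first:     %.6f ms\n&amp;quot;, ((float)t1 - (float)t0) / 1e6);

  // Subtract first: the u64 difference is small and exact.
  printf(&amp;quot;subtract first: %.6f ms\n&amp;quot;, (double)(t1 - t0) / 1e6);
}
&lt;/code&gt;&lt;/pre&gt;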
&lt;p&gt;Now, why did I cast before subtracting? I don&apos;t know. Why did I use &lt;code&gt;f32&lt;/code&gt;
instead of &lt;code&gt;f64&lt;/code&gt; (which in this case would be sufficient)? I don&apos;t know. It was
just one of those times when you write quick and dirty code without really
thinking about the details of what you&apos;re doing. Luckily, this time the bug was
fairly easy to spot.&lt;/p&gt;
&lt;section data-footnotes=&quot;&quot; class=&quot;footnotes&quot;&gt;&lt;h2 id=&quot;footnote-label&quot; class=&quot;sr-only&quot;&gt;Footnotes&lt;/h2&gt;
&lt;ol&gt;
&lt;li id=&quot;user-content-fn-not-my-fault&quot;&gt;
&lt;p&gt;I mean, it could &lt;em&gt;obviously&lt;/em&gt; not be anything wrong with &lt;em&gt;my&lt;/em&gt; code! &lt;a href=&quot;#user-content-fnref-not-my-fault&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-bound&quot;&gt;
&lt;p&gt;Finding the exact bound is left as an exercise to the reader. &lt;a href=&quot;#user-content-fnref-bound&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content></entry><entry><title>Code Generation and Merge Sort</title><id>https://mht.wtf/post/merge/</id><updated>2019-04-24T10:29:54+02:00</updated><author><name>Martin Hafskjold Thoresen</name><email>m@mht.wtf</email></author><link href="https://mht.wtf/post/merge/" rel=""/><link href="https://mht.wtf/post/merge/index.html" rel="alternate"/><published>2019-04-24T10:29:54+02:00</published><content type="text/html">&lt;p&gt;I was reading a few pages of Knuth&apos;s &lt;em&gt;The Art of Computer Programming&lt;/em&gt;, Volume
4A about &amp;quot;branchless computation&amp;quot; (p. 180) in which he demonstrates how to get
rid of branches by using conditional instructions. As an instructive
example he considers the inner part of &lt;em&gt;merge sort&lt;/em&gt;, in which we are to merge
two sorted lists of numbers into one larger sorted list. The description
as given by Knuth is as follows:&lt;/p&gt;
&lt;p&gt;If $x_i &amp;lt; y_j$ set $z_k \gets x_i$, $i \gets i+1$, and go to &lt;em&gt;x_done&lt;/em&gt; if $i = i_{max}$.&lt;br /&gt;
Otherwise set $z_k \gets y_j$, $j \gets j+1$, and go to &lt;em&gt;y_done&lt;/em&gt; if $j = j_{max}$.&lt;br /&gt;
Then set $k \gets k+1$ and go to &lt;em&gt;z_done&lt;/em&gt; if $k = k_{max}$.&lt;/p&gt;
&lt;p&gt;$x$ and $y$ are the input lists, $z$ is the output merged list. $i$, $j$, and
$k$ are loop indices for the three respective lists and the $_{max}$ variants
are the lists&apos; lengths.&lt;/p&gt;
&lt;p&gt;I got curious and decided to see how a standard optimizing compiler would
handle this case, and whether writing the assembly yourself would provide any
gain in performance. After all, this is just slightly more complicated than the
trivial examples used to show off good codegen, so it would not be unreasonable
for the compiler to manage to fix a bad implementation of this. In addition, it
would serve as a great excuse to finally learn how to write &lt;code&gt;x86&lt;/code&gt;.&lt;/p&gt;
&lt;h2&gt;Basics&lt;/h2&gt;
&lt;p&gt;Here&apos;s the inner loop in C code:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;void branching(uint64_t *xs, size_t xmax, uint64_t *ys, size_t ymax, 
               uint64_t *zs, size_t zmax) {
  size_t i = 0, j = 0, k = 0;
  while (k &amp;lt; zmax) {
    if (xs[i] &amp;lt; ys[j]) {
      zs[k++] = xs[i++];
      if (i == xmax) { // x_done
        memcpy(zs + k, ys + j, 8 * (zmax - k));
        return; 
      }
    } else {
      zs[k++] = ys[j++];
      if (j == ymax) { // y_done
        memcpy(zs + k, xs + i, 8 * (zmax - k));
        return; 
      }
    }
  } // z_done
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This seems to be a more or less straightforward textbook implementation of the
procedure, so it will do fine as a benchmark. As a quick check before going any
deeper into this we can use &lt;a href=&quot;https://godbolt.org/&quot;&gt;godbolt.org&lt;/a&gt; to see whether
this experiment is even worth doing. Godbolt&apos;s &lt;code&gt;x86-64 gcc 8.3&lt;/code&gt; with &lt;code&gt;-O3&lt;/code&gt; spits
out this (annotations are by me):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;branching(unsigned long*, unsigned long, unsigned long*, unsigned long, 
          unsigned long*, unsigned long):
        test    r9, r9       ; if (r9 == 0)
        je      .L15         ;   goto .L15
        push    r13          ;
        xor     eax, eax     ;
        xor     r11d, r11d   ; j = 0
        xor     r10d, r10d   ; i = 0
        push    r12          ;
        push    rbp          ;
        push    rbx          ;
        jmp     .L2          ;
.L17:
        add     r10, 1                        ; i++
        mov     QWORD PTR [r8-8+rax*8], rbp   ; zs[k-1] = xi
        cmp     r10, rsi                      ; if (i == xmax)
        je      .L16                          ;   goto .L16
.L6:
        cmp     r9, rax      ; if (k == zmax)
        je      .L1          ;   goto .L1
.L2:
        lea     r12, [rdi+r10*8]             ; calculate xs + i
        lea     r13, [rdx+r11*8]             ; calculate ys + j
        add     rax, 1                       ; k++
        mov     rbp, QWORD PTR [r12]         ; xi = xs[i]
        mov     rbx, QWORD PTR [r13+0]       ; yj = ys[j]
        cmp     rbp, rbx                     ; if (xi &amp;lt; yj)
        jb      .L17                         ;   goto .L17
        add     r11, 1                       ; j++
        mov     QWORD PTR [r8-8+rax*8], rbx  ; zs[k-1] = yj
        cmp     r11, rcx                     ; if (j != ymax)
        jne     .L6                          ;   goto .L6
        sub     r9, rax            ; y_done 
        pop     rbx                ;
        mov     rsi, r12           ;
        pop     rbp                ;
        lea     rdi, [r8+rax*8]    ;
        pop     r12                ;
        lea     rdx, [0+r9*8]      ;
        pop     r13                ;
        jmp     memcpy             ;
.L1:
        pop     rbx       ; z_done
        pop     rbp       ;
        pop     r12       ;
        pop     r13       ; 
        ret               ;
.L16:
        sub     r9, rax            ; x_done
        pop     rbx                ;
        mov     rsi, r13           ;
        pop     rbp                ;
        lea     rdi, [r8+rax*8]    ;
        pop     r12                ;
        lea     rdx, [0+r9*8]      ;
        pop     r13                ;
        jmp     memcpy             ;
.L15:
        ret
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Plenty of branches!&lt;sup&gt;&lt;a href=&quot;#user-content-fn-kinc&quot; id=&quot;user-content-fnref-kinc&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;Now, maybe it turns out that it doesn&apos;t matter if we&apos;re branching or not and
that the compiler knows best. We could guess that the reason we&apos;re still
getting branches is because that&apos;s really the best way to go here. After all
&amp;quot;you can&apos;t beat the compiler&amp;quot; seems to be the consensus in &lt;em&gt;many&lt;/em&gt; programming
circles. Let&apos;s try to write a version in C without excessive use of branching.
Then perhaps the compiler will generate different code, and we can see what
that difference amounts to in terms of running time. We can adopt Knuth&apos;s
branchless version:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;void nonbranching_but_branching(uint64_t *xs, size_t xmax, uint64_t *ys, size_t ymax, 
                                uint64_t *zs, size_t zmax) {
  size_t i = 0, j = 0, k = 0;
  uint64_t xi = xs[i], yj = ys[j];
  while ((i &amp;lt; xmax) &amp;amp;&amp;amp; (j &amp;lt; ymax) &amp;amp;&amp;amp; (k &amp;lt; zmax)) {
    int64_t t = one_if_lt(xi - yj);
    yj = min(xi, yj);
    zs[k] = yj;
    i += t;
    xi = xs[i];
    t ^= 1;
    j += t;
    yj = ys[j];
    k += 1;
  }
  if (i == xmax)
    memcpy(zs + k, ys + j, 8 * (zmax - k));
  if (j == ymax)
    memcpy(zs + k, xs + i, 8 * (zmax - k));
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;What is going on, you might ask? The general idea is to first get &lt;code&gt;min(xi, yj)&lt;/code&gt;, and then have a number &lt;code&gt;t&lt;/code&gt; that&apos;s &lt;code&gt;1&lt;/code&gt; if &lt;code&gt;xi &amp;lt; yj&lt;/code&gt; and &lt;code&gt;0&lt;/code&gt; otherwise: we
can add &lt;code&gt;t&lt;/code&gt; to &lt;code&gt;i&lt;/code&gt;, since &lt;code&gt;t=1&lt;/code&gt; if we just wrote &lt;code&gt;xi&lt;/code&gt; to &lt;code&gt;zs[k]&lt;/code&gt;. Then we can
&lt;code&gt;xor&lt;/code&gt; it with &lt;code&gt;1&lt;/code&gt;, effectively flipping &lt;code&gt;1&lt;/code&gt; to &lt;code&gt;0&lt;/code&gt; and &lt;code&gt;0&lt;/code&gt; to &lt;code&gt;1&lt;/code&gt;, and then add
&lt;code&gt;t^1&lt;/code&gt; to &lt;code&gt;j&lt;/code&gt;; this causes either &lt;code&gt;i&lt;/code&gt; or &lt;code&gt;j&lt;/code&gt; to be incremented but not both. We
used two convenience functions here, &lt;code&gt;one_if_lt&lt;/code&gt; and &lt;code&gt;min&lt;/code&gt;, both implemented
straightforwardly &lt;strong&gt;with branching&lt;/strong&gt;, hoping that the compiler will figure this
out for us, now that the branches are much smaller.&lt;/p&gt;
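&lt;p&gt;The two helpers are not shown here, so a sketch: given how they are called
above, &lt;code&gt;one_if_lt&lt;/code&gt; receives the already-wrapped difference
&lt;code&gt;xi - yj&lt;/code&gt;, and (assuming, as below, that the top bit of the inputs is
never set) a straightforward branching implementation might look like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;#include &amp;lt;stdint.h&amp;gt;

// Sketch of a helper: takes the wrapped difference xi - yj and returns 1
// if it is negative when reinterpreted as signed, i.e. if xi was smaller.
static inline int64_t one_if_lt(uint64_t diff) {
  return ((int64_t)diff &amp;lt; 0) ? 1 : 0;
}

// Sketch of a helper: the usual branching minimum.
static inline uint64_t min(uint64_t a, uint64_t b) {
  return a &amp;lt; b ? a : b;
}
&lt;/code&gt;&lt;/pre&gt;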
&lt;p&gt;Next, if we cheat a little and assume that the highest bit in the numbers is
never set, we can get rid of those branches&lt;sup&gt;&lt;a href=&quot;#user-content-fn-signed&quot; id=&quot;user-content-fnref-signed&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;void nonbranching(uint64_t *xs, size_t xmax, uint64_t *ys, size_t ymax, 
                  uint64_t *zs, size_t zmax) {
  size_t i = 0, j = 0, k = 0;
  uint64_t xi = xs[i], yj = ys[j];
  while ((i &amp;lt; xmax) &amp;amp;&amp;amp; (j &amp;lt; ymax) &amp;amp;&amp;amp; (k &amp;lt; zmax)) {
    uint64_t neg = (xi - yj) &amp;gt;&amp;gt; 63;
    yj = neg * xi + (1 - neg) * yj;
    zs[k] = yj;
    i += neg;
    xi = xs[i];
    neg ^= 1;
    j += neg;
    yj = ys[j];
    k += 1;
  }
  if (i == xmax)
    memcpy(zs + k, ys + j, 8 * (zmax - k));
  if (j == ymax)
    memcpy(zs + k, xs + i, 8 * (zmax - k));
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;What is up with &lt;code&gt;(xi - yj) &amp;gt;&amp;gt; 63&lt;/code&gt;, you may ask? If &lt;code&gt;xi &amp;lt; yj&lt;/code&gt; the subtraction wraps around (the result would be negative), so the most significant bit of the result will be set.
Then we shift down logically (since we&apos;re using unsigned integers&lt;sup&gt;&lt;a href=&quot;#user-content-fn-arithshift&quot; id=&quot;user-content-fnref-arithshift&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;) so
the bits that are filled in are all zeroes. Since the width is 64, we effectively
move the upper bit to the lowest position while setting all other bits to zero.&lt;/p&gt;
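&lt;p&gt;The trick is easy to check in isolation with a tiny example (not from the
original post):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;#include &amp;lt;stdint.h&amp;gt;
#include &amp;lt;stdio.h&amp;gt;

int main(void) {
  uint64_t xi = 3, yj = 5;
  // 3 - 5 wraps around to 2^64 - 2; its top bit is set, and the
  // logical shift right by 63 leaves exactly that bit.
  printf(&amp;quot;%llu\n&amp;quot;, (unsigned long long)((xi - yj) &amp;gt;&amp;gt; 63)); // prints 1
  xi = 7;
  // 7 - 5 = 2 has a clear top bit, so the shift yields 0.
  printf(&amp;quot;%llu\n&amp;quot;, (unsigned long long)((xi - yj) &amp;gt;&amp;gt; 63)); // prints 0
}
&lt;/code&gt;&lt;/pre&gt;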
&lt;p&gt;Knuth has another quirk, namely that his arrays usually point to the &lt;em&gt;end&lt;/em&gt; of
the array, and his indices are negative, going from &lt;code&gt;-xmax&lt;/code&gt; up to &lt;code&gt;0&lt;/code&gt; instead
of the more standard going from &lt;code&gt;0&lt;/code&gt; up to &lt;code&gt;xmax&lt;/code&gt;. One consequence of this is
that the termination check can be done with one comparison instead of three, by
&lt;code&gt;and&lt;/code&gt;ing together the three indices: since they are negative they have their
most significant bit set, unless zero. Here&apos;s both of the previous versions
with this reversal trick:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;void nonbranching_but_branching_reverse(uint64_t *xs, size_t xmax, 
                                        uint64_t *ys, size_t ymax, 
                                        uint64_t *zs, size_t zmax) {
  uint64_t *xse = xs + xmax;
  uint64_t *yse = ys + ymax;
  uint64_t *zse = zs + zmax;

  ssize_t i = -((ssize_t) xmax);
  ssize_t j = -((ssize_t) ymax);
  ssize_t k = -((ssize_t) zmax);

  uint64_t xi = xse[i], yj = yse[j];
  while (i &amp;amp; j &amp;amp; k) {
    uint64_t t = one_if_lt(xi - yj);
    yj = min(xi, yj);
    zse[k] = yj;
    i += t;
    xi = xse[i];
    t ^= 1;
    j += t;
    yj = yse[j];
    k += 1;
  }
  if (i == 0)
    memcpy(zse + k, yse + j, -8 * k);
  if (j == 0)
    memcpy(zse + k, xse + i, -8 * k);
}

void nonbranching_reverse(uint64_t *xs, size_t xmax, uint64_t *ys, size_t ymax, 
                          uint64_t *zs, size_t zmax) {
  uint64_t *xse = xs + xmax;
  uint64_t *yse = ys + ymax;
  uint64_t *zse = zs + zmax;

  ssize_t i = -((ssize_t) xmax);
  ssize_t j = -((ssize_t) ymax);
  ssize_t k = -((ssize_t) zmax);

  uint64_t xi = xse[i], yj = yse[j];
  while (i &amp;amp; j &amp;amp; k) {
    uint64_t neg = (xi - yj) &amp;gt;&amp;gt; 63;
    yj = neg * xi + (1 - neg) * yj;
    zse[k] = yj;
    i += neg;
    xi = xse[i];
    neg ^= 1;
    j += neg;
    yj = yse[j];
    k += 1;
  }
  if (i == 0)
    memcpy(zse + k, yse + j, -8 * k);
  if (j == 0)
    memcpy(zse + k, xse + i, -8 * k);
}
&lt;/code&gt;&lt;/pre&gt;
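&lt;p&gt;To see why the single &lt;code&gt;i &amp;amp; j &amp;amp; k&lt;/code&gt; check in the loop
condition is sound: every negative number has its sign bit set, so the
&lt;code&gt;and&lt;/code&gt; of three negative indices always has a set sign bit and is
nonzero, while the moment any index reaches zero the whole &lt;code&gt;and&lt;/code&gt;
becomes zero. A tiny standalone check (not from the original post):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;sys/types.h&amp;gt; // for ssize_t

int main(void) {
  ssize_t i = -3, j = -1, k = -5;
  // All indices negative: every sign bit is set, so the AND is nonzero.
  printf(&amp;quot;%d\n&amp;quot;, (i &amp;amp; j &amp;amp; k) != 0); // prints 1: keep looping
  i = 0;
  // One index reached zero: the AND is zero, and the loop terminates.
  printf(&amp;quot;%d\n&amp;quot;, (i &amp;amp; j &amp;amp; k) != 0); // prints 0: done
}
&lt;/code&gt;&lt;/pre&gt;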
&lt;p&gt;Technically, I suppose we do assume that the lengths of the
arrays are not &lt;code&gt;&amp;gt;2**63&lt;/code&gt;, so that they fit in an &lt;code&gt;ssize_t&lt;/code&gt;, but considering
that the address space of &lt;code&gt;x86-64&lt;/code&gt; is &lt;em&gt;not&lt;/em&gt; 64 bits, but &lt;em&gt;merely&lt;/em&gt; 48 bits&lt;sup&gt;&lt;a href=&quot;#user-content-fn-addrspace&quot; id=&quot;user-content-fnref-addrspace&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;,
this is not a problem, even in theory.&lt;/p&gt;
&lt;h2&gt;Writing the ASM ourselves&lt;/h2&gt;
&lt;p&gt;Lastly, we can try to write the assembly ourselves. When translating the
branch-free routine by Knuth into &lt;code&gt;x86&lt;/code&gt; there are a number of things to do.
First we need to figure out how to get &lt;code&gt;-1/0/+1&lt;/code&gt; by comparing two variables, as
&lt;code&gt;MMIX&lt;/code&gt;&apos;s &lt;code&gt;CMP&lt;/code&gt; instruction does. However, instead of trying to translate this
line by line, which would end up with us having more instructions than needed,
we should rather look more closely at what we&apos;re doing, so that we really
understand the minimal amount of work that we have to do.&lt;/p&gt;
&lt;p&gt;We only need to do two things: compare $x_i$ and $y_j$ and load the smaller
into a register, and increment either &lt;code&gt;i&lt;/code&gt; or &lt;code&gt;j&lt;/code&gt;. The former can be done using
&lt;code&gt;cmovl&lt;/code&gt;, and the latter can be done in a similar fashion to how Knuth does it,
which is basically what we&apos;ve been doing up to this point in C.
This is the version I ended up with (here in inline-GCC asm format):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;1: mov   %[minxy], %[yj]                     ;
   cmp   %[xi], %[yj]                        ; minxy = min(xi, yj)
   cmovl %[minxy], %[xi]                     ;
   mov   QWORD PTR [%[zse]+8*%[k]], %[minxy] ; zs[k] = minxy
   mov   %[t], 0                             ; t = 0
   cmovl %[t], %[one]                        ; if xi &amp;lt; yj: t = 1
   add   %[i], %[t]                          ; i += t
   mov   %[xi], QWORD PTR [%[xse]+8*%[i]]    ; xi = xs[i]
   xor   %[t], 1                             ; t ^= 1
   add   %[j], %[t]                          ; j += t
   mov   %[yj], QWORD PTR [%[yse]+8*%[j]]    ; yj = ys[j]
   add   %[k], 1                             ; k += 1
   mov   %[u], %[i]                          ; 
   and   %[u], %[j]                          ;
   test  %[u], %[k]                          ; if ((i &amp;amp; j &amp;amp; k) != 0)
   jnz   1b                                  ;   goto 1
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;There are a few quirks here, like having a couple of &lt;code&gt;mov&lt;/code&gt; instructions in
between the second conditional load and the instruction it conditions on, and
the fact that &lt;code&gt;cmovl&lt;/code&gt; couldn&apos;t take an immediate value, so I had to setup a
register with only the value &lt;code&gt;1&lt;/code&gt; in it. A sneaky detail to keep in mind is that
when we set &lt;code&gt;t = 0&lt;/code&gt; we cannot use the trick of &lt;code&gt;xor&lt;/code&gt;ing &lt;code&gt;t&lt;/code&gt; with itself,
since this will change the flags, causing the subsequent &lt;code&gt;cmovl&lt;/code&gt; to be wrong.&lt;/p&gt;
&lt;p&gt;Now we can take a look at the assembly generated from some of the other
functions by using &lt;code&gt;objdump -d&lt;/code&gt;.
Our own programs are compiled with &lt;code&gt;-O3 -march=native&lt;/code&gt;.
Here is the inner loop in &lt;code&gt;nonbranching_reverse&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;nonbranching_reverse&amp;gt;:
1ef0:	mov    rax,rdi
1ef3:	sub    rax,rsi
1ef6:	shr    rax,0x3f
1efa:	mov    rdx,r8
1efd:	sub    rdx,rax
1f00:	imul   rdx,rsi
1f04:	imul   rdi,rax
1f08:	add    rbp,rax
1f0b:	xor    rax,0x1
1f0f:	add    rdi,rdx
1f12:	mov    QWORD PTR [r13+r12*8+0x0],rdi
1f17:	add    rcx,rax
1f1a:	inc    r12
1f1d:	mov    rax,rbp
1f20:	and    rax,r12
1f23:	mov    rdi,QWORD PTR [rbx+rbp*8]
1f27:	mov    rsi,QWORD PTR [r10+rcx*8]
1f2b:	test   rax,rcx
1f2e:	jne    1ef0 &amp;lt;nonbranching_reverse+0x40&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Sure looks a lot better than &lt;code&gt;branching&lt;/code&gt;!
This seems more or less reasonable, but we can see that the multiplication
trickery that we used to avoid the &lt;code&gt;min&lt;/code&gt; branch takes up some space here;
presumably it also takes some time. Maybe one little branch isn&apos;t too bad
though, and perhaps the compiler is more willing to use conditional
instructions if we use the ternary operator, like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;void nonbranching_reverse_ternary(uint64_t *xs, size_t xmax, uint64_t *ys, size_t ymax, 
                                  uint64_t *zs, size_t zmax) {
  uint64_t *xse = xs + xmax;
  uint64_t *yse = ys + ymax;
  uint64_t *zse = zs + zmax;

  ssize_t i = -((ssize_t) xmax);
  ssize_t j = -((ssize_t) ymax);
  ssize_t k = -((ssize_t) zmax);

  uint64_t xi = xse[i], yj = yse[j];
  while (i &amp;amp; j &amp;amp; k) {
    uint64_t ybig = (xi - yj) &amp;gt;&amp;gt; 63;
    yj = ybig ? xi : yj;
    zse[k] = yj;
    i += ybig;
    xi = xse[i];
    ybig ^= 1;
    j += ybig;
    yj = yse[j];
    k += 1;
  }
  if (i == 0)
    memcpy(zse + k, yse + j, -8 * k);
  if (j == 0)
    memcpy(zse + k, xse + i, -8 * k);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This time, if we look at the assembly, we can see that the compiler is finally getting it: &lt;code&gt;cmove&lt;/code&gt;!&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;2080:	mov    rax,yj                     ;
2083:	sub    rax,xi                     ;
2086:	shr    rax,0x3f                   ; t = (yj - xi) &amp;gt;&amp;gt; 63
208a:	cmove  yj,xi                      ; yj = t == 0 ? xi : yj
208e:	add    j,rax                      ; j += t
2091:	mov    QWORD PTR [zs+k*8],yj      ; z[k] = yj
2096:	xor    rax,0x1                    ; t ^= 1
209a:	inc    k                          ; k++
209d:	add    i,rax                      ; i += t
20a0:	mov    rax,k                      ; 
20a3:	and    rax,j                      ; t = k &amp;amp; j
20a6:	mov    yj,QWORD PTR [ys+j*8]      ; yj = ys[j]
20aa:	mov    xi,QWORD PTR [xs+i*8]      ; xi = xs[i]
20ae:	test   rax,i                      ; if ((i &amp;amp; j &amp;amp; k) != 0)
20b1:	jne    2080                       ; goto .2080
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;So we see it&apos;s really the same! Curiously, the compiler turned our code around
to have &lt;code&gt;t&lt;/code&gt; be &lt;code&gt;1&lt;/code&gt; if &lt;code&gt;xi&lt;/code&gt; was the bigger one, whereas our &lt;code&gt;ybig&lt;/code&gt; was &lt;code&gt;1&lt;/code&gt; if &lt;code&gt;yj&lt;/code&gt;
was the bigger one.&lt;/p&gt;
&lt;h2&gt;Results&lt;/h2&gt;
&lt;p&gt;And now for the results! We fill two arrays with random elements and run
&lt;code&gt;branching&lt;/code&gt; on it, such that we get the merged array back. This is used as the
ground truth against which all other variations are checked, in case we have
messed up. Then we use &lt;code&gt;clock_gettime&lt;/code&gt; to measure the wall clock time that we
spend, per method. The following is running time in milliseconds where both
lists are &lt;code&gt;2**25&lt;/code&gt; elements long, averaged over 100 runs; 10 iterations per seed
and 10 different seeds (&lt;code&gt;srand(i)&lt;/code&gt; for each iteration).&lt;/p&gt;
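&lt;p&gt;The timing harness itself is not shown here; a minimal sketch of how such a
wall-clock measurement can be taken with &lt;code&gt;clock_gettime&lt;/code&gt; (the
&lt;code&gt;now_ms&lt;/code&gt; helper is made up for illustration):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;time.h&amp;gt;

// Current time in milliseconds from a monotonic clock, which is not
// affected by system clock adjustments.
static double now_ms(void) {
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &amp;amp;ts);
  return ts.tv_sec * 1000.0 + ts.tv_nsec / 1e6;
}

int main(void) {
  double t0 = now_ms();
  // ... run one merge variant here ...
  double t1 = now_ms();
  printf(&amp;quot;elapsed: %.3f ms\n&amp;quot;, t1 - t0);
}
&lt;/code&gt;&lt;/pre&gt;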
&lt;p&gt;These are the numbers I got on a Intel i7-7500U@2.7GHz (&lt;code&gt;avg +/- var&lt;/code&gt;):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;branching:                          30.998 +/- 0.001
nonbranching_but_branching:         27.330 +/- 0.002
nonbranching:                       24.770 +/- 0.000
nonbranching_but_branching_reverse: 19.387 +/- 0.000
nonbranching_reverse:               20.015 +/- 0.000
nonbranching_reverse_ternary:       19.038 +/- 0.000
asm_nb_rev:                         18.987 +/- 0.001
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I also ran the suite on another machine with a
Intel i5-8250U@1.60GHz, in order to see if there would be any significant difference:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;branching:                          31.405 +/- 0.034
nonbranching_but_branching:         27.646 +/- 0.097
nonbranching:                       27.894 +/- 0.021
nonbranching_but_branching_reverse: 22.760 +/- 0.040
nonbranching_reverse:               21.284 +/- 0.050
nonbranching_reverse_ternary:       19.299 +/- 0.002
asm_nb_rev:                         19.793 +/- 0.009
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Interestingly, on this CPU our assembly is slightly slower than the ternary
version; I guess this is due to us using a &lt;code&gt;cmovl&lt;/code&gt; where the compiler generated
version used the shifting trick.&lt;/p&gt;
&lt;h2&gt;Bonus: Sorting&lt;/h2&gt;
&lt;p&gt;We can&apos;t possibly have done all this merging without making a proper
&lt;code&gt;mergesort&lt;/code&gt; in the end! Luckily for us, the &lt;code&gt;merge&lt;/code&gt; part is really the
only difficult part of the routine:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;void merge_sort(uint64_t *xs, size_t n, uint64_t *buf) {
  if (n &amp;lt; 2) return;
  size_t h = n / 2;
  merge_sort(xs, h, buf);
  merge_sort(xs + h, n - h, buf + h);
  merge(xs, h, xs + h, n - h, buf, n);
  memcpy(xs, buf, 8 * n);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Unfortunately we have to merge to a buffer and then &lt;code&gt;memcpy&lt;/code&gt; it back. Perhaps
this is fixable: we can make the sorting routine either put the result in &lt;code&gt;xs&lt;/code&gt;
or in &lt;code&gt;buf&lt;/code&gt;, and by having the recursive calls report which one they used, we can merge into the
other, assuming both recursive calls agree(!!&lt;sup&gt;&lt;a href=&quot;#user-content-fn-balanced&quot; id=&quot;user-content-fnref-balanced&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;5&lt;/a&gt;&lt;/sup&gt;). That is, if the
recursive calls say that the sorted subarrays are in &lt;code&gt;xs&lt;/code&gt;, we merge into &lt;code&gt;buf&lt;/code&gt;
and tell our caller that &lt;em&gt;our&lt;/em&gt; result is in &lt;code&gt;buf&lt;/code&gt;. At the end, we just need to
make sure that the final sorted numbers are in &lt;code&gt;xs&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;void _sort_asm(uint64_t *xs, size_t n, uint64_t *buf, int *into_buf) {
  if (n &amp;lt; 2) {
    *into_buf = 0;
    return;
  }
  size_t h = n / 2;
  int res_in_buf;
  _sort_asm(xs, h, buf, &amp;amp;res_in_buf); // WARNING: `res_in_buf` for the two calls need
  _sort_asm(xs + h, n - h, buf + h, &amp;amp;res_in_buf); // not be the same in the real world!
  *into_buf = res_in_buf ^ 1;
  if (res_in_buf)
    asm_nb_rev(buf, h, buf + h, n - h, xs, n);
  else
    asm_nb_rev(xs, h, xs + h, n - h, buf, n);
}

void sort_asm(uint64_t *xs, size_t n, uint64_t *buf) {
  int res_in_buf;
  _sort_asm(xs, n, buf, &amp;amp;res_in_buf);
  if (res_in_buf) {
    memcpy(xs, buf, 8 * n);
  }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;and similarly for the other variants.
You might see the branch and wonder if we can remove it --- I tried, by making
an array &lt;code&gt;{xs, buf}&lt;/code&gt; and indexing it with &lt;code&gt;res_in_buf&lt;/code&gt;, but it caused a minor
slowdown: maybe some branching is fine after all.&lt;/p&gt;
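To make the alternating-buffer scheme concrete, here is a safe-Rust sketch of the same idea. This is illustrative only, not the benchmarked C/asm code from above, and it simply asserts the power-of-two precondition from footnote 5 instead of handling uneven splits:

```rust
// Ping-pong merge sort sketch: each recursive call reports whether its
// sorted output ended up in `buf` (true) or stayed in `xs` (false), and the
// parent merges from wherever the children left their halves.
// NOTE: illustrative Rust translation, not the post's benchmarked code.

fn merge(a: &[u64], b: &[u64], out: &mut [u64]) {
    // Classic two-finger merge; `out` must have length a.len() + b.len().
    let (mut i, mut j) = (0, 0);
    for slot in out.iter_mut() {
        if j >= b.len() || (i < a.len() && a[i] <= b[j]) {
            *slot = a[i];
            i += 1;
        } else {
            *slot = b[j];
            j += 1;
        }
    }
}

/// Returns true if the sorted result is in `buf`, false if it is in `xs`.
fn sort_rec(xs: &mut [u64], buf: &mut [u64]) -> bool {
    let n = xs.len();
    if n < 2 {
        return false;
    }
    let h = n / 2;
    let in_buf = {
        let (xl, xr) = xs.split_at_mut(h);
        let (bl, br) = buf.split_at_mut(h);
        let l = sort_rec(xl, bl);
        let r = sort_rec(xr, br);
        // The two siblings only agree when n is a power of two (footnote 5).
        assert_eq!(l, r, "sketch requires a power-of-two length");
        l
    };
    if in_buf {
        // Children's results are in `buf`; merge back into `xs`.
        let (bl, br) = buf.split_at(h);
        merge(bl, br, xs);
        false
    } else {
        // Children's results are in `xs`; merge into `buf`.
        let (xl, xr) = xs.split_at(h);
        merge(xl, xr, buf);
        true
    }
}

fn sort(xs: &mut [u64]) {
    let n = xs.len();
    assert!(n < 2 || n.is_power_of_two());
    let mut buf = vec![0u64; n];
    if sort_rec(xs, &mut buf) {
        xs.copy_from_slice(&buf);
    }
}

fn main() {
    let mut v = vec![3u64, 1, 4, 1, 5, 9, 2, 6];
    sort(&mut v);
    assert_eq!(v, vec![1, 1, 2, 3, 4, 5, 6, 9]);
    println!("{:?}", v);
}
```

Note how the final `memcpy` from the C version becomes a single conditional `copy_from_slice` at the top level, exactly as in `sort_asm`.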
&lt;p&gt;Here are the running times:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;                                         i7-7500U              i5-8250U
sort_branching:                          369.479 +/- 0.047     393.762 +/- 0.082
sort_nonbranching_but_branching:         324.337 +/- 0.014     337.120 +/- 0.099
sort_nonbranching:                       325.658 +/- 0.028     352.802 +/- 0.120
sort_nonbranching_but_branching_reverse: 279.237 +/- 0.164     287.799 +/- 0.154
sort_nonbranching_reverse:               283.927 +/- 0.033     299.277 +/- 0.929
sort_nonbranching_reverse_ternary:       270.668 +/- 0.009     278.644 +/- 1.677
sort_asm_nb_rev:                         270.228 +/- 0.009     281.657 +/- 0.360
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If you would like to run the suite yourself, the git repo is &lt;a href=&quot;https://git.sr.ht/~mht/merge-asm&quot;&gt;available here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Thanks for reading.&lt;/p&gt;
&lt;section data-footnotes=&quot;&quot; class=&quot;footnotes&quot;&gt;&lt;h2 id=&quot;footnote-label&quot; class=&quot;sr-only&quot;&gt;Footnotes&lt;/h2&gt;
&lt;ol&gt;
&lt;li id=&quot;user-content-fn-kinc&quot;&gt;
&lt;p&gt;Originally I had omitted the &lt;code&gt;_done&lt;/code&gt; parts, and the code was much cleaner, and I&apos;m not sure why having it in complicates this that much. Also, why is &lt;code&gt;k&lt;/code&gt; incremented before storing &lt;code&gt;zs[k]&lt;/code&gt; so that we have to store &lt;code&gt;zs[k-1]&lt;/code&gt; instead? &lt;a href=&quot;#user-content-fnref-kinc&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-signed&quot;&gt;
&lt;p&gt;Curiously, if we change from &lt;code&gt;uint64_t&lt;/code&gt; to &lt;code&gt;int64_t&lt;/code&gt; and use &lt;code&gt;((a-b)&amp;gt;&amp;gt;63)&amp;amp;1&lt;/code&gt; for the test we do not depend on the magnitudes of the numbers (as the compiler can assume signed overflow will not happen); also the &lt;code&gt;and&lt;/code&gt; never makes it to the assembly, and we still use logical instead of arithmetic shift. &lt;a href=&quot;#user-content-fnref-signed&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-arithshift&quot;&gt;
&lt;p&gt;The alternative is &lt;em&gt;arithmetic shift&lt;/em&gt; in which the sign bit is propagated down. In this case we would end up with either all zeroes or all ones. &lt;a href=&quot;#user-content-fnref-arithshift&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-addrspace&quot;&gt;
&lt;p&gt;https://en.wikipedia.org/wiki/X86-64#Virtual_address_space_details &lt;a href=&quot;#user-content-fnref-addrspace&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-balanced&quot;&gt;
&lt;p&gt;This is really only the case if &lt;code&gt;n&lt;/code&gt; is a power of two: otherwise you&apos;ll have two siblings in the call tree with different &lt;code&gt;n&lt;/code&gt;s, and this difference will cause two leaf nodes to be at different depths, which in turn will make them &amp;quot;out of sync&amp;quot;. &lt;a href=&quot;#user-content-fnref-balanced&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content></entry><entry><title>Mathematica&apos;s Scoping is Weird</title><id>https://mht.wtf/post/mathematica-block/</id><updated>2021-12-30T23:30:11+01:00</updated><author><name>Martin Hafskjold Thoresen</name><email>m@mht.wtf</email></author><link href="https://mht.wtf/post/mathematica-block/" rel=""/><link href="https://mht.wtf/post/mathematica-block/index.html" rel="alternate"/><published>2021-12-30T23:30:11+01:00</published><content type="text/html">&lt;p&gt;I&apos;ve been using Mathematica a little bit in the past few weeks to do some simple plotting and symbolic manipulation of equations.
It&apos;s &lt;em&gt;okay&lt;/em&gt;; I keep running into weird behavior and getting funny errors that I assume
more seasoned Mathematica users would not get. Here&apos;s one of them.&lt;/p&gt;
&lt;h3&gt;&lt;code&gt;With&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;Mathematica has weird scoping rules. For instance, there&apos;s a thing called &lt;code&gt;With&lt;/code&gt; that lets you assign values to
variables in some expression and then have these values be replaced in that expression.
It feels similar to a regular block in C-like languages.
It looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-Mathematica&quot;&gt;In[1]:= With[{x=1}, x+1]
Out[1]= 2
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;No surprise so far, since &lt;code&gt;1 + 1 == 2&lt;/code&gt;. However, what happens if you make a new variable in the expression in a &lt;code&gt;With&lt;/code&gt;?&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-Mathematica&quot;&gt;In[2]:= With[{}, inner=1]
Out[2]= 1
In[3]:= inner
Out[3]= 1
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Okay, so &lt;code&gt;inner&lt;/code&gt; has now leaked out to the global scope. Annoying, since it might be difficult to avoid
having symbols leak out of your scope, but maybe it&apos;s not so bad.&lt;/p&gt;
&lt;h3&gt;&lt;code&gt;Block&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;Another form similar to &lt;code&gt;With&lt;/code&gt; is &lt;code&gt;Block&lt;/code&gt;, which is used for dynamically scoped variables.
Assume we have a bunch of values that we don&apos;t want to keep passing around to all the functions that we use.
For instance, we can have a function that just adds its argument to some &amp;quot;global&amp;quot; symbol:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-Mathematica&quot;&gt;In[1]:= addX[a_]:=a + x
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here &lt;code&gt;x&lt;/code&gt; is a free variable in the function &lt;code&gt;addX&lt;/code&gt;. We can evaluate the function and assign a value to &lt;code&gt;a&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-Mathematica&quot;&gt;In[2]:= addX[12]
Out[2]= 12+x
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We can also define &lt;code&gt;x&lt;/code&gt; to be some value.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-Mathematica&quot;&gt;In[3]:= x=3;
        addX[12]
Out[4]= 15
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Maybe we would like to evaluate &lt;code&gt;addX&lt;/code&gt; but use a different temporary value for &lt;code&gt;x&lt;/code&gt;. We can use &lt;code&gt;Block&lt;/code&gt; for this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-Mathematica&quot;&gt;In[5]:= Block[{x=10}, addX[10]]
Out[5]= 20
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This does not change the value of &lt;code&gt;x&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-Mathematica&quot;&gt;In[6]:= x
Out[6]= 3
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Using the other construct from the beginning, &lt;code&gt;With&lt;/code&gt;, does not work the same way,
since the &lt;code&gt;addX&lt;/code&gt; function will already look up the global value of &lt;code&gt;x&lt;/code&gt; when it is evaluated.
In a sense, &lt;code&gt;Block&lt;/code&gt; makes references to &lt;code&gt;x&lt;/code&gt; give higher precedence to the &lt;code&gt;Block&lt;/code&gt; value instead of the global value.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-Mathematica&quot;&gt;In[7]:= With[{x=10}, addX[10]]
Out[7]= 13
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This might be surprising, but hey, different constructs for different semantics;
presumably there are times when you&apos;d want &lt;code&gt;With&lt;/code&gt; semantics and other times when you want &lt;code&gt;Block&lt;/code&gt; semantics.&lt;/p&gt;
&lt;p&gt;Another problem arises when we have an old variable still in the notebook, maybe introduced from a scope you thought was local.
Consider the following:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-Mathematica&quot;&gt;In[8]:= Block[{y=10, x=y+10}, addX[10]]
Out[8]= 30
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;So far all is well; &lt;code&gt;y&lt;/code&gt; is 10, &lt;code&gt;x&lt;/code&gt; is 20, and we add 10 to &lt;code&gt;x&lt;/code&gt; which gives us 30.
What happens now if we add a global variable named &lt;code&gt;y&lt;/code&gt;?&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-Mathematica&quot;&gt;In[9]:= y=0;
        Block[{y=10, x=y+10}, addX[10]]
Out[10]= 20
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We get a different answer!
It turns out that when &lt;code&gt;Block&lt;/code&gt; evaluates its arguments, it does so without binding the values it creates as dynamic,
so the evaluation of &lt;code&gt;x=y+10&lt;/code&gt; does not use the newly made variable &lt;code&gt;y&lt;/code&gt; but rather the global value &lt;code&gt;0&lt;/code&gt;.
Unless, of course, &lt;code&gt;y&lt;/code&gt; has no value yet, in which case, I guess, it is bound at a later stage to the value introduced by &lt;code&gt;Block&lt;/code&gt;.
Makes sense? No?&lt;/p&gt;
&lt;p&gt;Presumably this is documented somewhere in the language specification, if you just know where to look and exactly how the language works.
But man, this is not intuitive.&lt;/p&gt;
&lt;p&gt;Pointers, complaints, suggestions, and your bitcoin wallet can be sent to &lt;a href=&quot;https://lists.sr.ht/~mht/public-inbox&quot;&gt;my public inbox&lt;/a&gt; (plain text emails only).&lt;/p&gt;
&lt;p&gt;Thanks for reading.&lt;/p&gt;
</content></entry><entry><title>AIcohol</title><id>https://mht.wtf/post/aicohol/</id><updated>2025-07-06T22:24:00+02:00</updated><author><name>Martin Hafskjold Thoresen</name><email>m@mht.wtf</email></author><link href="https://mht.wtf/post/aicohol/" rel=""/><link href="https://mht.wtf/post/aicohol/index.html" rel="alternate"/><published>2025-07-06T22:24:00+02:00</published><content type="text/html">&lt;p&gt;The ethics&lt;sup&gt;&lt;a href=&quot;#user-content-fn-e&quot; id=&quot;user-content-fnref-e&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; of LLMs is a contentious topic.
I think this is mainly because of the current reach of the technology (it&apos;s everywhere),
the hype (people claim it can do anything),
and that it&apos;s successful (it&apos;s hard to ignore).
Unfortunately, a lot of things are mixed into discussions around the ethics of LLMs,
and so I&apos;ve struggled with figuring out what I think.&lt;/p&gt;
&lt;p&gt;Here&apos;s my personal take on the ethics of LLMs, as of today.&lt;/p&gt;
&lt;h2&gt;LLMs are like alcohol&lt;/h2&gt;
&lt;p&gt;Alcohol is poison.
I don&apos;t need to list out &lt;em&gt;all&lt;/em&gt; of the bad things in which alcohol is involved,
but here are a couple of them:
you can mess up your own life through addiction;
you can destroy your family through alcohol-induced abuse or violence;
you can kill random people by drunk driving;
you can put increased load on your society due to reduced general health.&lt;/p&gt;
&lt;p&gt;Still, alcohol is a part of my life; I enjoy beer, wine, liquor, and cocktails, and
I do feel conflicted about that.
If I buy a glass of wine at a restaurant, am I contributing to people being killed in traffic by drunk drivers?
I think I am, if only by an epsilon amount.
I don&apos;t, however, feel &lt;em&gt;responsible&lt;/em&gt; for drunk drivers, abusive drunk parents, or addicts who drink themselves to death.&lt;/p&gt;
&lt;p&gt;Here are some reasons why LLM ethics is difficult:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Power usage.  LLMs require a lot of power, and we need less fossil energy.&lt;/li&gt;
&lt;li&gt;Copyright.  The situation with LLMs and copyright is not clear, and until it is resolved, Big Tech is stomping on Starving Artist.&lt;/li&gt;
&lt;li&gt;Slop.  Pretending LLM slop is human-made can be a breach of the social contract.&lt;/li&gt;
&lt;li&gt;Deceit.  LLMs can enable deceit at a large scale, for instance with deepfakes or other impersonation, and do it automatically.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;There&apos;s probably a lot more.
I think all of these are valid concerns, and I do feel conflicted by my
limited usage of LLMs. At the same time, I don&apos;t feel responsible for
LLM-induced power price spikes or coal emissions, copyright infringement, bad
summaries or software bugs, or deceit, because &lt;em&gt;I&lt;/em&gt; used an LLM to do something
else entirely. These are shitty things to be happening and we need to work to
reduce or stop them, but I don&apos;t think auto-completing code a couple times a
day using Claude makes me responsible for the slopification.&lt;/p&gt;
&lt;p&gt;LLMs deceive, alcohol destroys.
I&apos;m still excited by going to a new wine bar,
and I&apos;ll continue to cautiously use LLMs to write code in the hopes that someday it&apos;ll actually save me time.&lt;/p&gt;
&lt;p&gt;Thanks for reading.&lt;/p&gt;
&lt;section data-footnotes=&quot;&quot; class=&quot;footnotes&quot;&gt;&lt;h2 id=&quot;footnote-label&quot; class=&quot;sr-only&quot;&gt;Footnotes&lt;/h2&gt;
&lt;ol&gt;
&lt;li id=&quot;user-content-fn-e&quot;&gt;
&lt;p&gt;... or lack thereof, am I right??! &lt;a href=&quot;#user-content-fnref-e&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content></entry><entry><title>Simplicity as a Value</title><id>https://mht.wtf/post/simplicity/</id><updated>2023-11-11T12:00:17+01:00</updated><author><name>Martin Hafskjold Thoresen</name><email>m@mht.wtf</email></author><link href="https://mht.wtf/post/simplicity/" rel=""/><link href="https://mht.wtf/post/simplicity/index.html" rel="alternate"/><published>2023-11-11T12:00:17+01:00</published><content type="text/html">&lt;p&gt;I used to think that simplicity is good because of the other values it often brings:
a simple system is easier to write;
a simple system is easier to build;
a simple system is more portable;
a simple system is easier to debug and reason about;
a simple system performs well;
a simple system is easier to change.
The word &amp;quot;simple&amp;quot; does some heavy lifting here, and I found that I would often use these as a metric for &lt;em&gt;whether&lt;/em&gt; something was simple or not.
In other words, these weren&apos;t the result of simplicity; they were the definition.&lt;/p&gt;
&lt;p&gt;I have since found that I don&apos;t actually need any of these values to be true for me to value simplicity.
It&apos;s not that I don&apos;t care about how easy the system is to debug, how easy it is to extend, or how fast it runs, but all of these are universally good qualities.
Nobody would prefer a system that&apos;s hard to debug over one that is easy to debug, all other things equal.
I still think that simplicity very often brings these advantages&lt;sup&gt;&lt;a href=&quot;#user-content-fn-simple&quot; id=&quot;user-content-fnref-simple&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;, but among the advantages is simplicity itself.&lt;/p&gt;
&lt;p&gt;My definition of simple has also changed a lot over the years.
When I was starting out, Java&apos;s &lt;code&gt;ArrayList&lt;/code&gt; felt simple, but handling arrays&lt;sup&gt;&lt;a href=&quot;#user-content-fn-array&quot; id=&quot;user-content-fnref-array&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; felt awkward and complicated; &amp;quot;buffer&amp;quot; was a scary word.
When I learned Python, its syntax was simple, but requiring the Python interpreter was not&lt;sup&gt;&lt;a href=&quot;#user-content-fn-py&quot; id=&quot;user-content-fnref-py&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;; I couldn&apos;t just copy over my program to another computer and run it there.
When I read &lt;a href=&quot;https://en.wikipedia.org/wiki/The_C_Programming_Language&quot;&gt;K&amp;amp;R&lt;/a&gt;, C felt very simple, but when I tried to even build some C projects I found in the wild, my experience was very different&lt;sup&gt;&lt;a href=&quot;#user-content-fn-c&quot; id=&quot;user-content-fnref-c&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;.
These days I&apos;m mostly excited about low(er)-level languages, like &lt;a href=&quot;https://www.rust-lang.org/&quot;&gt;Rust&lt;/a&gt;, &lt;a href=&quot;https://ziglang.org/&quot;&gt;Zig&lt;/a&gt;, and &lt;a href=&quot;https://harelang.org/&quot;&gt;Hare&lt;/a&gt;;
programming in these languages feels simple because of how they map to my mental model of my computer.
The code that I write that I&apos;m happiest about is the &lt;a href=&quot;https://mht.wtf/post/flow/&quot;&gt;simple code&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I guess I simply value simplicity.&lt;/p&gt;
&lt;p&gt;Thanks for reading.&lt;/p&gt;
&lt;section data-footnotes=&quot;&quot; class=&quot;footnotes&quot;&gt;&lt;h2 id=&quot;footnote-label&quot; class=&quot;sr-only&quot;&gt;Footnotes&lt;/h2&gt;
&lt;ol&gt;
&lt;li id=&quot;user-content-fn-simple&quot;&gt;
&lt;p&gt;Further, I think it often gives an 80% solution for &lt;strong&gt;all&lt;/strong&gt; of these values. If you want to really maximize, say, speed, readability, debuggability, portability, and all the other -abilites &lt;strong&gt;will&lt;/strong&gt; suffer. &lt;a href=&quot;#user-content-fnref-simple&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-array&quot;&gt;
&lt;p&gt;A sidenote: when learning about arrays, I could not for the life of me see any use-case for it. &lt;code&gt;ArrayList&lt;/code&gt; made perfect sense, since it was a list you could put stuff in, but I did not see the value of having a fixed-size list of things. Tangentially, I also wanted to string interpolate myself a variable; with &lt;code&gt;int a1, a2, a3; int i = 3;&lt;/code&gt; I wanted to be able to write &lt;code&gt;a3 = 0;&lt;/code&gt; as &lt;code&gt;a{i} = 0;&lt;/code&gt;. I don&apos;t remember how long it took to see the connection. &lt;a href=&quot;#user-content-fnref-array&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-py&quot;&gt;
&lt;p&gt;In hindsight, it is curious that my biggest problem with Python was that I needed the interpreter as opposed to just being able to copy a &lt;code&gt;.exe&lt;/code&gt;, when this is also true for Java, my first language. For some reason, the lack of ahead-of-time compilation made a very big difference for me. &lt;a href=&quot;#user-content-fnref-py&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-c&quot;&gt;
&lt;p&gt;Here&apos;s some keywords: &lt;a href=&quot;https://www.gnu.org/software/autoconf/&quot;&gt;autoconf&lt;/a&gt;, &lt;a href=&quot;https://www.gnu.org/software/libc/&quot;&gt;glibc&lt;/a&gt;, &lt;a href=&quot;https://cmake.org/&quot;&gt;cmake&lt;/a&gt;, dependency management, dynamic libraries, macros. &lt;a href=&quot;#user-content-fnref-c&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content></entry><entry><title>Another Static Site</title><id>https://mht.wtf/post/static-site/</id><updated>2024-04-02T22:51:56+01:00</updated><author><name>Martin Hafskjold Thoresen</name><email>m@mht.wtf</email></author><link href="https://mht.wtf/post/static-site/" rel=""/><link href="https://mht.wtf/post/static-site/index.html" rel="alternate"/><published>2024-04-02T22:51:56+01:00</published><content type="text/html">&lt;p&gt;In January 2016 I moved this website from &lt;a href=&quot;https://jekyllrb.com/&quot;&gt;Jekyll&lt;/a&gt; to &lt;a href=&quot;https://gohugo.io/&quot;&gt;Hugo&lt;/a&gt;.
The motivation was to make deployments easier, since Hugo was a single static binary and Jekyll was not.
Over the years, I did a minimal amount of work on the website itself, and as Hugo kept changing, warnings kept piling up every time I, for whatever reason, updated it.
In addition, the few times I did want to change something on the site, I inevitably got lost in the Hugo docs.
It became increasingly clear that Hugo was not made for my use-case, and so I wanted to migrate off of it.&lt;/p&gt;
&lt;p&gt;This easter I decided to bite the bullet and try something else.
I spent an afternoon trying to set up &lt;a href=&quot;https://cobalt-org.github.io/&quot;&gt;Cobalt&lt;/a&gt;,
followed by &lt;a href=&quot;https://www.getzola.org/&quot;&gt;Zola&lt;/a&gt;, but they both felt too complex for me.
So instead, I decided to write my own, and after a few days I have it all set up:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Markdown parsing with &lt;a href=&quot;https://github.com/wooorm/markdown-rs&quot;&gt;&lt;code&gt;markdown-rs&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Templating with &lt;a href=&quot;https://keats.github.io/tera/docs/&quot;&gt;Tera&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/Atom_(web_standard)&quot;&gt;Atom&lt;/a&gt; feed generation with &lt;a href=&quot;https://github.com/rust-syndication/atom&quot;&gt;&lt;code&gt;atom&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Syntax highlighting with &lt;a href=&quot;https://prismjs.com/&quot;&gt;Prism&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Pretty math with &lt;a href=&quot;https://katex.org/&quot;&gt;KaTeX&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Rewrote most of the CSS, and added dark mode&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I tried to keep things simple, and I&apos;m pretty happy with the current state.
The code is in one file and is around 450 lines of code.
It reads a directory structure like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-sh&quot;&gt;mht.wtf/
├── pages # Markdown files that are templated.  Directory structure is kept.
│  ├── index.md # This turns into https://mht.wtf/index.html
│  ├── painting
│  │  └── index.md # ... and this to https://mht.wtf/painting/index.html
│  └── post
│     ├── index.md # This is the page with the list of blog posts
│     ├── flow
│     │  └── index.md # https://mht.wtf/post/flow/
│     └── static-site
│        └── index.md # ... and so on
├── publish.sh # Convenience script to build and `rsync` to the server.
├── README.md
├── static # These files are copied to the output folder.
│  ├── iosevka.css
│  ├── post
│  │  └── flow
│  │     ├── bipartite.svg
│  │     ├── flow-graph.svg
│  │     ├── route-connect.svg
│  │     └── route.svg
│  └── style.css
└── templates # Tera templates referenced by the files in pages/
   ├── blog-post.html
   ├── blog.html
   ├── cc-by-sa.html
   └── index.html
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;pages&lt;/code&gt; directory contains all files that will be transformed to &lt;code&gt;html&lt;/code&gt; files in exactly the same directory structure.
For every markdown file, the template to use is specified in the front matter.
The &lt;code&gt;static&lt;/code&gt; directory contains files that should be copied as-is, like &lt;code&gt;css&lt;/code&gt;, fonts, or &lt;code&gt;svg&lt;/code&gt;s and other assets for specific pages.
For instance, the blog post &lt;code&gt;flow&lt;/code&gt; is located at &lt;code&gt;pages/post/flow/index.md&lt;/code&gt; and its pictures are e.g. at &lt;code&gt;static/post/flow/bipartite.svg&lt;/code&gt;.&lt;/p&gt;
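The path mapping described above is straightforward with the standard library; here is a hypothetical sketch (the function name is mine, not the generator's actual code):

```rust
use std::path::{Path, PathBuf};

// Hypothetical helper illustrating the pages/ -> output mapping described
// above: each Markdown file keeps its place in the directory tree, with the
// `.md` extension swapped for `.html`. Not the generator's real code.
fn output_path(pages_root: &Path, md_file: &Path, out_root: &Path) -> PathBuf {
    let rel = md_file
        .strip_prefix(pages_root)
        .expect("markdown file must live under pages/");
    out_root.join(rel).with_extension("html")
}

fn main() {
    let out = output_path(
        Path::new("pages"),
        Path::new("pages/post/flow/index.md"),
        Path::new("out"),
    );
    assert_eq!(out, Path::new("out/post/flow/index.html"));
    println!("{}", out.display());
}
```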
&lt;h3&gt;Javascript&lt;/h3&gt;
&lt;p&gt;There are two sources to Javascript in these blog posts: syntax highlighting and math typesetting.&lt;/p&gt;
&lt;aside class=&quot;span-2&quot;&gt;
    I&apos;d like to do syntax highlighting as a preprocessing step instead of doing it at runtime
    since I don&apos;t need dynamic highlighting. Maybe next time.
&lt;/aside&gt;
&lt;p&gt;In Hugo, I had to manually mark blog posts as &lt;code&gt;mathy&lt;/code&gt; so that I could include MathJax in the &lt;code&gt;&amp;lt;head&amp;gt;&lt;/code&gt; of the template.
Initially I ported over the same system here, but I realized that that&apos;s only busywork when I have written the generator myself.
Now I look for a &lt;code&gt;$&lt;/code&gt; in the Markdown text, and if &lt;code&gt;katex&lt;/code&gt; is not explicitly set in the front matter, I set it to &lt;code&gt;true&lt;/code&gt;.
This way I don&apos;t need to specify anywhere that I am using KaTeX for math; I can just use it. Blog posts that don&apos;t use it don&apos;t include it,
and for false positives, &lt;code&gt;katex = false&lt;/code&gt; will opt out.&lt;/p&gt;
&lt;p&gt;I do the same with Prism; if I have a code block with a language specified, like &lt;code&gt; ```rust&lt;/code&gt; I include Prism, unless &lt;code&gt;prism = false&lt;/code&gt; is in the front matter.&lt;/p&gt;
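A minimal sketch of both auto-detection rules (hypothetical helpers; the real generator's code isn't shown in the post):

```rust
// Hypothetical helpers sketching the auto-detection described above: an
// explicit front-matter value wins; otherwise we scan the Markdown body.
// Not the generator's actual code.

fn needs_katex(markdown: &str, front_matter: Option<bool>) -> bool {
    // Any `$` in the text is treated as "probably math".
    front_matter.unwrap_or_else(|| markdown.contains('$'))
}

fn needs_prism(markdown: &str, front_matter: Option<bool>) -> bool {
    // A fenced code block with a language looks like "```rust" at line start.
    front_matter.unwrap_or_else(|| {
        markdown
            .lines()
            .any(|l| l.trim_start().starts_with("```") && l.trim_start().len() > 3)
    })
}

fn main() {
    assert!(needs_katex("Euler: $e^{i\\pi} = -1$", None));
    assert!(!needs_katex("no math here", None));
    assert!(!needs_katex("price is $5", Some(false))); // explicit opt-out
    assert!(needs_prism("```rust\nfn main() {}\n```", None));
    assert!(!needs_prism("```\nno language given\n```", None));
    println!("ok");
}
```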
&lt;h3&gt;Templating&lt;/h3&gt;
&lt;p&gt;I wanted to be able to write markdown and produce HTML, and so one way or another I needed a way of specifying what that HTML should look like.
Templates seemed like the least complicated but still powerful enough solution for this.
I am not using anything fancy with templates though, it&apos;s pretty much accessing fields from the front matter (e.g. &lt;code&gt;katex&lt;/code&gt; or &lt;code&gt;date&lt;/code&gt;), and formatting the date.&lt;/p&gt;
&lt;p&gt;There was one catch however, namely listing the blog posts.&lt;/p&gt;
&lt;p&gt;My plan was to read in the directory structure and pass that to the template, but this made it difficult to write out the template, because&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The unique identifier (&lt;code&gt;static-site&lt;/code&gt; for this post, &lt;code&gt;flow&lt;/code&gt; for the &lt;a href=&quot;https://mht.wtf/post/flow/&quot;&gt;Flow post&lt;/a&gt;) is in the directory name, and not the front matter.&lt;/li&gt;
&lt;li&gt;I wanted to sort the posts based on a &lt;code&gt;date&lt;/code&gt;, which &lt;em&gt;is&lt;/em&gt; in the front matter.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;index.md&lt;/code&gt; should be skipped when on the first level.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Tera, like most templating languages, isn&apos;t a joy to use, and so simple data transformations like this turned out to be difficult.
However, it has a nice escape hatch in which you can write a Rust function and call &lt;a href=&quot;https://docs.rs/tera/latest/tera/struct.Tera.html#method.register_function&quot;&gt;&lt;code&gt;register_function&lt;/code&gt;&lt;/a&gt; to make it callable in the template.
That way you can do whatever transformation you want in Rust instead.
Convenient, if not pretty.&lt;/p&gt;
&lt;aside class=&quot;span-2&quot;&gt;
    If I did it again, I would seriously consider not having any templating at all, finding a nice way of writing the HTML in Rust instead, and embracing
    the fact that the executable will probably only ever be used by me to make this site.
&lt;/aside&gt;
&lt;h3&gt;Others&lt;/h3&gt;
&lt;p&gt;Instead of writing an HTTP server that serves the files while I write and rebuilds when any of the files change,
I used &lt;code&gt;python -m http.server&lt;/code&gt; and &lt;a href=&quot;https://github.com/watchexec/watchexec&quot;&gt;watchexec&lt;/a&gt;. Maybe there&apos;s a nice
&amp;quot;hot-reload simple serve http server&amp;quot; out there that would do both for me, but this setup was very low friction.
I have to reload the page myself, but since I mainly write Markdown anyway there&apos;s no real reason to have
the page update live.&lt;/p&gt;
&lt;p&gt;Rewriting the page was also a good excuse to have another look at the CSS, and with it, some nice positioning for &lt;code&gt;&amp;lt;aside&amp;gt;&lt;/code&gt; elements, when space allows for it.
These are the gray margin notes you can see above.
They are positioned with CSS grid, using named columns, and with a &lt;code&gt;@media&lt;/code&gt; query for narrow screens to place it back in the regular flow.
&lt;code&gt;&amp;lt;code&amp;gt;&lt;/code&gt; is also highlighted almost like in my editor now, with mostly white on dark, not too many colors, and bright yellow comments.
I&apos;m still not 100% happy with the spacing around certain elements, but it&apos;s okay.&lt;/p&gt;
</content></entry><entry><title>ppl</title><id>https://mht.wtf/post/ppl/</id><updated>2025-03-11T22:36:35+02:00</updated><author><name>Martin Hafskjold Thoresen</name><email>m@mht.wtf</email></author><link href="https://mht.wtf/post/ppl/" rel=""/><link href="https://mht.wtf/post/ppl/index.html" rel="alternate"/><published>2025-03-11T22:36:35+02:00</published><content type="text/html">&lt;p&gt;&lt;code&gt;ppl&lt;/code&gt; is a small webapp for storing contacts.
It is the first result of me wanting to &lt;a href=&quot;/post/produce/&quot;&gt;produce more and consume less&lt;/a&gt;.
It is a Docker image into which you mount a sqlite database,
and it runs a single-user, password-protected web server that you interact with in a browser.
I don&apos;t know if I&apos;ll release it for others to use; I built it for myself,
like a &lt;a href=&quot;https://www.youtube.com/watch?v=qo5m92-9_QI&quot;&gt;home-cooked meal&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This is how it looks with dummy data:&lt;/p&gt;
&lt;figure style=&quot;display: flex; gap: 2rem&quot;&gt;
  &lt;div style=&quot;width: 100%&quot;&gt;
    &lt;img src=&quot;./ppl.png&quot; style=&quot;flex: 1; width: 100%&quot;&gt;
    &lt;figcaption&gt;Front page of &lt;code&gt;ppl&lt;/code&gt;.&lt;/figcaption&gt;
  &lt;/div&gt;
  &lt;div style=&quot;width: 100%&quot;&gt;
    &lt;img src=&quot;./peep.png&quot; style=&quot;flex: 1; width: 100%&quot;&gt;
    &lt;figcaption&gt;Detail view of one &lt;em&gt;peep&lt;/em&gt;.&lt;/figcaption&gt;
  &lt;/div&gt;
&lt;/figure&gt;
&lt;aside&gt;
    &lt;em&gt;peep&lt;/em&gt; is the name for a user in &lt;code&gt;ppl&lt;/code&gt;.
    &quot;User&quot; could be confused with the user of &lt;code&gt;ppl&lt;/code&gt; (me), and 
    &quot;person&quot; sounded too formal.
&lt;/aside&gt;
&lt;p&gt;The current feature-set is small:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;CRUD operations for peeps&lt;/li&gt;
&lt;li&gt;Name search&lt;/li&gt;
&lt;li&gt;Some keyboard navigation&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That&apos;s it!
In the near future I want to include an upcoming birthday calendar, better
keyboard controls, and some import/export features, and maybe, just maybe,
multi-user support so that I can share it with my closest family.&lt;/p&gt;
&lt;p&gt;Other potential features include&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Saved change history&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Getting a tiny LLM (TLM?) to handle import&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;More eye candy -- icons, polish, some animations&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Why&lt;/h2&gt;
&lt;p&gt;I built &lt;code&gt;ppl&lt;/code&gt; because I wanted to make something that I knew I could finish.
This year I&apos;ve already started on a live-collab-quiz-style game (think &lt;a href=&quot;https://www.jackboxgames.com/games/drawful&quot;&gt;Drawful&lt;/a&gt; but different)
and some kind of frequency analysis for guitar playing, with the goal of automatically transcribing my own playing.
Both projects were too big to complete in any sense of the word.&lt;/p&gt;
&lt;p&gt;Another reason was that I&apos;ve never had &amp;quot;contacts software&amp;quot; that I &lt;em&gt;like&lt;/em&gt;.
Building good things is hard, but since the scope of a contacts app is necessarily very small,
I have time, patience, and attention to build it and to do it &lt;em&gt;well&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;It&apos;s a webapp; I&apos;m not crazy about the entire web-stack, and I get my fair
share of it at work.  However, I needed this to be accessible from all of my
devices.  It seems that the web is the only viable choice for this.&lt;/p&gt;
&lt;h2&gt;How&lt;/h2&gt;
&lt;p&gt;This is my current go-to stack for anything web-related, and it works well for my use-case.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;ppl&lt;/code&gt; is a single-executable Rust binary running an &lt;code&gt;axum&lt;/code&gt; web server that serves
&lt;code&gt;HTMX&lt;/code&gt;-driven HTML templated with &lt;code&gt;maud&lt;/code&gt;. It reads and writes to an &lt;code&gt;sqlite&lt;/code&gt; database with
&lt;code&gt;sqlx&lt;/code&gt;.  I wrote the &lt;code&gt;css&lt;/code&gt; from scratch, and use a tiny bit of &lt;code&gt;js&lt;/code&gt;: some for
input handling in the search bar (arrow keys for navigation, for instance),
and some in my bespoke hot-reloading system, which lets me edit static files and
have them update live in the browser (and which is compiled out in &lt;code&gt;--release&lt;/code&gt;
mode). Useful for pushing pixels!&lt;/p&gt;
&lt;p&gt;It&apos;s built in a Docker container with &lt;code&gt;cargo-chef&lt;/code&gt; for dependency caching (of
which this stack has its fair share!). The image is pushed to a Hetzner
server, and the container is manually restarted. The two operations are
combined in the &lt;code&gt;deploy&lt;/code&gt; rule of my &lt;code&gt;justfile&lt;/code&gt;, so for me it&apos;s one command
to build and deploy.&lt;/p&gt;
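&lt;p&gt;The &lt;code&gt;justfile&lt;/code&gt; itself isn&apos;t part of this post, but a &lt;code&gt;deploy&lt;/code&gt; rule of roughly this shape would do it; the image name, registry, and host below are made-up placeholders:&lt;/p&gt;

```
# justfile (sketch; image name, registry, and host are placeholders)
deploy:
    docker build -t registry.example.com/ppl:latest .
    docker push registry.example.com/ppl:latest
    ssh my-hetzner-box 'docker pull registry.example.com/ppl:latest &amp;&amp; docker restart ppl'
```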
&lt;p&gt;Getting from &lt;code&gt;git init&lt;/code&gt; to a minimal version deployed on the internet took around two evenings of hacking.&lt;/p&gt;
&lt;p&gt;It&apos;s been fun hacking on a small tool that I find useful and that&apos;s &lt;em&gt;mine&lt;/em&gt;.
I hope to do more of it!&lt;/p&gt;
&lt;p&gt;Thanks for reading.&lt;/p&gt;
</content></entry><entry><title>Content Aware Image Resize</title><id>https://mht.wtf/post/content-aware-resize/</id><updated>2017-02-13T17:30:00+01:00</updated><author><name>Martin Hafskjold Thoresen</name><email>m@mht.wtf</email></author><link href="https://mht.wtf/post/content-aware-resize/" rel=""/><link href="https://mht.wtf/post/content-aware-resize/index.html" rel="alternate"/><published>2017-02-13T17:30:00+01:00</published><content type="text/html">&lt;p&gt;Content aware image resizing, liquid image resizing, retargeting, or seam carving,
refers to an image resizing technique where one can insert or remove &lt;em&gt;seams&lt;/em&gt;, or &amp;quot;paths of least importance&amp;quot;,
in order to shrink or grow the image.
I was introduced to the concept by &lt;a href=&quot;https://www.youtube.com/watch?v=qadw0BRKeMk&quot;&gt;a YouTube video&lt;/a&gt;
by Shai Avidan and Ariel Shamir.&lt;/p&gt;
&lt;p&gt;In this blog post, I&apos;ll go through a simple proof-of-concept implementation of content aware image resizing,
naturally in Rust :)&lt;/p&gt;
&lt;p&gt;For our sample image, I simply searched&lt;sup&gt;&lt;a href=&quot;#user-content-fn-duckduck&quot; id=&quot;user-content-fnref-duckduck&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; for &lt;code&gt;&amp;quot;sample image&amp;quot;&lt;/code&gt;, and got back this&lt;sup&gt;&lt;a href=&quot;#user-content-fn-image-source&quot; id=&quot;user-content-fnref-image-source&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;sample-image.jpeg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;h1&gt;Sketching out a top down approach&lt;/h1&gt;
&lt;p&gt;Let&apos;s start with some brainstorming.
I imagine the library to be used like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;/// caller.rs
let mut image = car::load_image(path);
// Resize to a known size?
image.resize_to(car::Dimensions::Absolute(800, 580));
// or remove 20 rows?
image.resize_to(car::Dimensions::Relative(0, -20));
// Maybe show the image in a window?
car::show_image(&amp;amp;image);
// or save to disk?
image.save(&amp;quot;resized.jpeg&amp;quot;);
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The most important functions in &lt;code&gt;lib.rs&lt;/code&gt; could look something like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;/// lib.rs
pub fn load_image(path: &amp;amp;Path) -&amp;gt; Image {
    // We&apos;ll forget about error handling for now :)
    Image {
        inner: some_image_lib::load(path).unwrap(),
    }
}

impl Image {
    pub fn resize_to(&amp;amp;mut self, dimens: Dimensions) {
        // How many columns and rows do we need to insert/remove?
        let (mut xs, mut ys) = self.size_diffs(dimens);
        // When we want to add columns and rows, we would like
        // to always pick the path with the lowest score, no
        // matter if it&apos;s a row or a column.
        while xs != 0 &amp;amp;&amp;amp; ys != 0 {
            let best_horizontal = self.best_horizontal_path();
            let best_vertical = self.best_vertical_path();
            // Insert the best
            if best_horizontal.score &amp;lt; best_vertical.score {
                self.handle_path(best_horizontal, &amp;amp;mut xs);
            } else {
                self.handle_path(best_vertical, &amp;amp;mut ys);
            }
        }
        // Insert the rest in either direction.
        while xs != 0 {
            let path = self.best_horizontal_path();
            self.handle_path(path, &amp;amp;mut xs);
        }
        while ys != 0 {
            let path = self.best_vertical_path();
            self.handle_path(path, &amp;amp;mut ys);
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This gives us some idea of how to approach writing the system. We need to load an image, we need to find these seams, or paths, and we need to handle removing such a path from the image.
In addition, we would perhaps like to be able to see our result.&lt;/p&gt;
&lt;p&gt;Let&apos;s do the image loading first, so we know what kind of API we&apos;re working with.&lt;/p&gt;
&lt;h2&gt;&lt;code&gt;image&lt;/code&gt;&lt;/h2&gt;
&lt;p&gt;The &lt;a href=&quot;https://crates.io/crates/image&quot;&gt;&lt;code&gt;image&lt;/code&gt;&lt;/a&gt; library from the Piston developers seems useful,
so we&apos;ll add &lt;code&gt;image = &amp;quot;0.12&amp;quot;&lt;/code&gt; to our &lt;code&gt;Cargo.toml&lt;/code&gt;.
A quick search in the docs is all that it takes for us to write the image loading:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;struct Image {
    inner: image::DynamicImage,
}

impl Image {
    pub fn load_image(path: &amp;amp;Path) -&amp;gt; Image {
        Image {
            inner: image::open(path).unwrap()
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A natural next step is figuring out how to get the gradient magnitudes from a &lt;code&gt;image::DynamicImage&lt;/code&gt;.
The &lt;code&gt;image&lt;/code&gt; crate doesn&apos;t provide a way to do this directly,
but the &lt;a href=&quot;https://crates.io/crates/imageproc&quot;&gt;&lt;code&gt;imageproc&lt;/code&gt;&lt;/a&gt; crate does: &lt;code&gt;imageproc::gradients::sobel_gradients&lt;/code&gt;.
Here however, we run into trouble&lt;sup&gt;&lt;a href=&quot;#user-content-fn-gradient-trouble&quot; id=&quot;user-content-fnref-gradient-trouble&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;.
The &lt;code&gt;sobel_gradients&lt;/code&gt; function takes an 8-bit grayscale image, and returns a 16-bit grayscale image.
The image we have loaded is an RGB image with 8-bits per channel, so we&apos;ll have to decompose the channels,
convert the three channels into separate grayscale images,
compute the gradients of the three component images, and then merge the gradients together into one image,
in which we will do the path searching.&lt;/p&gt;
&lt;p&gt;Is this elegant? No. Does it work? Maybe :)&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;type GradientBuffer = image::ImageBuffer&amp;lt;image::Luma&amp;lt;u16&amp;gt;, Vec&amp;lt;u16&amp;gt;&amp;gt;;

impl Image {
    pub fn load_image(path: &amp;amp;Path) -&amp;gt; Image {
        Image {
            inner: image::open(path).unwrap()
        }
    }

    fn gradient_magnitude(&amp;amp;self) -&amp;gt; GradientBuffer {
        // We&apos;ll assume RGB
        let (red, green, blue) = decompose(&amp;amp;self.inner);
        let r_grad = imageproc::gradients::sobel_gradients(red.as_luma8().unwrap());
        let g_grad = imageproc::gradients::sobel_gradients(green.as_luma8().unwrap());
        let b_grad = imageproc::gradients::sobel_gradients(blue.as_luma8().unwrap());

        let (w, h) = r_grad.dimensions();
        let mut container = Vec::with_capacity((w * h) as usize);
        for (r, g, b) in izip!(r_grad.pixels(), g_grad.pixels(), b_grad.pixels()) {
            container.push(r[0] + g[0] + b[0]);
        }
        image::ImageBuffer::from_raw(w, h, container).unwrap()
    }
}

fn decompose(image: &amp;amp;image::DynamicImage) -&amp;gt; (image::DynamicImage,
                                              image::DynamicImage,
                                              image::DynamicImage) {
    let w = image.width();
    let h = image.height();
    let mut red = image::DynamicImage::new_luma8(w, h);
    let mut green = image::DynamicImage::new_luma8(w, h);
    let mut blue = image::DynamicImage::new_luma8(w, h);
    for (x, y, pixel) in image.pixels() {
        let r = pixel[0];
        let g = pixel[1];
        let b = pixel[2];
        red.put_pixel(x, y, *image::Rgba::from_slice(&amp;amp;[r, r, r, 255]));
        green.put_pixel(x, y, *image::Rgba::from_slice(&amp;amp;[g, g, g, 255]));
        blue.put_pixel(x, y, *image::Rgba::from_slice(&amp;amp;[b, b, b, 255]));
    }
    (red, green, blue)
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;When run, &lt;code&gt;Image::gradient_magnitude&lt;/code&gt; takes our bird image, and returns this:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;sample-image-gradient.jpeg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;h2&gt;The path of least resistance&lt;/h2&gt;
&lt;p&gt;Now we have to implement the arguably hardest part of the program: the DP algorithm to find the path of least resistance.
Let&apos;s take a quick look at how this will work out.
For simplicity&apos;s sake, we&apos;ll only look at the case where we find a vertical path.
Imagine the table below being the gradient image of a 6x6 image.&lt;/p&gt;
&lt;p&gt;$$
G = \begin{bmatrix}
1 &amp;amp; 4 &amp;amp; 3 &amp;amp; 4 &amp;amp; 2 &amp;amp; 1\\\
2 &amp;amp; 2 &amp;amp; 3 &amp;amp; 5 &amp;amp; 3 &amp;amp; 2\\\
1 &amp;amp; 4 &amp;amp; 5 &amp;amp; 5 &amp;amp; 1 &amp;amp; 2\\\
4 &amp;amp; 4 &amp;amp; 3 &amp;amp; 1 &amp;amp; 5 &amp;amp; 3\\\
5 &amp;amp; 3 &amp;amp; 2 &amp;amp; 2 &amp;amp; 3 &amp;amp; 1\\\
3 &amp;amp; 1 &amp;amp; 4 &amp;amp; 4 &amp;amp; 1 &amp;amp; 1
\end{bmatrix}
$$&lt;/p&gt;
&lt;p&gt;The point of the algorithm is to find a path $P=p_1 \dots\ p_6$ from one of the top cells $G_{1i}$ to one of the bottom cells $G_{6j}$, such that we minimize $\sum_{1 \leq i \leq 6} p_i$.
This can be done by creating a new table $S$ using the following recurrence relation (ignoring boundaries):&lt;/p&gt;
&lt;p&gt;$$ S_{ji} =
\begin{cases}
G_{6i} &amp;amp; \text{ if } j = 6\\
G_{ji} + \min(S_{j + 1, i - 1}, S_{j + 1, i}, S_{j + 1, i + 1}) &amp;amp; \text{ otherwise}
\end{cases}
$$&lt;/p&gt;
&lt;p&gt;That is,
each cell in $S$ is the minimum sum from that cell to a cell on the bottom.
Every cell selects the smallest of the three cells below it in the table to be the next cell in the path.
When we have completed $S$, we simply select the smallest number in the top row to be our start.&lt;/p&gt;
&lt;p&gt;Let&apos;s find $S$:&lt;/p&gt;
&lt;p&gt;$$
S^{(1)} = \begin{bmatrix}
- &amp;amp; - &amp;amp; - &amp;amp; - &amp;amp; - &amp;amp; -\\\
- &amp;amp; - &amp;amp; - &amp;amp; - &amp;amp; - &amp;amp; -\\\
- &amp;amp; - &amp;amp; - &amp;amp; - &amp;amp; - &amp;amp; -\\\
- &amp;amp; - &amp;amp; - &amp;amp; - &amp;amp; - &amp;amp; -\\\
- &amp;amp; - &amp;amp; - &amp;amp; - &amp;amp; - &amp;amp; -\\\
3 &amp;amp; 1 &amp;amp; 4 &amp;amp; 4 &amp;amp; 1 &amp;amp; 1
\end{bmatrix}
\hspace{1cm} S^{(2)} = \begin{bmatrix}
- &amp;amp; - &amp;amp; - &amp;amp; - &amp;amp; - &amp;amp; -\\\
- &amp;amp; - &amp;amp; - &amp;amp; - &amp;amp; - &amp;amp; -\\\
- &amp;amp; - &amp;amp; - &amp;amp; - &amp;amp; - &amp;amp; -\\\
- &amp;amp; - &amp;amp; - &amp;amp; - &amp;amp; - &amp;amp; -\\\
6 &amp;amp; 4 &amp;amp; 3 &amp;amp; 3 &amp;amp; 4 &amp;amp; 2\\\
3 &amp;amp; 1 &amp;amp; 4 &amp;amp; 4 &amp;amp; 1 &amp;amp; 1
\end{bmatrix}
$$
$$
S^{(3)} = \begin{bmatrix}
- &amp;amp; - &amp;amp; - &amp;amp; - &amp;amp; - &amp;amp; -\\\
- &amp;amp; - &amp;amp; - &amp;amp; - &amp;amp; - &amp;amp; -\\\
- &amp;amp; - &amp;amp; - &amp;amp; - &amp;amp; - &amp;amp; -\\\
8 &amp;amp; 7 &amp;amp; 6 &amp;amp; 4 &amp;amp; 7 &amp;amp; 5\\\
6 &amp;amp; 4 &amp;amp; 3 &amp;amp; 3 &amp;amp; 4 &amp;amp; 2\\\
3 &amp;amp; 1 &amp;amp; 4 &amp;amp; 4 &amp;amp; 1 &amp;amp; 1
\end{bmatrix}
\hspace{1cm} S^{(4)} = \begin{bmatrix}
- &amp;amp;  - &amp;amp;  - &amp;amp;  - &amp;amp;  - &amp;amp;  -\\\
- &amp;amp;  - &amp;amp;  - &amp;amp;  - &amp;amp;  - &amp;amp;  -\\\
8 &amp;amp; 10 &amp;amp;  9 &amp;amp;  9 &amp;amp;  5 &amp;amp;  7\\\
8 &amp;amp;  7 &amp;amp;  6 &amp;amp;  4 &amp;amp;  7 &amp;amp;  5\\\
6 &amp;amp;  4 &amp;amp;  3 &amp;amp;  3 &amp;amp;  4 &amp;amp;  2\\\
3 &amp;amp;  1 &amp;amp;  4 &amp;amp;  4 &amp;amp;  1 &amp;amp;  1
\end{bmatrix}
$$
$$
S^{(5)} = \begin{bmatrix}
\ - &amp;amp;  - &amp;amp;  - &amp;amp;  - &amp;amp;  - &amp;amp;  -\\\
10 &amp;amp; 10 &amp;amp; 12 &amp;amp; 10 &amp;amp;  8 &amp;amp;  7\\\
8 &amp;amp; 10 &amp;amp;  9 &amp;amp;  9 &amp;amp;  5 &amp;amp;  7\\\
8 &amp;amp;  7 &amp;amp;  6 &amp;amp;  4 &amp;amp;  7 &amp;amp;  5\\\
6 &amp;amp;  4 &amp;amp;  3 &amp;amp;  3 &amp;amp;  4 &amp;amp;  2\\\
3 &amp;amp;  1 &amp;amp;  4 &amp;amp;  4 &amp;amp;  1 &amp;amp;  1
\end{bmatrix}
\hspace{1cm} S^{(6)} = \begin{bmatrix}
11 &amp;amp; 14 &amp;amp; 13 &amp;amp; 12 &amp;amp;  9 &amp;amp;  \textbf{8}\\\
10 &amp;amp; 10 &amp;amp; 12 &amp;amp; 10 &amp;amp;  8 &amp;amp;  \textbf{7}\\\
8 &amp;amp; 10 &amp;amp;  9 &amp;amp;  9 &amp;amp;  \textbf{5} &amp;amp;  7\\\
8 &amp;amp;  7 &amp;amp;  6 &amp;amp;  \textbf{4} &amp;amp;  7 &amp;amp;  5\\\
6 &amp;amp;  4 &amp;amp;  3 &amp;amp;  \textbf{3} &amp;amp;  4 &amp;amp;  2\\\
3 &amp;amp;  1 &amp;amp;  4 &amp;amp;  4 &amp;amp;  \textbf{1} &amp;amp;  1
\end{bmatrix}
$$&lt;/p&gt;
&lt;p&gt;And there it is! We can see that there is a path which sums to only 8, and that the path starts in the upper right corner.
In order to find the path, we could have saved which way we went for each cell (left, down, or right), but we don&apos;t have to:
we can simply choose the minimum child of each cell, since each cell in $S$ already tells us the smallest sum from that cell to a bottom cell.&lt;/p&gt;
&lt;p&gt;Also note that there are &lt;em&gt;two&lt;/em&gt; paths that sum to 8 (the two bottom cells differ in the two paths).&lt;/p&gt;
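&lt;p&gt;Before writing the real implementation, we can sanity-check the recurrence on the example matrix with a throwaway version that uses plain vectors instead of image buffers (this standalone sketch is not part of the library):&lt;/p&gt;

```rust
// Compute the minimum vertical seam sum for a gradient grid,
// using the same recurrence as in the text: each cell adds the
// smallest of the (up to) three cells in the row below it.
fn min_seam_sum(g: &[Vec<u32>]) -> u32 {
    let h = g.len();
    let w = g[0].len();
    // `s` holds the best sums for the row we are currently above.
    let mut s = g[h - 1].clone();
    for row in (0..h - 1).rev() {
        let mut next = vec![0; w];
        for col in 0..w {
            // Clamp the child range at the left and right edges.
            let lo = col.saturating_sub(1);
            let hi = (col + 1).min(w - 1);
            let best = (lo..=hi).map(|c| s[c]).min().unwrap();
            next[col] = g[row][col] + best;
        }
        s = next;
    }
    // The cheapest seam starts at the smallest entry of the top row.
    s.into_iter().min().unwrap()
}

fn main() {
    // The 6x6 gradient matrix G from the text.
    let g: Vec<Vec<u32>> = vec![
        vec![1, 4, 3, 4, 2, 1],
        vec![2, 2, 3, 5, 3, 2],
        vec![1, 4, 5, 5, 1, 2],
        vec![4, 4, 3, 1, 5, 3],
        vec![5, 3, 2, 2, 3, 1],
        vec![3, 1, 4, 4, 1, 1],
    ];
    println!("{}", min_seam_sum(&g)); // prints 8
}
```

&lt;p&gt;Running it confirms that the cheapest seam sums to 8, matching the table we computed by hand.&lt;/p&gt;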
&lt;h3&gt;Implementation&lt;/h3&gt;
&lt;p&gt;Since we are just prototyping we will do the simplest thing. We&apos;ll make a struct with an array for the table,
and just &lt;code&gt;for&lt;/code&gt; loop our way through the algorithm.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;struct DPTable {
    width: usize,
    height: usize,
    table: Vec&amp;lt;u16&amp;gt;,
}

impl DPTable {
    fn from_gradient_buffer(gradient: &amp;amp;GradientBuffer) -&amp;gt; Self {
        let dims = gradient.dimensions();
        let w = dims.0 as usize;
        let h = dims.1 as usize;
        let mut table = DPTable {
            width: w,
            height: h,
            table: vec![0; w * h],
        };
        // return gradient[h][w], save us some typing
        let get = |w, h| gradient.get_pixel(w as u32, h as u32)[0];

        // Initialize bottom row
        for i in 0..w {
            let px = get(i, h - 1);
            table.set(i, h - 1, px)
        }
        // For each cell in row j, add the smallest of the three cells
        // in the row below (j + 1). Special-case the edge columns.
        for row in (0..h - 1).rev() {
            for col in 1..w - 1 {
                let l = table.get(col - 1, row + 1);
                let m = table.get(col    , row + 1);
                let r = table.get(col + 1, row + 1);
                table.set(col, row, get(col, row) + min(min(l, m), r));
            }
            // special case far left and far right:
            let left = get(0, row) + min(table.get(0, row + 1), table.get(1, row + 1));
            table.set(0, row, left);
            let right = get(w - 1, row) + min(table.get(w - 1, row + 1), table.get(w - 2, row + 1));
            table.set(w - 1, row, right);
        }
        table
    }
}

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;After running, we can convert the &lt;code&gt;DPTable&lt;/code&gt; back to a &lt;code&gt;GradientBuffer&lt;/code&gt;, and write it to a file.
The pixels in the image below are the path weights divided by 128.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;sample-image-paths.jpeg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The image can be interpreted as follows: white pixels are cells from which every path to the bottom has a large sum.
These pixels have a lot of detail (change of color) around them, detail which we would like to preserve,
so the gradient, which measures the rate of change, is large.
Since the path finding algorithm searches for the smallest sum, which here is the &amp;quot;darkest path&amp;quot;,
it will try its best to avoid these pixels.
That is, the white parts in the gradient image are the most distinct parts.&lt;/p&gt;
&lt;h2&gt;Finding the path&lt;/h2&gt;
&lt;p&gt;Now that we have the entire table, finding the best path is easy:
it&apos;s just a matter of searching through the upper row
and creating a &lt;code&gt;vec&lt;/code&gt; of indices, by always choosing the smallest child:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;impl DPTable {
    fn path_start_index(&amp;amp;self) -&amp;gt; usize {
        // Has FP Gone Too Far?!
        self.table.iter()
            .take(self.width)
            .enumerate()
            .map(|(i, n)| (n, i))
            .min()
            .map(|(_, i)| i)
            .unwrap()
    }
}

struct Path {
    indices: Vec&amp;lt;usize&amp;gt;,
}

impl Path {
    pub fn from_dp_table(table: &amp;amp;DPTable) -&amp;gt; Self {
        let mut v = Vec::with_capacity(table.height);
        let mut col: usize = table.path_start_index();
        v.push(col);
        for row in 1..table.height {
            // Leftmost, no child to the left
            if col == 0 {
                let m = table.get(col, row);
                let r = table.get(col + 1, row);
                if m &amp;gt; r {
                    col += 1;
                }
            // Rightmost, no child to the right
            } else if col == table.width - 1 {
                let l = table.get(col - 1, row);
                let m = table.get(col, row);
                if l &amp;lt; m {
                    col -= 1;
                }
            } else {
                let l = table.get(col - 1, row);
                let m = table.get(col, row);
                let r = table.get(col + 1, row);
                let minimum = min(min(l, m), r);
                if minimum == l {
                    col -= 1;
                } else if minimum == r {
                    col += 1;
                }
            }
            v.push(col + row * table.width);
        }

        Path {
            indices: v
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In order to see if the paths selected are at least plausible, I generated 10 paths, and colored them yellow:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;sample-image-yellow-path.jpeg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Looks plausible to me!&lt;/p&gt;
&lt;h2&gt;Removal&lt;/h2&gt;
&lt;p&gt;The only thing remaining now is to remove the path instead of coloring it yellow.
Since we simply want to get something working, we can do this in a pretty simple way:
get the raw bytes from the image, copy the intervals between the indices we want to remove
into a new array, and create a new image from that.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;impl Image {
    fn remove_path(&amp;amp;mut self, path: Path) {
        let image_buffer = self.inner.to_rgb();
        let (w, h) = image_buffer.dimensions();
        let container = image_buffer.into_raw();
        let mut new_pixels = vec![];

        let mut path = path.indices.iter();
        let mut i = 0;
        while let Some(&amp;amp;index) = path.next() {
            new_pixels.extend(&amp;amp;container[i..index * 3]);
            i = (index + 1) * 3;
        }
        new_pixels.extend(&amp;amp;container[i..]);
        let ib = image::ImageBuffer::from_raw(w - 1, h, new_pixels).unwrap();
        self.inner = image::DynamicImage::ImageRgb8(ib);
    }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Finally, the time has come. Now we can remove a line from an image, or we could loop and remove, say, 200 lines:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;let mut image = Image::load_image(path::Path::new(&amp;quot;sample-image.jpg&amp;quot;));
for _ in 0..200 {
    let grad = image.gradient_magnitude();
    let table = DPTable::from_gradient_buffer(&amp;amp;grad);
    let path = Path::from_dp_table(&amp;amp;table);
    image.remove_path(path);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&quot;sample-image-cropped.jpeg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;However, we can see that the algorithm has removed quite a lot of the right side of the image;
that is, the image is more or less cropped,
which is exactly one of the problems seam carving is supposed to avoid!
A quick and somewhat dirty fix is to alter the gradient a little, by explicitly setting the borders to some large number, say 100.&lt;/p&gt;
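&lt;p&gt;A sketch of that tweak, operating on a plain row-major buffer standing in for the &lt;code&gt;GradientBuffer&lt;/code&gt;&apos;s raw data (the function name is made up):&lt;/p&gt;

```rust
// Give the outermost columns of the gradient a large weight, so that
// vertical seams are discouraged from hugging the image borders.
// `grad` is a row-major w*h buffer of gradient magnitudes.
fn penalize_borders(grad: &mut [u16], w: usize, h: usize, penalty: u16) {
    for row in 0..h {
        grad[row * w] = penalty;         // leftmost column
        grad[row * w + w - 1] = penalty; // rightmost column
    }
}

fn main() {
    let (w, h) = (4, 3);
    let mut grad = vec![0u16; w * h];
    penalize_borders(&mut grad, w, h, 100);
    assert_eq!(grad, vec![
        100, 0, 0, 100,
        100, 0, 0, 100,
        100, 0, 0, 100,
    ]);
}
```

&lt;p&gt;Calling something like this on the gradient before building the &lt;code&gt;DPTable&lt;/code&gt; makes every border pixel expensive, so seams only cross the border when everything else is even worse.&lt;/p&gt;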
&lt;p&gt;&lt;img src=&quot;sample-image-200.jpeg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Tada!&lt;/p&gt;
&lt;p&gt;There are quite a few artifacts here, which makes the end result a little less satisfactory.
The bird however is almost untouched, and still looks great (to me).
You could also argue that we have destroyed all sense of image composition in the process of making this image only slightly smaller. To this I will say .... uum.... yes.&lt;/p&gt;
&lt;h2&gt;Seeing is believing&lt;/h2&gt;
&lt;p&gt;Saving the images to a file and looking at it is kind of cool, but it isn&apos;t resize-window-live-update cool!
As a final effort, let&apos;s try to hack something together.&lt;/p&gt;
&lt;p&gt;First, we need to be able to load, get, and resize an image outside of the crate.
We&apos;ll try to make something like our initial plan:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;extern crate content_aware_resize;
use content_aware_resize as car;

fn main() {
    let mut image = car::load_image(path);
    image.resize_to(car::Dimensions::Relative(-1, 0));
    let data: &amp;amp;[u8] = image.get_image_data();
    // Somehow show this data in a window
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We start simple, by only adding exactly what we need, and taking shortcuts where we can.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;pub enum Dimensions {
    Relative(isize, isize),
}
...
impl Image {
    fn size_difference(&amp;amp;self, dims: Dimensions) -&amp;gt; (isize, isize) {
        // How many columns and rows should we add (positive)
        // or remove (negative)?
        match dims {
            Dimensions::Relative(x, y) =&amp;gt; (x, y),
        }
    }

    pub fn resize_to(&amp;amp;mut self, dimensions: Dimensions) {
        let (mut xs, ys) = self.size_difference(dimensions);
        // Only horizontal downsizing for now
        if xs &amp;gt; 0 { panic!(&amp;quot;Only downsizing is supported.&amp;quot;) }
        if ys != 0 { panic!(&amp;quot;Only horizontal resizing is supported.&amp;quot;) }
        while xs &amp;lt; 0 {
            let grad = self.gradient_magnitude();
            let table = DPTable::from_gradient_buffer(&amp;amp;grad);
            let path = Path::from_dp_table(&amp;amp;table);
            self.remove_path(path);
            xs += 1;
        }
    }

    pub fn get_image_data(&amp;amp;self) -&amp;gt; &amp;amp;[u8] {
        self.inner.as_rgb8().unwrap()
    }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Just a little copy-paste!&lt;/p&gt;
&lt;p&gt;Now, maybe we want the resizable window.
We can start a new project, include the library crate, and use, say, &lt;code&gt;sdl2&lt;/code&gt; to get something up fast.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;extern crate content_aware_resize;
extern crate sdl2;
use content_aware_resize as car;
use sdl2::rect::Rect;
use sdl2::event::{Event, WindowEvent};
use sdl2::keyboard::Keycode;
use std::path::Path;

fn main() {
    // Load image
    let mut image = car::Image::load_image(Path::new(&amp;quot;sample-image.jpeg&amp;quot;));
    let (mut w, h) = image.dimensions();

    // Setup sdl2 stuff, and get a window
    let sdl_ctx = sdl2::init().unwrap();
    let video = sdl_ctx.video().unwrap();
    let window = video.window(&amp;quot;Content Aware Resize&amp;quot;, w, h)
        .position_centered()
        .opengl()
        .resizable()
        .build()
        .unwrap();

    let mut renderer = window.renderer().build().unwrap();

    // Convenience function to update `texture` with a resized image
    let update_texture = |renderer: &amp;amp;mut sdl2::render::Renderer, image: &amp;amp;car::Image| {
        let (w, h) = image.dimensions();
        let pixel_format = sdl2::pixels::PixelFormatEnum::RGB24;
        let mut tex = renderer.create_texture_static(pixel_format, w, h).unwrap();
        let data = image.get_image_data();
        let pitch = w * 3;
        tex.update(None, data, pitch as usize).unwrap();
        tex
    };
    let mut texture = update_texture(&amp;amp;mut renderer, &amp;amp;image);

    let mut event_pump = sdl_ctx.event_pump().unwrap();
    &apos;running: loop {
        for event in event_pump.poll_iter() {
            // Handle exit and resize events
            match event {
                Event::Quit {..}
                | Event::KeyDown { keycode: Some(Keycode::Escape), .. } =&amp;gt; { break &apos;running },
                Event::Window {win_event: WindowEvent::Resized(new_w, _h), .. } =&amp;gt; {
                    // Find out how many pixels we sized down, and scale down
                    // the image accordingly
                    let x_diff = new_w as isize - w as isize;
                    if x_diff &amp;lt; 0 {
                        image.resize_to(car::Dimensions::Relative(x_diff, 0));
                    }
                    w = new_w as u32;
                    texture = update_texture(&amp;amp;mut renderer, &amp;amp;image);
                },
                _ =&amp;gt; {}
            }
        }
        // Clear, copy, and present.
        renderer.clear();
        renderer.copy(&amp;amp;texture, None, Some(Rect::new(0, 0, w, h))).unwrap();
        renderer.present();
    }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And that&apos;s it. A day&apos;s work, with only very little knowledge of &lt;code&gt;sdl2&lt;/code&gt;, &lt;code&gt;image&lt;/code&gt;, and blog post writing.
I hope you enjoyed it, if only just a little bit :)&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://www.github.com/martinhath/content-aware-resize&quot;&gt;Git repository&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://www.reddit.com/r/rust/comments/5ttzb4/implementing_content_aware_image_resizing/&quot;&gt;/r/Rust thread&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://www.reddit.com/r/programming/comments/5ttz9g/implementing_content_aware_image_resizing/&quot;&gt;/r/Programming thread&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://news.ycombinator.com/item?id=13636706&quot;&gt;HackerNews&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;section data-footnotes=&quot;&quot; class=&quot;footnotes&quot;&gt;&lt;h2 id=&quot;footnote-label&quot; class=&quot;sr-only&quot;&gt;Footnotes&lt;/h2&gt;
&lt;ol&gt;
&lt;li id=&quot;user-content-fn-duckduck&quot;&gt;
&lt;p&gt;Somehow, duckduckgoed doesn&apos;t work as well as googled when used as a verb. &lt;a href=&quot;#user-content-fnref-duckduck&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-image-source&quot;&gt;
&lt;p&gt;http://imgsv.imaging.nikon.com/lineup/lens/zoom/normalzoom/af-s_dx_18-140mmf_35-56g_ed_vr/img/sample/sample1_l.jpg &lt;a href=&quot;#user-content-fnref-image-source&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-gradient-trouble&quot;&gt;
&lt;p&gt;I&apos;d like to know if there is an easier way to do this! In addition, saving the resulting gradient is seemingly not possible at the moment, as the function returns an &lt;code&gt;ImageBuffer&lt;/code&gt; over &lt;code&gt;u16&lt;/code&gt;, while &lt;code&gt;ImageBuffer::save&lt;/code&gt; requires the underlying data to be &lt;code&gt;u8&lt;/code&gt;. I also couldn&apos;t figure out how to create a &lt;code&gt;DynamicImage&lt;/code&gt; (which also has a &lt;code&gt;::save&lt;/code&gt;, with a slightly cleaner interface) from an &lt;code&gt;ImageBuffer&lt;/code&gt;, but this might be possible. &lt;a href=&quot;#user-content-fnref-gradient-trouble&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content></entry><entry><title>Produce More, Consume Less</title><id>https://mht.wtf/post/produce/</id><updated>2024-12-17T21:22:16+01:00</updated><author><name>Martin Hafskjold Thoresen</name><email>m@mht.wtf</email></author><link href="https://mht.wtf/post/produce/" rel=""/><link href="https://mht.wtf/post/produce/index.html" rel="alternate"/><published>2024-12-17T21:22:16+01:00</published><content type="text/html">&lt;p&gt;Lately, I&apos;ve found myself &lt;em&gt;consuming&lt;/em&gt; a lot, and barely &lt;em&gt;producing&lt;/em&gt; at all. I&apos;ve read, but not written. I&apos;ve listened to music, but barely played myself. I&apos;ve watched chess analyses instead of playing chess.&lt;/p&gt;
&lt;p&gt;I guess I &lt;em&gt;have&lt;/em&gt; programmed; that&apos;s something.&lt;/p&gt;
&lt;p&gt;This thought struck me when I had to write something at work. Just a few pages
of coherent text outlining some ideas I had. Sitting down with an empty page,
trying to squeeze out sentences while constructing an argument for an idea which
I couldn&apos;t quite articulate, was &lt;em&gt;really hard&lt;/em&gt;! I know that writing &lt;em&gt;is&lt;/em&gt; hard,
but I mostly felt out of shape. Even writing this post is hard!&lt;/p&gt;
&lt;p&gt;Yet, after I&apos;d made some progress on it, it felt good. I was happy with having
produced a tangible, albeit small, thing. This is a feeling I generally do not
get when consuming things. Not when listening to a great album, when watching a
classic movie, or when reading any piece of text. Now that I have written it
down it feels obvious that this is true for me, but it&apos;s taken me a long time to
come to this realization. I&apos;d like to produce more.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;I&lt;/em&gt; wrote this.  Thanks for reading.&lt;/p&gt;
</content></entry><entry><title>Fixing My Wacom Tablet</title><id>https://mht.wtf/post/wacom/</id><updated>2020-06-21T01:00:39+02:00</updated><author><name>Martin Hafskjold Thoresen</name><email>m@mht.wtf</email></author><link href="https://mht.wtf/post/wacom/" rel=""/><link href="https://mht.wtf/post/wacom/index.html" rel="alternate"/><published>2020-06-21T01:00:39+02:00</published><content type="text/html">&lt;p&gt;A quick warning: this isn&apos;t one of these &amp;quot;here&apos;s the problem, here&apos;s the solution, bam bam bam&amp;quot; type of write-ups.
This was written while I tried to fix my tablet, and with very little editing after the fact.
Don&apos;t go in expecting a well thought out story arc, as this is meant to reflect how I was working and what I was thinking.
I think generally there&apos;s way too little material online on how people work day to day, and
too many write-ups of just the good parts, so this is me helping push the ratio a little in the right direction.&lt;/p&gt;
&lt;p&gt;With that out of the way, let&apos;s start with the background.&lt;/p&gt;
&lt;p&gt;Over a year ago I bought a Wacom drawing tablet, having been increasingly annoyed
with my handwritten notes and doodles. I figured if I got a tablet I could draw digitally
which would simplify erasing, colors, and layout. And, of course, I would be able to
access it digitally. Since then I&apos;ve mainly used XournalPP for this, and I think it&apos;s been working okay.&lt;/p&gt;
&lt;p&gt;Not great though; there are definitely quirks with both XournalPP and the wacom driver,
and it took some tinkering before I had a setup that was usable.
Still, one thing that never worked is button presses on the drawing pad.
The buttons on the stylus work fine, and the on/off button on the pad works, but
none of the four remaining pad buttons do anything.&lt;/p&gt;
&lt;p&gt;Worse yet, I&apos;m using Wayland on my home computer, which I suspect will make things tougher.&lt;/p&gt;
&lt;p&gt;Still, I figured after so long I&apos;d try to properly fix this, whatever it takes.
Just&lt;sup&gt;&lt;a href=&quot;#user-content-fn-xournalppbind&quot; id=&quot;user-content-fnref-xournalppbind&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; getting some button event presses shouldn&apos;t be that hard, right?&lt;/p&gt;
&lt;h2&gt;libinput&lt;/h2&gt;
&lt;p&gt;&lt;a href=&quot;https://www.freedesktop.org/wiki/Software/libinput/&quot;&gt;libinput&lt;/a&gt; is a library to handle input devices in Wayland.
My system also has the &lt;code&gt;libinput&lt;/code&gt; tool for interfacing with this library, and the tool
has, among other things, the command &lt;code&gt;debug-events&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-sh&quot;&gt;$ sudo libinput debug-events
-event1   DEVICE_ADDED     Power Button                      seat0 default group1  cap:k
-event0   DEVICE_ADDED     Power Button                      seat0 default group2  cap:k
-event20  DEVICE_ADDED     Logitech Performance MX           seat0 default group3  cap:p left scroll-nat scroll-button
-event3   DEVICE_ADDED     HDA ATI HDMI HDMI/DP,pcm=3        seat0 default group4  cap:
-event4   DEVICE_ADDED     HDA ATI HDMI HDMI/DP,pcm=7        seat0 default group4  cap:
-event5   DEVICE_ADDED     HDA ATI HDMI HDMI/DP,pcm=8        seat0 default group4  cap:
-event6   DEVICE_ADDED     HDA ATI HDMI HDMI/DP,pcm=9        seat0 default group4  cap:
-event7   DEVICE_ADDED     HDA ATI HDMI HDMI/DP,pcm=10       seat0 default group4  cap:
-event8   DEVICE_ADDED     HDA ATI HDMI HDMI/DP,pcm=11       seat0 default group4  cap:
-event21  DEVICE_ADDED     Kingsis Peripherals Evoluent VerticalMouse 4 seat0 default group5  cap:p left scroll-nat scroll-button
-event26  DEVICE_ADDED     HD Pro Webcam C920                seat0 default group6  cap:k
-event24  DEVICE_ADDED     Wacom Intuos BT M Pen             seat0 default group7  cap:T  size 216x135mm
-event25  DEVICE_ADDED     Wacom Intuos BT M Pad             seat0 default group7  cap:P buttons:4 strips:0 rings:0 mode groups:1
-event17  DEVICE_ADDED     ZSA Ergodox EZ                    seat0 default group8  cap:k
-event18  DEVICE_ADDED     ZSA Ergodox EZ Mouse              seat0 default group8  cap:p left scroll-nat scroll-button
-event19  DEVICE_ADDED     ZSA Ergodox EZ System Control     seat0 default group8  cap:k
-event22  DEVICE_ADDED     ZSA Ergodox EZ Consumer Control   seat0 default group8  cap:kp scroll-nat
-event23  DEVICE_ADDED     ZSA Ergodox EZ Keyboard           seat0 default group8  cap:k
-event10  DEVICE_ADDED     HD-Audio Generic Rear Mic         seat0 default group4  cap:
-event11  DEVICE_ADDED     HD-Audio Generic Line             seat0 default group4  cap:
-event12  DEVICE_ADDED     HD-Audio Generic Line Out Front   seat0 default group4  cap:
-event13  DEVICE_ADDED     HD-Audio Generic Line Out Surround seat0 default group4  cap:
-event14  DEVICE_ADDED     HD-Audio Generic Line Out CLFE    seat0 default group4  cap:
-event15  DEVICE_ADDED     HD-Audio Generic Line Out Side    seat0 default group4  cap:
-event16  DEVICE_ADDED     HD-Audio Generic Front Headphone  seat0 default group4  cap:
-event9   DEVICE_ADDED     HD-Audio Generic Front Mic        seat0 default group4  cap:
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;two of which look pretty interesting:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;-event24  DEVICE_ADDED     Wacom Intuos BT M Pen             seat0 default group7  cap:T  size 216x135mm
-event25  DEVICE_ADDED     Wacom Intuos BT M Pad             seat0 default group7  cap:P buttons:4 strips:0 rings:0 mode groups:1
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The pad with its four buttons seems to be properly detected.
Drawing on the pad while &lt;code&gt;libinput debug-events&lt;/code&gt; is running spits out a bunch of lines of the form&lt;/p&gt;
&lt;pre&gt;&lt;code&gt; event24  TABLET_TOOL_AXIS +1.666s		121.71*/69.45*	distance: 0.94*
 event24  TABLET_TOOL_AXIS +1.674s		121.76*/69.46*	distance: 0.94
 event24  TABLET_TOOL_AXIS +1.682s		121.83*/69.47*	distance: 0.87*
 event24  TABLET_TOOL_AXIS +2.552s		106.09*/43.80*	pressure: 0.34*
 event24  TABLET_TOOL_AXIS +2.558s		106.06*/43.79*	pressure: 0.35*
 event24  TABLET_TOOL_AXIS +2.566s		106.05*/43.79	pressure: 0.35*
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;that is, events from the pen. We can see that we get events when the pen is near, but not touching, the pad (the lines saying &lt;code&gt;distance&lt;/code&gt;),
and that the events corresponding to actually touching&lt;sup&gt;&lt;a href=&quot;#user-content-fn-touchingpad&quot; id=&quot;user-content-fnref-touchingpad&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; the pad have &lt;code&gt;pressure&lt;/code&gt;.
The numbers in the middle are positional coordinates, ranging from &lt;code&gt;0/0&lt;/code&gt; at the top left
to &lt;code&gt;216/135&lt;/code&gt; at the bottom right, which I suspect are in millimeters&lt;sup&gt;&lt;a href=&quot;#user-content-fn-mm&quot; id=&quot;user-content-fnref-mm&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;So far this seems to be working rather well.
But uh oh, what happens when we try to press the pad buttons?
Nothing.
And thus begins the adventure.&lt;/p&gt;
&lt;h2&gt;libwacom&lt;/h2&gt;
&lt;p&gt;I figure that since the pen is working well, it might be the driver for the pad that&apos;s lacking.
This is slightly supported by the fact that the only apparent usage of the pad
is to detect when the pen is &lt;em&gt;near&lt;/em&gt; (see this&lt;sup&gt;&lt;a href=&quot;#user-content-fn-touchingpad&quot; id=&quot;user-content-fnref-touchingpad-2&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; footnote).
Assuming this is handled across other pads in the same way, maybe the specifics of my pad,
like the buttons, are problematic in the driver.
The output from &lt;code&gt;libinput&lt;/code&gt; &lt;em&gt;did&lt;/em&gt; state correctly that it has 4 buttons though, but ... uuuh, let&apos;s just check anyways.&lt;/p&gt;
&lt;p&gt;Let&apos;s see what relevant kernel modules and packages we have:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ lsmod | grep wacom
wacom                 126976  0
usbhid                 65536  2 wacom,hid_logitech_dj
hid                   143360  5 wacom,usbhid,hid_generic,hid_logitech_dj,hid_logitech_hidpp
$ pacman -Q | grep wacom
libwacom 1.3-1
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Okay. A quick search reveals that &lt;code&gt;libwacom&lt;/code&gt; is &lt;a href=&quot;https://github.com/linuxwacom/libwacom&quot;&gt;on Github&lt;/a&gt;,
and the README contains the following helpful note:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Use the &lt;code&gt;libwacom-list-local-devices&lt;/code&gt; tool to list all local devices recognized by libwacom. If your device is not listed, but it is available as an event device in the kernel (see /proc/bus/input/devices) and in the X session (see xinput list), the device is missing from libwacom&apos;s database.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Again, we are using Wayland, and since the README assumes an X system we might run into trouble.
Let&apos;s give it a shot:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ libwacom-list-local-devices
# Device node: /dev/input/event25
[Device]
Name=Wacom Intuos BT M
ModelName=CTL-6100WL
DeviceMatch=usb:056a:0378;bluetooth:056a:0379;
Class=Bamboo
Width=9
Height=5
IntegratedIn=
Layout=intuos-m-p3.svg
Styli=0x862;

[Features]
Reversible=false
Stylus=true
Ring=false
Ring2=false
Touch=false
TouchSwitch=false
# StatusLEDs=
NumStrips=0
Buttons=4

[Buttons]
# Left=
# Right=
Top=A;B;C;D;
# Bottom=
# Touchstrip=
# Touchstrip2=
# OLEDs=
# Ring=
# Ring2=
EvdevCodes=0x110;0x111;0x115;0x116;
RingNumModes=0
Ring2NumModes=0
StripsNumModes=0

---------------------------------------------------------------
# Device node: /dev/input/event24
[Device]
Name=Wacom Intuos BT M
ModelName=CTL-6100WL
DeviceMatch=usb:056a:0378;bluetooth:056a:0379;
Class=Bamboo
Width=9
Height=5
IntegratedIn=
Layout=intuos-m-p3.svg
Styli=0x862;

[Features]
Reversible=false
Stylus=true
Ring=false
Ring2=false
Touch=false
TouchSwitch=false
# StatusLEDs=
NumStrips=0
Buttons=4

[Buttons]
# Left=
# Right=
Top=A;B;C;D;
# Bottom=
# Touchstrip=
# Touchstrip2=
# OLEDs=
# Ring=
# Ring2=
EvdevCodes=0x110;0x111;0x115;0x116;
RingNumModes=0
Ring2NumModes=0
StripsNumModes=0

---------------------------------------------------------------
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Recall from above that &lt;code&gt;event24&lt;/code&gt; is the pen and &lt;code&gt;event25&lt;/code&gt; is the pad.
The driver seems to be confused about the difference between the pad and the pen, as both devices have the exact same output;
maybe this is due to the fact that they share a device id, or maybe it makes things simpler in the driver.
For instance, having the pen be &lt;code&gt;Width=9&lt;/code&gt; and &lt;code&gt;Height=5&lt;/code&gt; is obviously not true, but those
are the limits of the pen position events you&apos;d get, since you would always use the pen together
with the pad. I&apos;ll assume that&apos;s not a problem.&lt;/p&gt;
&lt;p&gt;The output also states, once again, that we do have four buttons, and now it also correctly states that the buttons are on the top of the pad.
They are labeled A-D.
Since there is a listing of four numbers in &lt;code&gt;EvdevCodes&lt;/code&gt;, I think those are the &amp;quot;scan codes&amp;quot;, so to speak,
that are sent when the buttons are pressed.
In case of confusion, let&apos;s write those down in hex and decimal:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;0x110 = 272
0x111 = 273
0x115 = 277
0x116 = 278
&lt;/code&gt;&lt;/pre&gt;
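&lt;p&gt;(If you want to double check the conversion: hex literals in Python are plain integers, so a quick sanity check is easy.)&lt;/p&gt;

```python
# The four EvdevCodes from libwacom's database, printed as hex and decimal.
codes = [0x110, 0x111, 0x115, 0x116]
for code in codes:
    print(f"{code:#x} = {code}")
```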
&lt;h2&gt;Just cat it&lt;/h2&gt;
&lt;p&gt;Come to think of it, why don&apos;t we just &lt;code&gt;cat&lt;/code&gt; the right event file?
If there are any events coming through we would at least know that the hardware is recognizing that we&apos;re
using it and sending something into the driver.
Then we would have narrowed down slightly more where in the stack the problems are.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ sudo cat /dev/input/event25
�`��`�(�`��*D	�*D	(�*D	�y
                                  �y
                                    (�y
                                       ����(���)�)(�)^C⏎        
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That looks about right? Slightly unreadable though;
here is the output after pressing the buttons one at a time, piped through &lt;code&gt;hexdump&lt;/code&gt; with a newline
in between each event:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ sudo cat /dev/input/event25 | hexdump
0000000 04c1 5eee 0000 0000 ffb0 0005 0000 0000
0000010 0001 0100 0001 0000 04c1 5eee 0000 0000
0000020 ffb0 0005 0000 0000 0003 0028 000f 0000
0000030 04c1 5eee 0000 0000 ffb0 0005 0000 0000

0000040 0000 0000 0000 0000 04c1 5eee 0000 0000
0000050 39ff 0008 0000 0000 0001 0100 0000 0000
0000060 04c1 5eee 0000 0000 39ff 0008 0000 0000
0000070 0003 0028 0000 0000 04c1 5eee 0000 0000
0000080 39ff 0008 0000 0000 0000 0000 0000 0000

0000090 04c4 5eee 0000 0000 4288 0004 0000 0000
00000a0 0001 0101 0001 0000 04c4 5eee 0000 0000
00000b0 4288 0004 0000 0000 0003 0028 000f 0000
00000c0 04c4 5eee 0000 0000 4288 0004 0000 0000

00000d0 0000 0000 0000 0000 04c4 5eee 0000 0000
00000e0 d49a 0007 0000 0000 0001 0101 0000 0000
00000f0 04c4 5eee 0000 0000 d49a 0007 0000 0000
0000100 0003 0028 0000 0000 04c4 5eee 0000 0000
0000110 d49a 0007 0000 0000 0000 0000 0000 0000

0000120 04c4 5eee 0000 0000 d983 000e 0000 0000
0000130 0001 0102 0001 0000 04c4 5eee 0000 0000
0000140 d983 000e 0000 0000 0003 0028 000f 0000
0000150 04c4 5eee 0000 0000 d983 000e 0000 0000

0000160 0000 0000 0000 0000 04c5 5eee 0000 0000
0000170 0a14 0003 0000 0000 0001 0102 0000 0000
0000180 04c5 5eee 0000 0000 0a14 0003 0000 0000
0000190 0003 0028 0000 0000 04c5 5eee 0000 0000
00001a0 0a14 0003 0000 0000 0000 0000 0000 0000

00001b0 04c5 5eee 0000 0000 7e2c 000b 0000 0000
00001c0 0001 0103 0001 0000 04c5 5eee 0000 0000
00001d0 7e2c 000b 0000 0000 0003 0028 000f 0000
00001e0 04c5 5eee 0000 0000 7e2c 000b 0000 0000

00001f0 0000 0000 0000 0000 04c5 5eee 0000 0000
0000200 3f1e 000f 0000 0000 0001 0103 0000 0000
0000210 04c5 5eee 0000 0000 3f1e 000f 0000 0000
0000220 0003 0028 0000 0000 04c5 5eee 0000 0000
0000230 3f1e 000f 0000 0000 0000 0000 0000 0000
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Somehow, the down presses seem to produce less data than the releases.
This might be true, but another explanation is that &lt;code&gt;hexdump&lt;/code&gt; is buffering up the data
so that it can output each line as &lt;code&gt;0x10&lt;/code&gt; bytes.
After carefully reading the &lt;code&gt;man&lt;/code&gt; page of &lt;code&gt;hexdump&lt;/code&gt;, and with some trial and error&lt;sup&gt;&lt;a href=&quot;#user-content-fn-manhexdump&quot; id=&quot;user-content-fnref-manhexdump&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;,
the following does the job:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-sh&quot;&gt;$ sudo cat /dev/input/event25 | hexdump -ve &amp;quot;1/1 \&amp;quot;%02x\n\&amp;quot;&amp;quot;
# A down                                              **    **
0a 0a ee 5e 00 00 00 00 9c 82 06 00 00 00 00 00 01 00 00 01 01 00 00 00 0a 0a ee 5e 00 00 00 00 9c 82 06 00 00 00 00 00 03 00 28 00 0f 00 00 00 0a 0a ee 5e 00 00 00 00 9c 82 06 00 00 00 00 00 00 00 00 00 00 00 00 00
# A up
0b 0a ee 5e 00 00 00 00 3b fa 03 00 00 00 00 00 01 00 00 01 00 00 00 00 0b 0a ee 5e 00 00 00 00 3b fa 03 00 00 00 00 00 03 00 28 00 00 00 00 00 0b 0a ee 5e 00 00 00 00 3b fa 03 00 00 00 00 00 00 00 00 00 00 00 00 00
# B down
0c 0a ee 5e 00 00 00 00 a9 ea 03 00 00 00 00 00 01 00 01 01 01 00 00 00 0c 0a ee 5e 00 00 00 00 a9 ea 03 00 00 00 00 00 03 00 28 00 0f 00 00 00 0c 0a ee 5e 00 00 00 00 a9 ea 03 00 00 00 00 00 00 00 00 00 00 00 00 00
# B up
0c 0a ee 5e 00 00 00 00 42 0e 0f 00 00 00 00 00 01 00 01 01 00 00 00 00 0c 0a ee 5e 00 00 00 00 42 0e 0f 00 00 00 00 00 03 00 28 00 00 00 00 00 0c 0a ee 5e 00 00 00 00 42 0e 0f 00 00 00 00 00 00 00 00 00 00 00 00 00
# C down
0e 0a ee 5e 00 00 00 00 f4 ee 01 00 00 00 00 00 01 00 02 01 01 00 00 00 0e 0a ee 5e 00 00 00 00 f4 ee 01 00 00 00 00 00 03 00 28 00 0f 00 00 00 0e 0a ee 5e 00 00 00 00 f4 ee 01 00 00 00 00 00 00 00 00 00 00 00 00 00
# C up
0e 0a ee 5e 00 00 00 00 4c 76 0c 00 00 00 00 00 01 00 02 01 00 00 00 00 0e 0a ee 5e 00 00 00 00 4c 76 0c 00 00 00 00 00 03 00 28 00 00 00 00 00 0e 0a ee 5e 00 00 00 00 4c 76 0c 00 00 00 00 00 00 00 00 00 00 00 00 00
# D down
0f 0a ee 5e 00 00 00 00 ee 8f 08 00 00 00 00 00 01 00 03 01 01 00 00 00 0f 0a ee 5e 00 00 00 00 ee 8f 08 00 00 00 00 00 03 00 28 00 0f 00 00 00 0f 0a ee 5e 00 00 00 00 ee 8f 08 00 00 00 00 00 00 00 00 00 00 00 00 00
# D up
10 0a ee 5e 00 00 00 00 c2 96 04 00 00 00 00 00 01 00 03 01 00 00 00 00 10 0a ee 5e 00 00 00 00 c2 96 04 00 00 00 00 00 03 00 28 00 00 00 00 00 10 0a ee 5e 00 00 00 00 c2 96 04 00 00 00 00 00 00 00 00 00 00 00 00 00
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We can even see some signs of what data is sent through here.
For instance, in the columns marked by &lt;code&gt;**&lt;/code&gt; we see &lt;code&gt;00&lt;/code&gt; through &lt;code&gt;03&lt;/code&gt;, likely the button number,
and &lt;code&gt;01&lt;/code&gt; for press and &lt;code&gt;00&lt;/code&gt; for release.
In other words, there seems to be reasonable data sent from the pad that we can read from &lt;code&gt;/dev/input/event25&lt;/code&gt;.&lt;/p&gt;
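&lt;p&gt;As an aside, the &lt;code&gt;hexdump&lt;/code&gt; incantation above just prints one byte per line; in Python terms it does roughly this (a sketch; &lt;code&gt;one_byte_per_line&lt;/code&gt; is my own name for it):&lt;/p&gt;

```python
def one_byte_per_line(data):
    r"""Roughly what `hexdump -ve '1/1 "%02x\n"'` prints: one hex byte per line."""
    return "".join(f"{b:02x}\n" for b in data)

# The first four bytes of the "A down" event above.
print(one_byte_per_line(bytes([0x0A, 0x0A, 0xEE, 0x5E])))
```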
&lt;p&gt;Next, we need to find out on which side of this &lt;code&gt;libwacom&lt;/code&gt; sits;
is it doing the mapping from whatever goes over the wire to what we just read,
or is it supposed to translate what we read from &lt;code&gt;event25&lt;/code&gt; into the events that we did not get from &lt;code&gt;libinput&lt;/code&gt;?&lt;/p&gt;
&lt;h2&gt;A Closer Look At That Data&lt;/h2&gt;
&lt;p&gt;&lt;a href=&quot;https://www.kernel.org/doc/html/v4.15/input/input.html&quot;&gt;kernel.org&lt;/a&gt; has some documentation for the Linux input subsystem,
but I think it&apos;s written in such a way that it&apos;s not &lt;em&gt;very&lt;/em&gt; helpful unless you already have a pretty good idea
of what&apos;s going on.
However, Section 1.5 has the following info:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;You can use blocking and nonblocking reads, and also select() on the /dev/input/eventX devices, and you’ll always get a whole number of input events on a read. Their layout is:&lt;/p&gt;
&lt;/blockquote&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;struct input_event {
    struct timeval time;
    unsigned short type;
    unsigned short code;
    unsigned int value;
};
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This doesn&apos;t seem right, since the size of &lt;code&gt;input_event&lt;/code&gt; is way less than the data we read above.
Unless, of course, we didn&apos;t get only one event. Since the first member is the time, we can assume
that all events should start with more or less the same bytes.
In addition, we suspect that an event is about &lt;code&gt;8 + 2 + 2 + 4 = 16&lt;/code&gt; bytes long.
Or maybe &lt;code&gt;struct timeval&lt;/code&gt; is &lt;code&gt;16&lt;/code&gt; bytes big? That seems to &lt;em&gt;align&lt;/em&gt; much better with the data we have read:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;0a 0a ee 5e 00 00 00 00  9c 82 06 00 00 00 00 00  01 00  00 01  01 00 00 00
0a 0a ee 5e 00 00 00 00  9c 82 06 00 00 00 00 00  03 00  28 00  0f 00 00 00
0a 0a ee 5e 00 00 00 00  9c 82 06 00 00 00 00 00  00 00  00 00  00 00 00 00
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;man 3 timeval&lt;/code&gt; says that &lt;code&gt;struct timeval&lt;/code&gt; has two members, a &lt;code&gt;time_t&lt;/code&gt; and a &lt;code&gt;suseconds_t&lt;/code&gt;;
in addition, we&apos;re on a little-endian machine, so our &lt;code&gt;struct&lt;/code&gt; members should look like this&lt;sup&gt;&lt;a href=&quot;#user-content-fn-inputevent&quot; id=&quot;user-content-fnref-inputevent&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;5&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;struct input_event {
    struct timeval {
        time_t       tv_sec = 0x000000005eee0a0a;
        suseconds_t tv_usec = 0x000000000006829c;
    }                                  // and for the other two events:
    unsigned short type  =     0x0001; //     0003 / 0000
    unsigned short code  =     0x0100; //     0028 / 0000
    unsigned int   value = 0x00000001; // 0000000f / 00000000
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The time certainly looks reasonable:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-fish&quot;&gt;$ printf &amp;quot;%016x\n&amp;quot; (date +&amp;quot;%s&amp;quot;)
000000005eee1baa
&lt;/code&gt;&lt;/pre&gt;
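&lt;p&gt;To make the layout concrete, here is a sketch of unpacking these events in Python, assuming the 64-bit little-endian layout above (24 bytes per event; &lt;code&gt;parse_events&lt;/code&gt; is just an illustrative name, not part of any library):&lt;/p&gt;

```python
import struct

# struct input_event on a 64-bit little-endian machine: 8-byte tv_sec,
# 8-byte tv_usec, unsigned short type and code, unsigned int value.
EVENT_FORMAT = "<qqHHI"
EVENT_SIZE = struct.calcsize(EVENT_FORMAT)  # 24 bytes

def parse_events(data):
    """Split a raw read from /dev/input/eventX into per-event dicts."""
    for offset in range(0, len(data), EVENT_SIZE):
        sec, usec, etype, code, value = struct.unpack_from(EVENT_FORMAT, data, offset)
        yield {"sec": sec, "usec": usec, "type": etype, "code": code, "value": value}

# The three events of the "A down" press captured earlier.
raw = bytes.fromhex(
    "0a0aee5e000000009c820600000000000100000101000000"
    "0a0aee5e000000009c82060000000000030028000f000000"
    "0a0aee5e000000009c820600000000000000000000000000"
)
events = list(parse_events(raw))
print(events[0])  # type 1 (EV_KEY), code 0x100 (BTN_0), value 1 (press)
```

&lt;p&gt;Decoding the captured bytes this way gives back exactly the three events we annotated by hand: the &lt;code&gt;BTN_0&lt;/code&gt; key press, the event with code &lt;code&gt;0x28&lt;/code&gt; and value 15, and an all-zero report.&lt;/p&gt;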
&lt;p&gt;The docs say that the types are defined in &lt;code&gt;include/uapi/linux/input-event-codes.h&lt;/code&gt;, which I found on my system in &lt;code&gt;/usr/include/linux/input-event-codes.h&lt;/code&gt;&lt;sup&gt;&lt;a href=&quot;#user-content-fn-uapi&quot; id=&quot;user-content-fnref-uapi&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;6&lt;/a&gt;&lt;/sup&gt;.
Looking through it we can infer what the events we&apos;re reading are.
The difficulty is that how to interpret the &lt;code&gt;code&lt;/code&gt; or &lt;code&gt;value&lt;/code&gt; of an event depends on the &lt;code&gt;type&lt;/code&gt; of the event.
&lt;a href=&quot;https://www.kernel.org/doc/html/v4.15/input/event-codes.html&quot;&gt;§2.2.1&lt;/a&gt; is helpful here.
As far as I can tell, this is what&apos;s going on:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th align=&quot;left&quot;&gt;Type&lt;/th&gt;
&lt;th align=&quot;left&quot;&gt;Code&lt;/th&gt;
&lt;th align=&quot;right&quot;&gt;Value&lt;/th&gt;
&lt;th align=&quot;left&quot;&gt;Meaning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;EV_KEY&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;BTN_0&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;1&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;pressed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;EV_ABS&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;ABS_MISC&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;15&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;?&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;EV_SYN&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;SYN_REPORT&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;0&lt;/td&gt;
&lt;td align=&quot;left&quot;&gt;undef&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Note that in &lt;a href=&quot;https://www.kernel.org/doc/html/v4.15/input/event-codes.html&quot;&gt;§2.2.1&lt;/a&gt; they say&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;EV_SYN event values are undefined. Their usage is defined only by when they are sent in the evdev event stream.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Here are all of the events from earlier, but this time one per line and annotated on the right:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-sh&quot;&gt;# A down
0a 0a ee 5e 00 00 00 00  9c 82 06 00 00 00 00 00  01 00  00 01  01 00 00 00 # KEY/BTN_0 Press
0a 0a ee 5e 00 00 00 00  9c 82 06 00 00 00 00 00  03 00  28 00  0f 00 00 00 # ABS/MISC 15
0a 0a ee 5e 00 00 00 00  9c 82 06 00 00 00 00 00  00 00  00 00  00 00 00 00 # SYN/REPORT
# A up
0b 0a ee 5e 00 00 00 00  3b fa 03 00 00 00 00 00  01 00  00 01  00 00 00 00 # KEY/BTN_0 Release
0b 0a ee 5e 00 00 00 00  3b fa 03 00 00 00 00 00  03 00  28 00  00 00 00 00 # ABS/MISC 0
0b 0a ee 5e 00 00 00 00  3b fa 03 00 00 00 00 00  00 00  00 00  00 00 00 00 # SYN/REPORT
# B down
0c 0a ee 5e 00 00 00 00  a9 ea 03 00 00 00 00 00  01 00  01 01  01 00 00 00 # KEY/BTN_1 Press
0c 0a ee 5e 00 00 00 00  a9 ea 03 00 00 00 00 00  03 00  28 00  0f 00 00 00 # ABS/MISC 15
0c 0a ee 5e 00 00 00 00  a9 ea 03 00 00 00 00 00  00 00  00 00  00 00 00 00 # SYN/REPORT
# B up
0c 0a ee 5e 00 00 00 00  42 0e 0f 00 00 00 00 00  01 00  01 01  00 00 00 00 # KEY/BTN_1 Release
0c 0a ee 5e 00 00 00 00  42 0e 0f 00 00 00 00 00  03 00  28 00  00 00 00 00 # ABS/MISC 0
0c 0a ee 5e 00 00 00 00  42 0e 0f 00 00 00 00 00  00 00  00 00  00 00 00 00 # SYN/REPORT
# C down
0e 0a ee 5e 00 00 00 00  f4 ee 01 00 00 00 00 00  01 00  02 01  01 00 00 00 # KEY/BTN_2 Press
0e 0a ee 5e 00 00 00 00  f4 ee 01 00 00 00 00 00  03 00  28 00  0f 00 00 00 # ABS/MISC 15
0e 0a ee 5e 00 00 00 00  f4 ee 01 00 00 00 00 00  00 00  00 00  00 00 00 00 # SYN/REPORT
# C up
0e 0a ee 5e 00 00 00 00  4c 76 0c 00 00 00 00 00  01 00  02 01  00 00 00 00 # KEY/BTN_2 Release
0e 0a ee 5e 00 00 00 00  4c 76 0c 00 00 00 00 00  03 00  28 00  00 00 00 00 # ABS/MISC 0
0e 0a ee 5e 00 00 00 00  4c 76 0c 00 00 00 00 00  00 00  00 00  00 00 00 00 # SYN/REPORT
# D down
0f 0a ee 5e 00 00 00 00  ee 8f 08 00 00 00 00 00  01 00  03 01  01 00 00 00 # KEY/BTN_3 Press
0f 0a ee 5e 00 00 00 00  ee 8f 08 00 00 00 00 00  03 00  28 00  0f 00 00 00 # ABS/MISC 15
0f 0a ee 5e 00 00 00 00  ee 8f 08 00 00 00 00 00  00 00  00 00  00 00 00 00 # SYN/REPORT
# D up
10 0a ee 5e 00 00 00 00  c2 96 04 00 00 00 00 00  01 00  03 01  00 00 00 00 # KEY/BTN_3 Release
10 0a ee 5e 00 00 00 00  c2 96 04 00 00 00 00 00  03 00  28 00  00 00 00 00 # ABS/MISC 0
10 0a ee 5e 00 00 00 00  c2 96 04 00 00 00 00 00  00 00  00 00  00 00 00 00 # SYN/REPORT
&lt;/code&gt;&lt;/pre&gt;
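&lt;p&gt;Those annotations can be generated mechanically; here is a minimal sketch with name tables abbreviated to just the codes this pad emits (&lt;code&gt;describe&lt;/code&gt; is my own helper name; the full code lists live in &lt;code&gt;input-event-codes.h&lt;/code&gt;):&lt;/p&gt;

```python
# Name table for just the key codes this pad emits.
KEY_NAMES = {0x100: "BTN_0", 0x101: "BTN_1", 0x102: "BTN_2", 0x103: "BTN_3"}

def describe(etype, code, value):
    """Render one input_event roughly like the annotations above."""
    if etype == 0x01:  # EV_KEY: value 1 is press, 0 is release
        return f"KEY/{KEY_NAMES[code]} {'Press' if value else 'Release'}"
    if etype == 0x03 and code == 0x28:  # EV_ABS / ABS_MISC
        return f"ABS/MISC {value}"
    if etype == 0x00:  # EV_SYN: the report separating event groups
        return "SYN/REPORT"
    return f"type={etype:#x} code={code:#x} value={value}"

print(describe(0x01, 0x100, 1))  # the "A down" key event
```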
&lt;p&gt;In trying to find out more info about &lt;code&gt;ABS_MISC&lt;/code&gt; I found a wiki page on the &lt;a href=&quot;https://github.com/linuxwacom/input-wacom&quot;&gt;linuxwacom/input-wacom&lt;/a&gt; repository named
&lt;a href=&quot;https://github.com/linuxwacom/input-wacom/wiki/Kernel-Input-Event-Overview&quot;&gt;Kernel Input Event Overview&lt;/a&gt;,
which explains pretty well how the wacom driver works.
They state the following:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;In addition to the &lt;code&gt;BTN_TOOL_*&lt;/code&gt; events for informing user land what tool the current events being sent belong to, there is a &lt;code&gt;MSC_SERIAL&lt;/code&gt; event that contains a serial # to aid in tracking current tool as well as a &lt;code&gt;ABS_MISC&lt;/code&gt; which is a hard code device ID. Of these two, the &lt;code&gt;MSC_SERIAL&lt;/code&gt; is the most useful to user land.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;So, uuh.. maybe I won&apos;t worry too much about the &lt;code&gt;ABS/MISC&lt;/code&gt; events.
But this is good; we have confirmed that the data we&apos;re reading from &lt;code&gt;/dev/input/event25&lt;/code&gt; is of the type &lt;code&gt;input_event&lt;/code&gt;,
and that it makes sense, more or less. From reading the wiki page it really does sound like the driver is mapping
whatever goes over the wire to the Linux input subsystem format, which is &lt;code&gt;input_event&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;At this point I realize that &lt;code&gt;linuxwacom&lt;/code&gt; has three primary components:
&lt;code&gt;input-wacom&lt;/code&gt; which is the kernel driver, which presumably does the mapping just mentioned;
&lt;code&gt;xf86-input-wacom&lt;/code&gt; the X driver, which I suppose makes kernel driver events into X events?
and &lt;code&gt;libwacom&lt;/code&gt;, which really just seems to be a utility for simpler querying of state and button mapping and so on.&lt;/p&gt;
&lt;p&gt;Going back, we can see that &lt;code&gt;libwacom-list-local-devices&lt;/code&gt; seems to work just fine,
which I think means that the pad is properly detected.
In addition, we know that we get &amp;quot;good&amp;quot; events to &lt;code&gt;/dev/input/event25&lt;/code&gt; so presumably the kernel driver also works fine.
However, &lt;code&gt;libinput debug-events&lt;/code&gt; did not list the button presses, so &lt;code&gt;libinput&lt;/code&gt; doesn&apos;t get those events, although it does get the stylus events.&lt;/p&gt;
&lt;h2&gt;Back to libinput&lt;/h2&gt;
&lt;p&gt;Next, we go back to &lt;code&gt;libinput&lt;/code&gt;; looking through some of the docs it seems there&apos;s another command,
&lt;code&gt;libinput record&lt;/code&gt;.
According to &lt;code&gt;man 1 libinput-record&lt;/code&gt;,&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The libinput record tool records kernel events from a device and prints them in a format that can later be replayed with the libinput replay(1) tool.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Running it, and selecting our device, actually shows that the buttons are detected:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ sudo libinput record
Available devices:
/dev/input/event0:	Power Button
/dev/input/event1:	Power Button
/dev/input/event2:	PC Speaker
/dev/input/event3:	HDA ATI HDMI HDMI/DP,pcm=3
/dev/input/event4:	HDA ATI HDMI HDMI/DP,pcm=7
/dev/input/event5:	HDA ATI HDMI HDMI/DP,pcm=8
/dev/input/event6:	HDA ATI HDMI HDMI/DP,pcm=9
/dev/input/event7:	HDA ATI HDMI HDMI/DP,pcm=10
/dev/input/event8:	HDA ATI HDMI HDMI/DP,pcm=11
/dev/input/event9:	HD-Audio Generic Front Mic
/dev/input/event10:	HD-Audio Generic Rear Mic
/dev/input/event11:	HD-Audio Generic Line
/dev/input/event12:	HD-Audio Generic Line Out Front
/dev/input/event13:	HD-Audio Generic Line Out Surround
/dev/input/event14:	HD-Audio Generic Line Out CLFE
/dev/input/event15:	HD-Audio Generic Line Out Side
/dev/input/event16:	HD-Audio Generic Front Headphone
/dev/input/event17:	ZSA Ergodox EZ
/dev/input/event18:	ZSA Ergodox EZ Mouse
/dev/input/event19:	ZSA Ergodox EZ System Control
/dev/input/event20:	Logitech Performance MX
/dev/input/event21:	Kingsis Peripherals Evoluent VerticalMouse 4
/dev/input/event22:	ZSA Ergodox EZ Consumer Control
/dev/input/event23:	ZSA Ergodox EZ Keyboard
/dev/input/event24:	Wacom Intuos BT M Pen
/dev/input/event25:	Wacom Intuos BT M Pad
/dev/input/event26:	HD Pro Webcam C920
Select the device event number: 25
Recording to &apos;stdout&apos;.
version: 1
ndevices: 1
libinput:
  version: &amp;quot;1.15.5&amp;quot;
  git: &amp;quot;unknown&amp;quot;
system:
  kernel: &amp;quot;5.7.2-arch1-1&amp;quot;
  dmi: &amp;quot;dmi:bvnAmericanMegatrendsInc.:bvr3.D0:bd07/11/2018:svnMicro-StarInternationalCo.,Ltd.:pnMS-7A33:pvr2.0:rvnMSI:rnX370SLIPLUS(MS-7A33):rvr2.0:cvnMicro-StarInternationalCo.,Ltd.:ct3:cvr2.0:&amp;quot;
devices:
- node: /dev/input/event25
  evdev:
    # Name: Wacom Intuos BT M Pad
    # ID: bus 0x3 vendor 0x56a product 0x378 version 0x110
    # Size in mm: unknown, missing resolution
    # Supported Events:
    # Event type 0 (EV_SYN)
    # Event type 1 (EV_KEY)
    #   Event code 256 (BTN_0)
    #   Event code 257 (BTN_1)
    #   Event code 258 (BTN_2)
    #   Event code 259 (BTN_3)
    #   Event code 331 (BTN_STYLUS)
    # Event type 3 (EV_ABS)
    #   Event code 0 (ABS_X)
    #       Value           0
    #       Min             0
    #       Max             1
    #       Fuzz            0
    #       Flat            0
    #       Resolution      0
    #   Event code 1 (ABS_Y)
    #       Value           0
    #       Min             0
    #       Max             1
    #       Fuzz            0
    #       Flat            0
    #       Resolution      0
    #   Event code 40 (ABS_MISC)
    #       Value           0
    #       Min             0
    #       Max             0
    #       Fuzz            0
    #       Flat            0
    #       Resolution      0
    # Properties:
    name: &amp;quot;Wacom Intuos BT M Pad&amp;quot;
    id: [3, 1386, 888, 272]
    codes:
      0: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] # EV_SYN
      1: [256, 257, 258, 259, 331] # EV_KEY
      3: [0, 1, 40] # EV_ABS
    absinfo:
      0: [0, 1, 0, 0, 0]
      1: [0, 1, 0, 0, 0]
      40: [0, 0, 0, 0, 0]
    properties: []
  hid: [6, 13, 255, 9, 1, 161, 1, 133, 16, 9, 32, 53, 0, 69, 0, 21, 0, 37, 1, 161, 0, 9, 66, 9, 68, 9, 90, 37, 1, 117, 1, 149, 3, 129, 2, 149, 2, 129, 3, 9, 50, 9, 54, 149, 2, 129, 2, 149, 1, 129, 3, 10, 48, 1, 101, 17, 85, 13, 71, 96, 84, 0, 0, 39, 96, 84, 0, 0, 117, 24, 149, 1, 129, 2, 10, 49, 1, 71, 188, 52, 0, 0, 39, 188, 52, 0, 0, 129, 2, 9, 48, 85, 0, 101, 0, 38, 255, 15, 117, 16, 129, 2, 117, 8, 149, 6, 129, 3, 10, 50, 1, 37, 63, 117, 8, 149, 1, 129, 2, 9, 91, 9, 92, 23, 0, 0, 0, 128, 39, 255, 255, 255, 127, 117, 32, 149, 2, 129, 2, 9, 119, 21, 0, 38, 255, 15, 117, 16, 149, 1, 129, 2, 192, 133, 17, 101, 0, 85, 0, 53, 0, 69, 0, 9, 57, 161, 0, 10, 16, 9, 10, 17, 9, 10, 18, 9, 10, 19, 9, 21, 0, 37, 1, 117, 1, 149, 4, 129, 2, 149, 4, 129, 3, 117, 8, 149, 7, 129, 3, 192, 133, 19, 101, 0, 85, 0, 53, 0, 69, 0, 10, 19, 16, 161, 0, 10, 59, 4, 21, 0, 37, 100, 117, 7, 149, 1, 129, 2, 10, 4, 4, 37, 1, 117, 1, 129, 2, 9, 0, 38, 255, 0, 117, 8, 129, 2, 117, 8, 149, 6, 129, 3, 192, 9, 14, 161, 2, 133, 2, 10, 2, 16, 21, 2, 37, 2, 117, 8, 149, 1, 177, 2, 133, 3, 10, 3, 16, 21, 0, 38, 255, 0, 149, 1, 177, 2, 133, 4, 10, 4, 16, 21, 1, 37, 1, 149, 1, 177, 2, 133, 7, 10, 9, 16, 21, 0, 38, 255, 0, 149, 1, 177, 2, 177, 3, 10, 7, 16, 9, 0, 39, 255, 255, 0, 0, 117, 16, 149, 2, 177, 2, 117, 8, 149, 9, 177, 3, 133, 12, 10, 48, 13, 10, 49, 13, 10, 50, 13, 10, 51, 13, 101, 17, 85, 13, 53, 0, 70, 200, 0, 21, 0, 38, 144, 1, 117, 16, 149, 4, 177, 2, 133, 13, 10, 13, 16, 101, 0, 85, 0, 69, 0, 37, 1, 117, 8, 149, 1, 177, 2, 133, 20, 10, 20, 16, 38, 255, 0, 149, 13, 177, 2, 133, 204, 10, 204, 16, 149, 2, 177, 2, 133, 49, 10, 49, 16, 37, 100, 149, 3, 177, 2, 149, 2, 177, 3, 192, 10, 172, 16, 161, 2, 21, 0, 38, 255, 0, 117, 8, 133, 172, 9, 0, 150, 191, 0, 129, 2, 133, 21, 9, 0, 149, 14, 177, 2, 133, 51, 9, 0, 149, 18, 177, 2, 133, 68, 9, 0, 149, 4, 177, 2, 133, 69, 9, 0, 149, 32, 177, 2, 133, 96, 9, 0, 149, 63, 177, 2, 133, 97, 9, 0, 149, 62, 177, 2, 133, 98, 9, 0, 149, 62, 177, 2, 133, 101, 9, 
0, 149, 4, 177, 2, 133, 102, 9, 0, 149, 4, 177, 2, 133, 103, 9, 0, 149, 4, 177, 2, 133, 104, 9, 0, 149, 17, 177, 2, 133, 111, 9, 0, 149, 62, 177, 2, 133, 205, 9, 0, 149, 2, 177, 2, 133, 22, 9, 0, 149, 14, 177, 2, 133, 53, 9, 0, 149, 10, 177, 2, 192, 133, 208, 9, 1, 150, 8, 0, 177, 2, 133, 209, 9, 1, 150, 4, 1, 177, 2, 133, 210, 9, 1, 150, 4, 1, 177, 2, 133, 211, 9, 1, 150, 4, 0, 177, 2, 133, 212, 9, 1, 150, 4, 0, 177, 2, 133, 213, 9, 1, 150, 4, 0, 177, 2, 133, 214, 9, 1, 150, 4, 0, 177, 2, 133, 215, 9, 1, 150, 8, 0, 177, 2, 133, 216, 9, 1, 150, 12, 0, 177, 2, 133, 217, 9, 1, 150, 0, 5, 177, 2, 133, 218, 9, 1, 150, 4, 2, 177, 2, 133, 219, 9, 1, 150, 6, 0, 177, 2, 133, 220, 9, 1, 150, 2, 0, 177, 2, 133, 221, 9, 1, 150, 4, 0, 177, 2, 133, 222, 9, 1, 150, 4, 0, 177, 2, 133, 223, 9, 1, 150, 34, 0, 177, 2, 133, 224, 9, 1, 150, 1, 0, 177, 2, 133, 225, 9, 1, 150, 2, 0, 177, 2, 133, 226, 9, 1, 150, 2, 0, 177, 2, 133, 227, 9, 1, 150, 2, 0, 177, 2, 133, 228, 9, 1, 150, 255, 1, 177, 2, 192 ]
  udev:
    properties:
    - ID_INPUT=1
    - ID_INPUT_TABLET=1
    - ID_INPUT_TABLET_PAD=1
    - LIBINPUT_DEVICE_GROUP=3/56a/378:usb-0000:29:00.3-2
  quirks:
  events:
  - evdev:
    - [  0,      0,   1, 256,       1] # EV_KEY / BTN_0                     1
  - evdev:
    - [  0,      0,   3,  40,      15] # EV_ABS / ABS_MISC                 15 (+15)
    - [  0,      0,   0,   0,       0] # ------------ SYN_REPORT (0) ---------- +0ms
  - evdev:
    - [  0, 214000,   1, 256,       0] # EV_KEY / BTN_0                     0
    - [  0, 214000,   3,  40,       0] # EV_ABS / ABS_MISC                  0 (-15)
    - [  0, 214000,   0,   0,       0] # ------------ SYN_REPORT (0) ---------- +214ms
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;These are exactly the same events as the ones we reverse engineered above, which is a good sign:
&lt;code&gt;libinput&lt;/code&gt; gets the same events as we do, but it seems to decide that they aren&apos;t worth sending further.
Maybe if we dig a bit into &lt;code&gt;libinput&lt;/code&gt; we can find out how devices and events are treated: set a breakpoint
somewhere where &lt;code&gt;libinput&lt;/code&gt; reads the button press event and see what happens.&lt;/p&gt;
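As a concrete model of what we have been reading from the device node, here is a minimal sketch (not kernel or libinput code; the field types are simplified from `<linux/input.h>`) of the events in the recording above, plus a check for the button press we care about:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified version of struct input_event from <linux/input.h>:
 * a timestamp followed by type, code and value. */
struct input_event {
    uint64_t tv_sec;
    uint64_t tv_usec;
    uint16_t type;  /* EV_SYN == 0, EV_KEY == 1, EV_ABS == 3 */
    uint16_t code;  /* BTN_0 == 0x100 */
    int32_t value;  /* for EV_KEY: 1 == press, 0 == release */
};

enum { EV_KEY = 1, BTN_0 = 0x100 };

/* True iff the event is a press of the pad's first button,
 * i.e. the "EV_KEY / BTN_0  1" line in the recording above. */
static bool is_btn0_press(const struct input_event *e)
{
    return e->type == EV_KEY && e->code == BTN_0 && e->value == 1;
}
```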
&lt;p&gt;&lt;a href=&quot;https://wayland.freedesktop.org/libinput/doc/latest/architecture.html&quot;&gt;The libinput docs&lt;/a&gt;
have an architectural overview of &lt;code&gt;libinput&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;evdev_device_create&lt;/code&gt; calls &lt;a href=&quot;https://gitlab.freedesktop.org/libinput/libinput/-/blob/master/src/evdev.c#L1773&quot;&gt;&lt;code&gt;evdev_configure_device&lt;/code&gt;&lt;/a&gt; with the &lt;code&gt;device&lt;/code&gt; as a parameter;
&lt;code&gt;device-&amp;gt;devname&lt;/code&gt; contains the name of the device and will contain &lt;code&gt;Wacom&lt;/code&gt; for the devices we&apos;re interested in.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;(gdb) break evdev_configure_device if ((int) strstr(device-&amp;gt;devname, &amp;quot;acom&amp;quot;))
(gdb) c
Continuing.

Breakpoint 3, evdev_configure_device (device=0x55555566f000) at ../src/evdev.c:1775
1775		struct libevdev *evdev = device-&amp;gt;evdev;
(gdb) p device-&amp;gt;devname
$7 = 0x5555556365f0 &amp;quot;Wacom Intuos BT M Pad&amp;quot;
(gdb)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We&apos;re stepping through the function to see whether anything strange is happening.
We would like it to be recognized as a tablet pad so that the correct dispatch methods are set up.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;(gdb) p udev_tags
$8 = (EVDEV_UDEV_TAG_INPUT | EVDEV_UDEV_TAG_TABLET | EVDEV_UDEV_TAG_TABLET_PAD)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;So far so good. Continuing down it does correctly go into the &lt;code&gt;if&lt;/code&gt; &lt;a href=&quot;https://gitlab.freedesktop.org/libinput/libinput/-/blob/master/src/evdev.c#L1852&quot;&gt;on line 1852&lt;/a&gt;,
and calls &lt;code&gt;evdev_tablet_pad_create&lt;/code&gt;.
The pad is thus identified as a tablet pad.
Now we need to find out how events are read in &lt;code&gt;libinput&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Apparently, &lt;code&gt;libinput&lt;/code&gt; uses &lt;a href=&quot;https://www.freedesktop.org/wiki/Software/libevdev/&quot;&gt;&lt;code&gt;libevdev&lt;/code&gt;&lt;/a&gt; which is a wrapper library for evdev devices.
So instead of reading the files in &lt;code&gt;/dev/input&lt;/code&gt; like we did above, we can
get the events from handles that we get through &lt;code&gt;libevdev&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The function &lt;code&gt;libevdev_next_event&lt;/code&gt; is called in four places in &lt;code&gt;src/evdev.c&lt;/code&gt;, but the most
promising one is in &lt;code&gt;evdev_device_dispatch&lt;/code&gt;.
We can set a conditional breakpoint here for when our event is coming through&lt;sup&gt;&lt;a href=&quot;#user-content-fn-gdb&quot; id=&quot;user-content-fnref-gdb&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;7&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;(gdb) break evdev.c:1061 if (ev-&amp;gt;type == 1 &amp;amp;&amp;amp; ev-&amp;gt;code == 0x100)
No source file named evdev.c.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (evdev.c:1061 if (ev-&amp;gt;type == 1 &amp;amp;&amp;amp; ev-&amp;gt;code == 0x100)) pending.
(gdb) run debug-events
Starting program: /home/mht/src/libinput/build/libinput debug-events
[Thread debugging using libthread_db enabled]
Using host libthread_db library &amp;quot;/usr/lib/libthread_db.so.1&amp;quot;.
process 48478 is executing new program: /home/mht/src/libinput/build/libinput-debug-events
[Thread debugging using libthread_db enabled]
Using host libthread_db library &amp;quot;/usr/lib/libthread_db.so.1&amp;quot;.
-event1   DEVICE_ADDED     Power Button                      seat0 default group1  cap:k
-event0   DEVICE_ADDED     Power Button                      seat0 default group2  cap:k
-event20  DEVICE_ADDED     Logitech Performance MX           seat0 default group3  cap:p left scroll-nat scroll-button
-event21  DEVICE_ADDED     Kingsis Peripherals Evoluent VerticalMouse 4 seat0 default group4  cap:p left scroll-nat scroll-button
-event26  DEVICE_ADDED     HD Pro Webcam C920                seat0 default group5  cap:k
-event24  DEVICE_ADDED     Wacom Intuos BT M Pen             seat0 default group6  cap:T  size 216x135mm
-event25  DEVICE_ADDED     Wacom Intuos BT M Pad             seat0 default group6  cap:P buttons:4 strips:0 rings:0 mode groups:1
-event17  DEVICE_ADDED     ZSA Ergodox EZ                    seat0 default group7  cap:k
-event18  DEVICE_ADDED     ZSA Ergodox EZ Mouse              seat0 default group7  cap:p left scroll-nat scroll-button
-event19  DEVICE_ADDED     ZSA Ergodox EZ System Control     seat0 default group7  cap:k
-event22  DEVICE_ADDED     ZSA Ergodox EZ Consumer Control   seat0 default group7  cap:kp scroll-nat
-event23  DEVICE_ADDED     ZSA Ergodox EZ Keyboard           seat0 default group7  cap:k

Breakpoint 1, evdev_device_dispatch (data=0x5555556718e0) at ../src/evdev.c:1061
1061			if (rc == LIBEVDEV_READ_STATUS_SYNC) {
(gdb) n
1075			} else if (rc == LIBEVDEV_READ_STATUS_SUCCESS) {
(gdb) n
1076				if (!once) {
(gdb) n
1077					evdev_note_time_delay(device, &amp;amp;ev);
(gdb) n
1078					once = true;
(gdb) n
1080				evdev_device_dispatch_one(device, &amp;amp;ev);
(gdb) s
evdev_device_dispatch_one (device=0x5555556718e0, ev=0x7fffffffdf60) at ../src/evdev.c:989
989	{
(gdb) n
990		if (!device-&amp;gt;mtdev) {
(gdb) n
991			evdev_process_event(device, ev);
(gdb) s
evdev_process_event (device=0x5555556718e0, e=0x7fffffffdf60) at ../src/evdev.c:974
974		struct evdev_dispatch *dispatch = device-&amp;gt;dispatch;
(gdb) n
975		uint64_t time = input_event_time(e);
(gdb) n
981		libinput_timer_flush(evdev_libinput_context(device), time);
(gdb) n
983		dispatch-&amp;gt;interface-&amp;gt;process(dispatch, device, e, time);
(gdb) s
pad_process (dispatch=0x555555674c00, device=0x5555556718e0, e=0x7fffffffdf60, time=33217800469) at ../src/evdev-tablet-pad.c:483
483		struct pad_dispatch *pad = pad_dispatch(dispatch);
(gdb) bt
#0  pad_process (dispatch=0x555555674c00, device=0x5555556718e0, e=0x7fffffffdf60, time=33217800469)
    at ../src/evdev-tablet-pad.c:483
#1  0x00007ffff7f7bd06 in evdev_process_event (device=0x5555556718e0, e=0x7fffffffdf60) at ../src/evdev.c:983
#2  0x00007ffff7f7bd4b in evdev_device_dispatch_one (device=0x5555556718e0, ev=0x7fffffffdf60) at ../src/evdev.c:991
#3  0x00007ffff7f7bfef in evdev_device_dispatch (data=0x5555556718e0) at ../src/evdev.c:1080
#4  0x00007ffff7f74f06 in libinput_dispatch (libinput=0x5555555773b0) at ../src/libinput.c:2125
#5  0x000055555555d1e0 in handle_and_print_events (li=0x5555555773b0) at ../tools/libinput-debug-events.c:827
#6  0x000055555555d6df in mainloop (li=0x5555555773b0) at ../tools/libinput-debug-events.c:953
#7  0x000055555555db1e in main (argc=1, argv=0x7fffffffe588) at ../tools/libinput-debug-events.c:1091
(gdb) list
478	pad_process(struct evdev_dispatch *dispatch,
479		    struct evdev_device *device,
480		    struct input_event *e,
481		    uint64_t time)
482	{
483		struct pad_dispatch *pad = pad_dispatch(dispatch);
484	
485		switch (e-&amp;gt;type) {
486		case EV_ABS:
487			pad_process_absolute(pad, device, e, time);
(gdb) n
485		switch (e-&amp;gt;type) {
(gdb) n
490			pad_process_key(pad, device, e, time);
(gdb) s
pad_process_key (pad=0x555555674c00, device=0x5555556718e0, e=0x7fffffffdf60, time=33217800469) at ../src/evdev-tablet-pad.c:332
332		uint32_t button = e-&amp;gt;code;
(gdb) p e
$1 = (struct input_event *) 0x7fffffffdf60
(gdb) p *e
$2 = {time = {tv_sec = 33217, tv_usec = 800469}, type = 1, code = 256, value = 1}
(gdb) n
333		uint32_t is_press = e-&amp;gt;value != 0;
(gdb) n
336		if (e-&amp;gt;value == 2)
(gdb) n
339		pad_button_set_down(pad, button, is_press);
(gdb) s
pad_button_set_down (pad=0x555555674c00, button=256, is_down=true) at ../src/evdev-tablet-pad.c:88
88		struct button_state *state = &amp;amp;pad-&amp;gt;button_state;
(gdb) list
83	static inline void
84	pad_button_set_down(struct pad_dispatch *pad,
85			    uint32_t button,
86			    bool is_down)
87	{
88		struct button_state *state = &amp;amp;pad-&amp;gt;button_state;
89	
90		if (is_down) {
91			set_bit(state-&amp;gt;bits, button);
92			pad_set_status(pad, PAD_BUTTONS_PRESSED);
(gdb) n
90		if (is_down) {
(gdb) n
91			set_bit(state-&amp;gt;bits, button);
(gdb) n
92			pad_set_status(pad, PAD_BUTTONS_PRESSED);
(gdb) n
97	}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;At this point it&apos;s apparent that the event is going through the code successfully.
That is, the pad is recognized as a pad, and events from &lt;code&gt;libevdev&lt;/code&gt; are correctly setting the state of
the &lt;code&gt;evdev_device&lt;/code&gt; in &lt;code&gt;libinput&lt;/code&gt;. But, we&apos;re not getting any events, so where are events in &lt;code&gt;libinput&lt;/code&gt; made?&lt;/p&gt;
&lt;h2&gt;Events in libinput&lt;/h2&gt;
&lt;p&gt;In order to find out how events in &lt;code&gt;libinput&lt;/code&gt; work, we can take a look in the only place so far that we know we&apos;ve seen them:
in &lt;a href=&quot;https://gitlab.freedesktop.org/libinput/libinput/-/blob/master/tools/libinput-debug-events.c&quot;&gt;&lt;code&gt;libinput-debug-events&lt;/code&gt;&lt;/a&gt;&lt;sup&gt;&lt;a href=&quot;#user-content-fn-lde&quot; id=&quot;user-content-fnref-lde&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;8&lt;/a&gt;&lt;/sup&gt;.
&lt;code&gt;libinput-debug-events.c&lt;/code&gt; has a &lt;code&gt;main&lt;/code&gt; function that does some argument parsing and initialization, and then calls &lt;code&gt;mainloop&lt;/code&gt;,
which contains a &lt;code&gt;do while&lt;/code&gt; loop, which &lt;code&gt;poll&lt;/code&gt;s a (the?) fd from &lt;code&gt;libinput&lt;/code&gt;, and calls &lt;a href=&quot;https://gitlab.freedesktop.org/libinput/libinput/-/blob/master/tools/libinput-debug-events.c#L822&quot;&gt;&lt;code&gt;handle_and_print_events&lt;/code&gt;&lt;/a&gt;.
This function gets all events with &lt;code&gt;libinput_get_event&lt;/code&gt;, and has a giant &lt;code&gt;switch&lt;/code&gt; to dispatch how the event should be printed.
But there is no &lt;code&gt;default&lt;/code&gt; branch, so, while unlikely, we might already have hit a dead end.
We jump back into &lt;code&gt;gdb&lt;/code&gt; to test this (running as root; otherwise we get nothing!).&lt;/p&gt;
&lt;p&gt;Oh, that&apos;s right, my keyboard is also sending events, and so instead of &lt;code&gt;c&lt;/code&gt;ing through the initialization events until I get to
press the button on the pad and see what happens, I&apos;m getting swamped with events for me pressing &lt;code&gt;c&lt;/code&gt;&lt;sup&gt;&lt;a href=&quot;#user-content-fn-gdb2&quot; id=&quot;user-content-fnref-gdb2&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;9&lt;/a&gt;&lt;/sup&gt;!
Okay; we&apos;ll just add a &lt;code&gt;default&lt;/code&gt; case to the &lt;code&gt;switch&lt;/code&gt;, put a &lt;code&gt;printf&lt;/code&gt; there, and break on it.&lt;/p&gt;
&lt;p&gt;But oh, there&apos;s nothing coming through.&lt;/p&gt;
&lt;p&gt;Okay, so let&apos;s see where &lt;code&gt;libinput_get_event&lt;/code&gt; gets its events.
It&apos;s &lt;a href=&quot;https://gitlab.freedesktop.org/libinput/libinput/-/blob/master/src/libinput.c#L2976&quot;&gt;here&lt;/a&gt;,
from the circular buffer &lt;code&gt;libinput-&amp;gt;events&lt;/code&gt;.
Let&apos;s see where this is written to:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-sh&quot;&gt;$ rg &amp;quot;events\[.*\]\s*=&amp;quot; src/
src/libinput.c
2971:	events[libinput-&amp;gt;events_in] = event;

src/evdev-tablet.c
2000:	struct input_event events[2] = {
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Oh, that&apos;s five lines above the function we just looked at.
The magical function in question is &lt;code&gt;libinput_post_event&lt;/code&gt;, and so, presumably,
all events we&apos;re getting from &lt;code&gt;libevdev&lt;/code&gt; should end up being sent to &lt;code&gt;post_event&lt;/code&gt;,
but our precious button click isn&apos;t.
This function is also called in only a few places:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-sh&quot;&gt;$ rg &amp;quot;libinput_post_event&amp;quot; src/
src/libinput.c
335:libinput_post_event(struct libinput *libinput,
2217:	libinput_post_event(libinput, event);
2244:	libinput_post_event(device-&amp;gt;seat-&amp;gt;libinput, event);
2923:libinput_post_event(struct libinput *libinput,
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The first occurrence is the prototype, the second is in &lt;code&gt;post_base_event&lt;/code&gt;,
the third is in &lt;code&gt;post_device_event&lt;/code&gt; which sounds promising, and the fourth is the function itself.
The problem is that both of these are &lt;code&gt;static&lt;/code&gt;, and so they have quite a few callers in the file.
We want to get closer to where the mapping from &lt;code&gt;libevdev&lt;/code&gt; events to &lt;code&gt;struct libinput_event&lt;/code&gt;s happens,
so maybe it makes sense to go hunting for where the &lt;code&gt;libinput_event&lt;/code&gt;s are initialized.
There&apos;s even a &lt;code&gt;struct libinput_event_tablet_pad&lt;/code&gt;.
Looking further, we find the &lt;code&gt;tablet_pad_notify_button&lt;/code&gt; function which creates a
&lt;code&gt;libinput_event_tablet_pad&lt;/code&gt;, and sends it to &lt;code&gt;post_device_event&lt;/code&gt;.
This is probably where the button click should end up.&lt;/p&gt;
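Stepping back, the posting and draining pair we have been tracing (`libinput_post_event` writing into the circular buffer `libinput->events`, `libinput_get_event` reading from it) amounts to a plain ring buffer. A stripped-down model, with hypothetical names (the real queue also grows on demand), looks like this:

```c
#include <stddef.h>

#define QUEUE_LEN 8  /* the real buffer starts larger and can grow */

struct event_queue {
    int events[QUEUE_LEN];
    size_t events_in;   /* index of the next slot to write */
    size_t events_out;  /* index of the next slot to read */
};

/* The shape of the posting side: store and advance the write index. */
static void queue_post(struct event_queue *q, int event)
{
    q->events[q->events_in] = event;
    q->events_in = (q->events_in + 1) % QUEUE_LEN;
}

/* The shape of the reading side: return -1 when the queue is empty. */
static int queue_get(struct event_queue *q)
{
    if (q->events_out == q->events_in)
        return -1;
    int event = q->events[q->events_out];
    q->events_out = (q->events_out + 1) % QUEUE_LEN;
    return event;
}
```

If nothing ever calls the posting side for our button, the consumer loop in `libinput-debug-events` has nothing to print, which is exactly the symptom we are seeing.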
&lt;pre&gt;&lt;code&gt;$ rg &amp;quot;tablet_pad_notify_button&amp;quot; src/
src/libinput.c
2700:tablet_pad_notify_button(struct libinput_device *device,

src/libinput-private.h
662:tablet_pad_notify_button(struct libinput_device *device,

src/evdev-tablet-pad.c
394:				tablet_pad_notify_button(base,
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The call in &lt;code&gt;evdev-tablet-pad.c&lt;/code&gt; comes from the function &lt;code&gt;pad_notify_button_mask&lt;/code&gt;,
which we&apos;ll breakpoint.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;/h/m/s/l/build$ sudo gdb ./libinput-debug-events
[sudo] password for mht:
GNU gdb (GDB) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later &amp;lt;http://gnu.org/licenses/gpl.html&amp;gt;
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type &amp;quot;show copying&amp;quot; and &amp;quot;show warranty&amp;quot; for details.
This GDB was configured as &amp;quot;x86_64-pc-linux-gnu&amp;quot;.
Type &amp;quot;show configuration&amp;quot; for configuration details.
For bug reporting instructions, please see:
&amp;lt;http://www.gnu.org/software/gdb/bugs/&amp;gt;.
Find the GDB manual and other documentation resources online at:
    &amp;lt;http://www.gnu.org/software/gdb/documentation/&amp;gt;.

For help, type &amp;quot;help&amp;quot;.
Type &amp;quot;apropos word&amp;quot; to search for commands related to &amp;quot;word&amp;quot;...
Reading symbols from ./libinput-debug-events...
(gdb) break pad_notify_button_mask
Function &amp;quot;pad_notify_button_mask&amp;quot; not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (pad_notify_button_mask) pending.
(gdb) run
Starting program: /home/mht/src/libinput/build/libinput-debug-events
[Thread debugging using libthread_db enabled]
Using host libthread_db library &amp;quot;/usr/lib/libthread_db.so.1&amp;quot;.
-event1   DEVICE_ADDED     Power Button                      seat0 default group1  cap:k
-event0   DEVICE_ADDED     Power Button                      seat0 default group2  cap:k
-event20  DEVICE_ADDED     Logitech Performance MX           seat0 default group3  cap:p left scroll-nat scroll-button
-event21  DEVICE_ADDED     Kingsis Peripherals Evoluent VerticalMouse 4 seat0 default group4  cap:p left scroll-nat scroll-button
-event26  DEVICE_ADDED     HD Pro Webcam C920                seat0 default group5  cap:k
-event24  DEVICE_ADDED     Wacom Intuos BT M Pen             seat0 default group6  cap:T  size 216x135mm
-event25  DEVICE_ADDED     Wacom Intuos BT M Pad             seat0 default group6  cap:P buttons:4 strips:0 rings:0 mode groups:1
-event17  DEVICE_ADDED     ZSA Ergodox EZ                    seat0 default group7  cap:k
-event18  DEVICE_ADDED     ZSA Ergodox EZ Mouse              seat0 default group7  cap:p left scroll-nat scroll-button
-event19  DEVICE_ADDED     ZSA Ergodox EZ System Control     seat0 default group7  cap:k
-event22  DEVICE_ADDED     ZSA Ergodox EZ Consumer Control   seat0 default group7  cap:kp scroll-nat
-event23  DEVICE_ADDED     ZSA Ergodox EZ Keyboard           seat0 default group7  cap:k

Breakpoint 1, pad_notify_button_mask (pad=0x555555672b30, device=0x55555566f400, time=42251078117, buttons=0x7fffffffddb0,
    state=LIBINPUT_BUTTON_STATE_PRESSED) at ../src/evdev-tablet-pad.c:365
365		struct libinput_device *base = &amp;amp;device-&amp;gt;base;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And it fires! We&apos;re now &lt;a href=&quot;https://gitlab.freedesktop.org/libinput/libinput/-/blob/master/src/evdev-tablet-pad.c#L359&quot;&gt;here&lt;/a&gt;,
and we would like to get to &lt;code&gt;394&lt;/code&gt;, or &lt;code&gt;402&lt;/code&gt;, which leads to &lt;code&gt;tablet_pad_notify_key&lt;/code&gt;, which seems to be basically the same but different.
We can set breakpoints and run, just in case we do end up in either:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;(gdb) break 394
Breakpoint 2 at 0x7ffff7fa3b13: file ../src/evdev-tablet-pad.c, line 394.
(gdb) break 402
Breakpoint 3 at 0x7ffff7fa3b49: file ../src/evdev-tablet-pad.c, line 402.
(gdb) c
Continuing.

Breakpoint 1, pad_notify_button_mask (pad=0x555555672b30, device=0x55555566f400, time=42251378122, buttons=0x7fffffffddb0,
    state=LIBINPUT_BUTTON_STATE_RELEASED) at ../src/evdev-tablet-pad.c:365
365		struct libinput_device *base = &amp;amp;device-&amp;gt;base;
(gdb) c
Continuing.
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;but we don&apos;t. We just get the second event, which is the key release.
This is good, because we know exactly where our event gets lost.&lt;/p&gt;
&lt;h2&gt;The Last Missing Piece?&lt;/h2&gt;
&lt;p&gt;Now it&apos;s time to make sense of what&apos;s going on in that &lt;code&gt;for&lt;/code&gt; loop.
A quick &lt;code&gt;gdb&lt;/code&gt; &lt;code&gt;p&lt;/code&gt; of &lt;code&gt;buttons-&amp;gt;bits&lt;/code&gt; shows that they&apos;re mostly &lt;code&gt;0&lt;/code&gt;,
so we&apos;ll put another breakpoint on line &lt;code&gt;378&lt;/code&gt;, just inside the &lt;code&gt;while&lt;/code&gt; loop, which we also hit.
Here are the local variables at that time:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;(gdb) info locals
enabled = 21845
map = {value = 1432824624}
buttons_slice = 1 &apos;\001&apos;
base = 0x55555566f400
group = 0x555555672bd8
code = 256
i = 32
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Note that &lt;code&gt;enabled&lt;/code&gt; and &lt;code&gt;map&lt;/code&gt; are garbage values so far.
Stepping down to the first &lt;code&gt;if&lt;/code&gt; changes things a little:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;(gdb) info locals
enabled = 1
map = {value = 1432824624}
buttons_slice = 0 &apos;\000&apos;
base = 0x55555566f400
group = 0x555555672bd8
code = 257
i = 32
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;which means we&apos;re enabled. Good. Then we step past the &lt;code&gt;map&lt;/code&gt; assignment:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;(gdb) p map
$4 = {value = 4294967295}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now, since we know that we didn&apos;t get to either &lt;code&gt;tablet_pad_notify&lt;/code&gt; function, or the &lt;code&gt;abort&lt;/code&gt; call,
&lt;code&gt;map_is_unmapped&lt;/code&gt; will be &lt;code&gt;true&lt;/code&gt;, which it is:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;(gdb) n
387					continue;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;How does one know whether a map is unmapped? Well,&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;#define map_is_unmapped(x_) ((x_).value == (uint32_t)-1)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;and &lt;code&gt;(uint32_t) -1 == 4294967295&lt;/code&gt;, which means that we need to rewind, and look at the line&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;map = pad-&amp;gt;button_map[code - 1];
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Looking further at what&apos;s in this map gives us a very important clue:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;(gdb) p pad-&amp;gt;button_map
$7 = {{value = 4294967295} &amp;lt;repeats 272 times&amp;gt;, {value = 0}, {value = 1}, {value = 4294967295}, {value = 4294967295}, {
    value = 4294967295}, {value = 2}, {value = 3}, {value = 4294967295} &amp;lt;repeats 489 times&amp;gt;}
(gdb) p pad-&amp;gt;button_map[272]
$8 = {value = 0}
(gdb) p pad-&amp;gt;button_map[273]
$9 = {value = 1}
(gdb) p pad-&amp;gt;button_map[277]
$10 = {value = 2}
(gdb) p pad-&amp;gt;button_map[278]
$11 = {value = 3}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;These are exactly&lt;sup&gt;&lt;a href=&quot;#user-content-fn-dec&quot; id=&quot;user-content-fnref-dec&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;10&lt;/a&gt;&lt;/sup&gt; the numbers that we saw in the output from &lt;code&gt;libwacom-list-local-devices&lt;/code&gt;, and
that we bothered translating from hex to decimal!
So the map is here, but we&apos;re skipping the iteration with the button click because we&apos;re mistaken in which index to look at.
At this point, I really hoped this was an off-by-one thing and that &lt;code&gt;code - 1&lt;/code&gt; was &lt;code&gt;271&lt;/code&gt;. But,&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;(gdb) p code
$19 = 257
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is when clicking the first button, and
clicking the second yields &lt;code&gt;code == 258&lt;/code&gt; and so on.
In other words, it looks like we&apos;re off by 16 bits.&lt;/p&gt;
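To see the skip in isolation, here is a small reconstruction (not libinput code) of the `button_map` from the gdb dump and the `code - 1` lookup: button code 257 lands on the unmapped sentinel, while the mapped entries start 16 positions later.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_CODES 1024
#define UNMAPPED ((uint32_t)-1)  /* 4294967295, the map_is_unmapped() sentinel */

/* Rebuild the map we printed in gdb: indices 272, 273, 277 and 278 hold
 * pad buttons 0..3; every other entry is (uint32_t)-1. */
static void fill_button_map(uint32_t *map)
{
    for (int i = 0; i < NUM_CODES; i++)
        map[i] = UNMAPPED;
    map[272] = 0;
    map[273] = 1;
    map[277] = 2;
    map[278] = 3;
}

/* The lookup we just stepped through. */
static uint32_t lookup(const uint32_t *map, uint32_t code)
{
    return map[code - 1];
}
```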
&lt;p&gt;Let&apos;s get the overview:
&lt;code&gt;buttons-&amp;gt;bits&lt;/code&gt; is an array of 96 bytes, and we&apos;re looking at which bits are set.
To do this, we look at each byte (this is the &lt;code&gt;for&lt;/code&gt; loop), and look at each bit in that byte until &lt;code&gt;buttons_slice&lt;/code&gt;, the current byte, is &lt;code&gt;0&lt;/code&gt; (this is the &lt;code&gt;while&lt;/code&gt; loop).
Our problem is that &lt;code&gt;code&lt;/code&gt;, which is the &lt;em&gt;bit&lt;/em&gt; offset in the whole &lt;em&gt;byte&lt;/em&gt; array, is off by 16, i.e. two bytes.
In other words, we need to find out where &lt;code&gt;buttons-&amp;gt;bits&lt;/code&gt; are set.&lt;/p&gt;
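The byte-and-bit scan can be sketched like this (a simplification, not the actual libinput loop): setting bit 256 puts a `\001` at byte 32, exactly the pattern in the `button_state` dump.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BITS_LEN 96  /* buttons->bits is 96 bytes, i.e. 768 bits */

static void set_bit(uint8_t *bits, unsigned bit)
{
    bits[bit / 8] |= (uint8_t)(1u << (bit % 8));
}

/* Scan byte by byte (the for loop) and, within a non-zero byte,
 * bit by bit (the while loop); return the first set bit or -1. */
static int first_set_bit(const uint8_t *bits)
{
    for (unsigned i = 0; i < BITS_LEN; i++) {
        uint8_t slice = bits[i];
        for (unsigned b = 0; slice != 0; slice >>= 1, b++)
            if (slice & 1)
                return (int)(i * 8 + b);
    }
    return -1;
}
```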
&lt;p&gt;For at least one caller of the function, &lt;code&gt;pad_notify_buttons&lt;/code&gt;, the buttons are set in &lt;code&gt;pad_get_buttons_{pressed,released}&lt;/code&gt;.
Looking at the stack trace (with &lt;code&gt;bt&lt;/code&gt; in &lt;code&gt;gdb&lt;/code&gt;) we see this is indeed the place where we come from.
But the logic there is very simple, and leaves no room for errors such as this.
In addition, &lt;code&gt;pad-&amp;gt;button_state&lt;/code&gt; has the same error:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;(gdb) p pad-&amp;gt;button_state
$16 = {bits = &apos;\000&apos; &amp;lt;repeats 32 times&amp;gt;, &amp;quot;\001&amp;quot;, &apos;\000&apos; &amp;lt;repeats 62 times&amp;gt;}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We know this is wrong, since we are supposed to end up at &lt;code&gt;272&lt;/code&gt;.
Well, according to &lt;code&gt;libwacom&lt;/code&gt; anyways.&lt;/p&gt;
&lt;h2&gt;Back to libwacom&lt;/h2&gt;
&lt;p&gt;At this point I&apos;m getting suspicious. How certain are we really that the mapping isn&apos;t set up wrong?
After all, the code in the &lt;code&gt;evdev&lt;/code&gt; events we read out from &lt;code&gt;/dev/input/event25&lt;/code&gt; was &lt;code&gt;0x100 == BTN_0&lt;/code&gt;, and not
&lt;code&gt;272 == 0x110 == BTN_LEFT&lt;/code&gt;, which I think fits our problem strangely well.
This would also make sense with &lt;code&gt;libinput&lt;/code&gt;, since it presumably queries either the pad itself or
&lt;code&gt;libwacom&lt;/code&gt; to get the mapping, but there&apos;s a mismatch between the mapping and what&apos;s really being sent.&lt;/p&gt;
&lt;p&gt;Let&apos;s push our current bug hunt onto our mental stack, and try to look at this map instead.
Okay, so where does &lt;code&gt;libwacom-list-local-devices&lt;/code&gt; get those numbers from?
&lt;a href=&quot;https://github.com/linuxwacom/libwacom/blob/master/tools/list-local-devices.c&quot;&gt;&lt;code&gt;tools/list-local-devices.c&lt;/code&gt;&lt;/a&gt;
contains a call to &lt;code&gt;libwacom_print_device_description()&lt;/code&gt; in &lt;code&gt;libwacom.c&lt;/code&gt;, which again calls
&lt;code&gt;print_buttons_for_device&lt;/code&gt;, which &lt;em&gt;again&lt;/em&gt; calls &lt;code&gt;print_button_evdev_codes&lt;/code&gt;, which calls
&lt;code&gt;libwacom_get_button_evdev_code&lt;/code&gt;.
This function basically indexes into &lt;code&gt;device-&amp;gt;button_codes&lt;/code&gt;, which we now assume are wrong.&lt;/p&gt;
&lt;p&gt;The button codes are set in &lt;a href=&quot;https://github.com/linuxwacom/libwacom/blob/master/libwacom/libwacom-database.c&quot;&gt;this file&lt;/a&gt;,
but by simple inspection it&apos;s not clear what&apos;s wrong, so we clone the repository, and build the tool ourselves
with the &lt;a href=&quot;https://github.com/linuxwacom/libwacom/wiki#building&quot;&gt;build instructions&lt;/a&gt; from the wiki.
We compile, start up &lt;code&gt;gdb&lt;/code&gt;, set the breakpoints, but uh oh, &lt;code&gt;SIGSEGV&lt;/code&gt;.
I revert to the &lt;code&gt;libwacom-1.3&lt;/code&gt; tag, and now we don&apos;t segfault any more, but we get&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Failed to initialize device database
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;which we solve by passing &lt;code&gt;--database ../data&lt;/code&gt; when running. All is well, and the &lt;code&gt;evdev&lt;/code&gt; codes are still &lt;code&gt;0x110&lt;/code&gt; and counting.
We run it in gdb:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;(gdb) set args --database ../data
(gdb) break set_button_codes_from_heuristics if (device-&amp;gt;model_name &amp;amp;&amp;amp; ((int) strcmp(device-&amp;gt;model_name, &amp;quot;CTL-6100WL&amp;quot;)) == 0)
Breakpoint 5 at 0x7ffff7fc215e: file ../libwacom/libwacom-database.c, line 418.
(gdb) break set_button_codes_from_string if (device-&amp;gt;model_name &amp;amp;&amp;amp; ((int) strcmp(device-&amp;gt;model_name, &amp;quot;CTL-6100WL&amp;quot;)) == 0)
Breakpoint 6 at 0x7ffff7fc2024: file ../libwacom/libwacom-database.c, line 391.
(gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/mht/src/libwacom/builddir/libwacom-list-local-devices --database ../data
[Thread debugging using libthread_db enabled]
Using host libthread_db library &amp;quot;/usr/lib/libthread_db.so.1&amp;quot;.

Breakpoint 5, set_button_codes_from_heuristics (device=0x5555555acdd0) at ../libwacom/libwacom-database.c:418
418		for (i = 0; i &amp;lt; device-&amp;gt;num_buttons; i++) {
(gdb) p *device
$16 = {name = 0x5555555a30c0 &amp;quot;Wacom Intuos BT M&amp;quot;, model_name = 0x5555555acc90 &amp;quot;CTL-6100WL&amp;quot;, width = 9, height = 5, match = 0,
  matches = 0x5555555acbb0, nmatches = 2, paired = 0x0, cls = WCLASS_BAMBOO, num_strips = 0, features = 1, integration_flags = 0,
  strips_num_modes = 0, ring_num_modes = 0, ring2_num_modes = 0, num_styli = 1, supported_styli = 0x5555555abbe0, num_buttons = 4,
  buttons = 0x5555555ad100, button_codes = 0x5555555ad090, num_leds = 0, status_leds = 0x0,
  layout = 0x555555586690 &amp;quot;../data/layouts/intuos-m-p3.svg&amp;quot;, refcnt = 1}
(gdb) list
413	
414	static inline void
415	set_button_codes_from_heuristics(WacomDevice *device)
416	{
417		gint i;
418		for (i = 0; i &amp;lt; device-&amp;gt;num_buttons; i++) {
419			if (device-&amp;gt;cls == WCLASS_BAMBOO ||
420			    device-&amp;gt;cls == WCLASS_GRAPHIRE) {
421				switch (i) {
422				case 0:
(gdb)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;So we&apos;re in &lt;code&gt;set_button_codes_from_heuristics&lt;/code&gt;, and since our device class is &lt;code&gt;BAMBOO&lt;/code&gt;, although I don&apos;t know why that is,
we default to &lt;code&gt;BTN_LEFT&lt;/code&gt; as the first button, which is &lt;code&gt;0x110&lt;/code&gt;.&lt;/p&gt;
&lt;h1&gt;The Fix&lt;/h1&gt;
&lt;p&gt;I&apos;m not really sure what the &lt;code&gt;Class&lt;/code&gt; field does in this config, apart from heuristically setting key codes, but the fix that made it
all work was simple: set the class to something else.
I changed it on my system (the file was in &lt;code&gt;/usr/share/libwacom/intuos-m-p3-wl.tablet&lt;/code&gt;), and &lt;a href=&quot;https://github.com/linuxwacom/libwacom/pull/261&quot;&gt;submitted a PR&lt;/a&gt; upstream.
All in all, this adventure took my entire Saturday, and the fix was one line,
but I&apos;m finally getting events when I&apos;m pressing the buttons.&lt;/p&gt;
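&lt;p&gt;For reference, the &lt;code&gt;.tablet&lt;/code&gt; files are plain ini-style configs. The excerpt below is only an illustration of the kind of change (the key names are my assumption, mirroring the struct fields we saw in &lt;code&gt;gdb&lt;/code&gt;), not the actual file contents:&lt;/p&gt;

```ini
# Illustrative sketch of /usr/share/libwacom/intuos-m-p3-wl.tablet
[Device]
Name=Wacom Intuos BT M
ModelName=CTL-6100WL
# Class=Bamboo is what routed us into the BTN_LEFT (0x110) heuristic;
# the one-line fix is to set Class to a different value here.
Class=Bamboo
```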
&lt;p&gt;Now, how do I make these buttons do anything useful?&lt;/p&gt;
&lt;p&gt;Thanks for reading.&lt;/p&gt;
&lt;section data-footnotes=&quot;&quot; class=&quot;footnotes&quot;&gt;&lt;h2 id=&quot;footnote-label&quot; class=&quot;sr-only&quot;&gt;Footnotes&lt;/h2&gt;
&lt;ol&gt;
&lt;li id=&quot;user-content-fn-xournalppbind&quot;&gt;
&lt;p&gt;Never mind that XournalPP doesn&apos;t have good (or decent, or any?) support for key rebinding. &lt;a href=&quot;#user-content-fnref-xournalppbind&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-touchingpad&quot;&gt;
&lt;p&gt;Interestingly, we don&apos;t even have to touch the pad itself; it seems to be sufficient for the tip of the pen to be pushed in for the action to be interpreted as drawing. &lt;a href=&quot;#user-content-fnref-touchingpad&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt; &lt;a href=&quot;#user-content-fnref-touchingpad-2&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-mm&quot;&gt;
&lt;p&gt;This is supported by the output of &lt;code&gt;debug-events&lt;/code&gt;, which humorously states that the size of the &lt;em&gt;pen&lt;/em&gt; is 216x135mm. &lt;a href=&quot;#user-content-fnref-mm&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-manhexdump&quot;&gt;
&lt;p&gt;Two things were confusing: the fact that in the format string you need a space between the byte count and the format, which was not explicitly stated, and that &amp;quot;squeezing&amp;quot; is on by default, which completely messes up the output if you are defining your own format. &lt;a href=&quot;#user-content-fnref-manhexdump&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-inputevent&quot;&gt;
&lt;p&gt;It would probably be easier to just write a small program and cast a pointer to an array with the data we read to the struct we suspect. &lt;a href=&quot;#user-content-fnref-inputevent&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-uapi&quot;&gt;
&lt;p&gt;I guess &lt;code&gt;uapi&lt;/code&gt; is for user space, and that the directory is superfluous when you&apos;re not doing kernel dev? &lt;a href=&quot;#user-content-fnref-uapi&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-gdb&quot;&gt;
&lt;p&gt;This represents most of my workflow in &lt;code&gt;gdb&lt;/code&gt;: set breakpoints, &lt;code&gt;n&lt;/code&gt; or &lt;code&gt;s&lt;/code&gt; down wherever, &lt;code&gt;list&lt;/code&gt; unless I have the source code right by, and &lt;code&gt;p&lt;/code&gt; expressions; sometimes I&apos;ll also &lt;code&gt;pt&lt;/code&gt; for when I don&apos;t know the types of things.
It&apos;s... not great? But it&apos;s alright. I would like to have better integration in my text editor, that is, I don&apos;t really want to leave my text editor when debugging, since mentally I&apos;m doing the same in both programs, but I haven&apos;t actually bothered seeing what&apos;s out there.
My experience from trying out &lt;code&gt;gdb&lt;/code&gt; integration in &lt;code&gt;vim&lt;/code&gt; was pretty bad, and if it doesn&apos;t work well in &lt;code&gt;vim&lt;/code&gt;, I don&apos;t see how semi-obscure editors stand a chance. &lt;a href=&quot;#user-content-fnref-gdb&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-lde&quot;&gt;
&lt;p&gt;At least this is a place where we know that some of the &lt;code&gt;libevdev&lt;/code&gt; events are coming through and some are not. &lt;a href=&quot;#user-content-fnref-lde&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-gdb2&quot;&gt;
&lt;p&gt;I&apos;m sure there&apos;s a way of breaking conditionally based on the event type, and I browsed through the types a little bit with &lt;code&gt;gdb&lt;/code&gt;, but couldn&apos;t find anything that seemed useful.
When the alternative, adding a &lt;code&gt;default&lt;/code&gt; branch to a &lt;code&gt;switch&lt;/code&gt; in a codebase I had already &lt;code&gt;clone&lt;/code&gt;d and built, was so simple, it made sense to do, despite not really being what I wanted to do. &lt;a href=&quot;#user-content-fnref-gdb2&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-dec&quot;&gt;
&lt;p&gt;Had this been in a textbook I would think &amp;quot;yeah sure, that&apos;s reeeaaallly convenient how that minor thing we did way back when turned out to be useful.&amp;quot;, but I promise, I did &lt;em&gt;not&lt;/em&gt; go back and add the conversion after the fact! &lt;a href=&quot;#user-content-fnref-dec&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content></entry><entry><title>Algorithm complexity</title><id>https://mht.wtf/post/big-o/</id><updated>2014-09-23T12:00:00+01:00</updated><author><name>Martin Hafskjold Thoresen</name><email>m@mht.wtf</email></author><link href="https://mht.wtf/post/big-o/" rel=""/><link href="https://mht.wtf/post/big-o/index.html" rel="alternate"/><published>2014-09-23T12:00:00+01:00</published><content type="text/html">&lt;p&gt;Recently I&apos;ve stumbled upon a few blog posts and Internet discussions involving the complexity of popular algorithms and operations on well-known data structures.
In these posts, one usually can&apos;t miss the &lt;code&gt;Big-O&lt;/code&gt; notation - including its pitfalls, of which there are a few.
Therefore I would like to try to clear things up.&lt;/p&gt;
&lt;h2&gt;The definition&lt;/h2&gt;
&lt;p&gt;Let&apos;s start head on. &lt;a href=&quot;http://en.wikipedia.org/wiki/Big_O_notation#Formal_definition&quot;&gt;Wikipedia&lt;/a&gt; says:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Let $f(x)$ and $g(x)$ be two functions defined on some subset of the real numbers. One writes:
$$f(x) = O(g(x)) \text{ as } x \to \infty$$
if and only if there is a positive constant $M$ such that for all sufficiently large values of $x$, $f(x)$ is at most $M$ multiplied by $g(x)$ in absolute value.
That is, $f(x) = O(g(x))$ if and only if there exists a positive real number $M$ and a real number $x_0$ such that
$$|f(x)|\leq M|g(x)| \text{ for all } x &amp;gt; x_0.$$&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Pretty straightforward? No? OK, let&apos;s try to break it down.&lt;/p&gt;
&lt;h2&gt;The basics&lt;/h2&gt;
&lt;p&gt;We have two functions, $f(n)$ and $g(n)$. $f(n)$ is the actual running time of our algorithm, and $g(n)$ is what&apos;s inside the $O()$.
Let&apos;s set $f(n) = 2n^2+12n+61$ and $g(n) = n^2$, so we have some good ol&apos; numbers to look at.
Now we could write:&lt;/p&gt;
&lt;p&gt;$$f(n) = O(g(n)) \implies 2n^2+12n+61 = O(n^2).$$&lt;/p&gt;
&lt;p&gt;If we simplify, our claim is that when $n$ gets large, the two functions are almost equal.
This means that if we need $f(999999)$, a pretty good approximation would be $g(999999) = 999998000001$.&lt;/p&gt;
&lt;p&gt;Note the &apos;&lt;em&gt;pretty&lt;/em&gt;&apos;! $f(999999)$ is actually $ 2000008000051 $, which is &lt;strong&gt;twice&lt;/strong&gt; as much.
Now, look again at $f(n)$ and take a guess why it&apos;s twice, and not any other multiple.
So far so good. This probably isn&apos;t news for anyone who has seen the &lt;code&gt;big-O&lt;/code&gt; notation before; however, remember that we simplified things a little.&lt;/p&gt;
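&lt;p&gt;That the ratio settles at the leading coefficient is easy to see numerically. A few lines of Python (my own illustration, not from any library) make it visible:&lt;/p&gt;

```python
def f(n):
    return 2 * n**2 + 12 * n + 61

def g(n):
    return n**2

# As n grows, the lower-order terms 12n + 61 stop mattering and the
# ratio f(n)/g(n) settles at the leading coefficient, 2.
for n in (10, 1000, 999999):
    print(n, f(n) / g(n))
```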
&lt;h2&gt;The runtimes&lt;/h2&gt;
&lt;p&gt;When working with algorithms, one is usually interested in the fastest one.
Why spend two seconds sorting a list of numbers when you could use only one?
Having now learnt the magic of &lt;code&gt;big-O&lt;/code&gt;, we find a table of popular sorting algorithms and their average case runtimes;
we laugh when we realize the author of the table has included multiple $O(n^2)$ algorithms when we
see multiple $O(n \log{n})$, and even some $O(n)$, algorithms.
&amp;quot;Why would you even do something like that?&amp;quot; we say to ourselves, and shake our heads.&lt;/p&gt;
&lt;p&gt;Later we decide to learn the almighty &lt;code&gt;quicksort&lt;/code&gt;.
We look up an implementation in our favorite language, but we are startled when the first lines (maybe) looks something like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-py&quot;&gt;def quicksort(array):
  if len(array) &amp;lt; 10:
    insertionsort(array)
  # ...
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We rush back to the runtime table and find &lt;code&gt;insertion sort&lt;/code&gt;: $O(n^2)$?!
Surely something must be wrong here; why else would one use a slower algorithm?
We decide this implementation is far from optimal, and find another one. Which turns out to be exactly the same.
So what is going on here? &lt;code&gt;big-O&lt;/code&gt; promised us that $O(n\log{n})$ is better than $O(n^2)$, so how come
quicksort wants to use an $O(n^2)$ algorithm?
This is when our simplification comes back to bite us in the rear.&lt;/p&gt;
&lt;h2&gt;The catch&lt;/h2&gt;
&lt;p&gt;We said &lt;em&gt;&amp;quot;If we simplify, our claim is that when $n$ gets large, the two functions are almost equal.&amp;quot;&lt;/em&gt;
So what about when $n$ isn&apos;t very large?
Looking back at the Wikipedia definition, there was something about an $x_0$.
It turns out &lt;code&gt;Big-O&lt;/code&gt; has the definition of &amp;quot;large&amp;quot; covered:
large means $x\gt x_0$. This doesn&apos;t say much, though.
But here&apos;s the kicker: &lt;strong&gt;we get to choose $x_0$&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Consider the following graph:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;graph-1.svg&quot; alt=&quot;graph 1&quot; /&gt;&lt;/p&gt;
&lt;p&gt;If we choose $x_0=a$ we can&apos;t really say much; the graphs keep switching between being on top and on the bottom.
However, if we choose $x_0=b$ we can see that (for all we know, and which is probable)
$f(x)$ is the larger function, and hence the one with the longer running time.
In this case, $f(x)=x^2+8 = O(n^2)$ and $g(x) = 8x = O(n)$.
Given that, we would say that $g(x)$ is the faster of the two. But what if the only possible values for $n$ are between $a$ and $b$?
Then, clearly, $f(x)$ is faster, as seen from the graph, even though $O(n^2) \gt O(n)$.&lt;/p&gt;
&lt;p&gt;Now we understand that the &lt;code&gt;Big-O&lt;/code&gt; notation doesn&apos;t really say anything about actual runtime, just the runtime when your input is really large.
This is exactly what happens in &lt;code&gt;quicksort&lt;/code&gt; - the &lt;code&gt;insertion sort&lt;/code&gt; algorithm, though pretty slow on large inputs, performs really well on smaller ones.
Even &lt;code&gt;timsort&lt;/code&gt;, the standard sorting algorithm in &lt;code&gt;Python&lt;/code&gt;, &lt;code&gt;Java SE 7&lt;/code&gt;, and the &lt;code&gt;Android&lt;/code&gt; platform, uses &lt;code&gt;insertion sort&lt;/code&gt;&lt;sup&gt;&lt;a href=&quot;#user-content-fn-timsort&quot; id=&quot;user-content-fnref-timsort&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
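&lt;p&gt;For the curious, insertion sort itself is only a handful of lines. A plain Python version (a sketch of my own, not quicksort&apos;s actual helper) shows why its constant factors are so small:&lt;/p&gt;

```python
def insertion_sort(array):
    # Worst case O(n^2), but the loop body is trivial: one comparison
    # and one move, with no recursion and no extra allocation. That is
    # why it wins on the short arrays quicksort hands off to it.
    for i in range(1, len(array)):
        key = array[i]
        j = i - 1
        while j >= 0 and array[j] > key:
            array[j + 1] = array[j]  # shift larger elements one slot right
            j -= 1
        array[j + 1] = key
    return array
```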
&lt;h2&gt;What we&apos;ve got so far&lt;/h2&gt;
&lt;p&gt;It&apos;s often easier to grasp a concept when you don&apos;t look at the general case, but rather at an example.
Let&apos;s say one of our functions has a running time of $f(n) = 2n^2 + 8n + 120$.
A start would be to set $f(n) = O(2n^2 + 8n + 120)$.
Figuring out the asymptotic runtime of a function isn&apos;t exactly straightforward.
The short story here is that you can drop every term except the one with the highest exponent&lt;sup&gt;&lt;a href=&quot;#user-content-fn-simpl&quot; id=&quot;user-content-fnref-simpl&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;, which in this case is $2n^2$.
Then we&apos;re left with $O(2n^2)$.
We can also forget about all constant factors; now there&apos;s really nothing left to remove, as we&apos;ve got $O(n^2)$.&lt;/p&gt;
&lt;p&gt;How can we know this is correct? Let&apos;s just look at the following graph:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;graph-2.svg&quot; alt=&quot;graph 2&quot; /&gt;&lt;/p&gt;
&lt;p&gt;We see $M=3$, so  $Mg(n)=3n^2$, and $f(n)=2n^2+8n+120$.
We can see that in the beginning $f(n)$ is the slower one (remember, the one on top is the slower!), but $3n^2$ catches up pretty fast,
and it only goes one way from there.
From the definition, we now see that:&lt;/p&gt;
&lt;p&gt;$$f(n) = 2n^2 + 8n + 120 = O(n^2)$$&lt;/p&gt;
&lt;p&gt;because $3n^2$ was always larger than $f(n)$ from the intersection point.&lt;/p&gt;
&lt;p&gt;Again, we don&apos;t know exactly where the graphs intersect, so just given the graphs we couldn&apos;t say what $x_0$ could be,
but that doesn&apos;t matter. The intersection point could be a gazillion, and it would be OK.
It could take the function a gazillion years to compute with the input $n=x_0$, but it wouldn&apos;t matter,
because &lt;em&gt;asymptotically&lt;/em&gt; $Mg(n)$ would be the larger function.&lt;/p&gt;
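&lt;p&gt;For these particular functions we can even find a concrete $x_0$ by brute force (a quick check of my own, assuming the $M=3$ from the graph):&lt;/p&gt;

```python
M = 3

def f(n):
    return 2 * n**2 + 8 * n + 120

def Mg(n):
    return M * n**2

# f(n) <= M*g(n) reduces to n^2 - 8n - 120 >= 0, which holds from the
# positive root onwards, so the first n that passes keeps passing.
x0 = next(n for n in range(1, 10_000) if f(n) <= Mg(n))
print(x0)  # any larger choice of x0 would do just as well
```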
&lt;h2&gt;What does it all mean?&lt;/h2&gt;
&lt;p&gt;So we&apos;ve seen a few cool graphs, and some math written in $\LaTeX$, but what does it all mean? Why should you care?
Let&apos;s say you&apos;re writing a cool program with a function you know will be called a lot. Maybe it looks a little bit like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;for (int i = 0; i &amp;lt; n; i++){
  for (int j = 0; j &amp;lt; n; j++){
    for (int k = 0; k &amp;lt; n; k++){
      // Lots of cool stuff
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;After what we&apos;ve learned, you can easily see that this code runs in at least $O(n^3)$, and, being familiar with quite a few algorithms, you realize that&apos;s quite a bit.&lt;sup&gt;&lt;a href=&quot;#user-content-fn-floyd&quot; id=&quot;user-content-fnref-floyd&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;
You could now consider rewriting it, making it more efficient, even without ever running it! Sounds too good to be true? Well, it kind of is.&lt;/p&gt;
&lt;p&gt;Remember that the actual running time and the &lt;em&gt;asymptotic running time&lt;/em&gt; are &lt;strong&gt;not&lt;/strong&gt; the same thing.
This means that if this part of your code is crucial to your application, you should use a &lt;em&gt;profiler&lt;/em&gt; to find out which of your algorithms has the shortest &lt;em&gt;actual&lt;/em&gt; running time.
If it is not crucial, you should probably leave it be. Remember kids: premature optimization is very bad!
Having said that, the asymptotic running time is &lt;em&gt;usually&lt;/em&gt; a good indication.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&amp;quot;We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.&amp;quot;&lt;/p&gt;
&lt;p&gt;--- Donald E. Knuth&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;Question time!&lt;/h2&gt;
&lt;p&gt;What about $\Omega()$ and $\Theta()$?
: I have decided not to say anything about either of those, because &lt;code&gt;big-O&lt;/code&gt; is the one people usually run into,
and it is definitely the most used - at least outside computer science circles (you could even say outside academia).&lt;/p&gt;
&lt;p&gt;You said one shouldn&apos;t optimize without profiling anyway. Doesn&apos;t that make this whole mess kind of useless?
: A lot of CS students jump to this conclusion when realizing you shouldn&apos;t go on an optimizing spree as soon as you have learned algorithm analysis.
And it isn&apos;t completely wrong; you probably did fine before reading this text, and you would probably continue to do so without all this.
This is just another tool in your programmer toolbox.
But the same way a musician should understand his instrument, a programmer should understand his algorithms and data structures -
you wouldn&apos;t care to listen to a guitarist who had no idea where all the sound came from, would you?&lt;/p&gt;
&lt;h2&gt;Questions for you!&lt;/h2&gt;
&lt;p&gt;Time to see what you&apos;ve learned.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;When finding the running time of $f(n) = O(2n^2 + 8n + 120)$, I said you could ignore constants. Why?&lt;/li&gt;
&lt;li&gt;How come I can say $2n^2 + 4 = O(n^4)$ and still be right?&lt;/li&gt;
&lt;li&gt;How would you know which algorithm is faster: &lt;code&gt;merge-sort&lt;/code&gt; or &lt;code&gt;heap-sort&lt;/code&gt;, both of which are $O(n \log n)$?&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;Further reading&lt;/h2&gt;
&lt;p&gt;This is by no means a complete guide to algorithm analysis.
In fact, this is only about the &lt;code&gt;big-O&lt;/code&gt; notation, and even then it isn&apos;t a really thorough guide - this is only a gentle introduction.&lt;/p&gt;
&lt;p&gt;If this was interesting I strongly recommend reading more. This is just the tip of the iceberg.
Great resources are CLRS&lt;sup&gt;&lt;a href=&quot;#user-content-fn-cormen&quot; id=&quot;user-content-fnref-cormen&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;4&lt;/a&gt;&lt;/sup&gt; if you think math is OK, or Sedgewick&lt;sup&gt;&lt;a href=&quot;#user-content-fn-sedgewick&quot; id=&quot;user-content-fnref-sedgewick&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;5&lt;/a&gt;&lt;/sup&gt; if you don&apos;t.
If you actually love algorithm analysis, and just read this for fun, I recommend Knuth&apos;s TAOCP&lt;sup&gt;&lt;a href=&quot;#user-content-fn-taocp&quot; id=&quot;user-content-fnref-taocp&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;6&lt;/a&gt;&lt;/sup&gt;. Pick any volume, although I suggest you start with volume 1 if you want to understand any of the code.&lt;sup&gt;&lt;a href=&quot;#user-content-fn-mmix&quot; id=&quot;user-content-fnref-mmix&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;7&lt;/a&gt;&lt;/sup&gt;
If you don&apos;t like reading, why did you even read this section? $\blacksquare$&lt;/p&gt;
&lt;section data-footnotes=&quot;&quot; class=&quot;footnotes&quot;&gt;&lt;h2 id=&quot;footnote-label&quot; class=&quot;sr-only&quot;&gt;Footnotes&lt;/h2&gt;
&lt;ol&gt;
&lt;li id=&quot;user-content-fn-timsort&quot;&gt;
&lt;p&gt;&lt;a href=&quot;http://en.wikipedia.org/wiki/Timsort&quot;&gt;http://en.wikipedia.org/wiki/Timsort&lt;/a&gt;. &lt;a href=&quot;#user-content-fnref-timsort&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-simpl&quot;&gt;
&lt;p&gt;This is extremely simplified, and will only work on polynomials! &lt;a href=&quot;#user-content-fnref-simpl&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-floyd&quot;&gt;
&lt;p&gt;There are good algorithms with a running time of $O(n^3)$; for instance &lt;a href=&quot;http://en.wikipedia.org/wiki/Floyd%E2%80%93Warshall_algorithm&quot;&gt;Floyd-Warshall&lt;/a&gt; runs in $O(V^3)$. &lt;a href=&quot;#user-content-fnref-floyd&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-cormen&quot;&gt;
&lt;p&gt;&lt;em&gt;Introduction to Algorithms&lt;/em&gt; by Cormen, Leiserson, Rivest, Stein. &lt;a href=&quot;http://www.amazon.com/Introduction-Algorithms-Thomas-H-Cormen/dp/0262033844&quot;&gt;Amazon.com&lt;/a&gt; &lt;a href=&quot;#user-content-fnref-cormen&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-sedgewick&quot;&gt;
&lt;p&gt;&lt;em&gt;Algorithms&lt;/em&gt; by Sedgewick and Wayne. &lt;a href=&quot;http://www.amazon.com/Algorithms-4th-Robert-Sedgewick/dp/032157351X/ref=sr_1_1?s=books&amp;amp;ie=UTF8&amp;amp;qid=1398882849&amp;amp;sr&quot;&gt;Amazon.com&lt;/a&gt; &lt;a href=&quot;#user-content-fnref-sedgewick&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-taocp&quot;&gt;
&lt;p&gt;&lt;em&gt;The Art of Computer Programming&lt;/em&gt; by Donald E. Knuth. &lt;a href=&quot;#user-content-fnref-taocp&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-mmix&quot;&gt;
&lt;p&gt;In addition, the current editions of volumes 1-3 use MIX as the target machine, while volume 4 uses MMIX, a newer and more modern version. This means that if you want to read volume 4 you should also buy Volume 1, Fascicle 1, as it teaches MMIX instead of MIX. &lt;a href=&quot;#user-content-fnref-mmix&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content></entry><entry><title>Writing a JPEG decoder in Rust - Part 1: Background</title><id>https://mht.wtf/post/jpeg-rust-1/</id><updated>2016-08-05T13:12:00+02:00</updated><author><name>Martin Hafskjold Thoresen</name><email>m@mht.wtf</email></author><link href="https://mht.wtf/post/jpeg-rust-1/" rel=""/><link href="https://mht.wtf/post/jpeg-rust-1/index.html" rel="alternate"/><published>2016-08-05T13:12:00+02:00</published><content type="text/html">&lt;p&gt;In the past months I have spent the evenings and weekends on a little project:
a JPEG decoder and encoder, written in Rust.&lt;/p&gt;
&lt;p&gt;First, I should drop a little disclaimer:
at the time I&apos;m writing this post, I have successfully decoded multiple test images,
but these are fairly standard type images, so more exotic and advanced parts
of the JPEG and JFIF standard are yet to be implemented properly, or even at all.
Therefore, current program design decisions, as well as explanations of formats and techniques,
&lt;em&gt;may&lt;/em&gt; be executed poorly, as they are based on what I currently know,
and the subset of functionality I have implemented.
Hopefully, I am not too far off.&lt;/p&gt;
&lt;p&gt;When I started working on this project, I had not decided how this post should be.
Should it be a step-by-step kind of guide, or more of a writeup of a working program?
Initially, I wanted the former, but I changed my mind as I was progressing, because
I struggled with understanding how the decoding process worked, which led to strange design
decisions, commits going back and forth, and weird naming conventions.
I do not believe this process would make a pleasant reading experience for anyone.
Instead I want to write a post on how it has turned out so far.
This first post will cover the background needed to understand this mess that is JPEG.&lt;/p&gt;
&lt;h1&gt;Why?&lt;/h1&gt;
&lt;p&gt;But first, why am I writing this?
The choice of writing a JPEG encoder/decoder is somewhat arbitrary.
In fact, if I knew what I know now, I think I would have chosen a different format than JPEG.
This is mostly because this was supposed to be a weekend, or maybe weeklong, project;
as I&apos;m writing this, it has been approximately five weeks since the initial git commit&lt;sup&gt;&lt;a href=&quot;#user-content-fn-project-time&quot; id=&quot;user-content-fnref-project-time&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;The choice of using Rust, however, is not arbitrary.
I have found Rust to be an expressive, performant, and fun language to use;
it allows high level abstractions and patterns, while still keeping that raw, low-level feeling you get from e.g. C.
I guess the project of writing a JPEG encoder/decoder was a great excuse for writing a non-trivial program in Rust.&lt;/p&gt;
&lt;p&gt;And as always, I want to improve my writing.
Writing is hard!
Feedback is of course very welcome, even though there is no comment section here.&lt;/p&gt;
&lt;p&gt;Let us look a little closer on JPEG.&lt;/p&gt;
&lt;h1&gt;JPEG?&lt;/h1&gt;
&lt;p&gt;JPEG is a method for lossy image compression;
it is not&lt;sup&gt;&lt;a href=&quot;#user-content-fn-pure-jpeg&quot; id=&quot;user-content-fnref-pure-jpeg&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;, as you might believe, an image format.
There are however image formats which use JPEG, such as
JPEG/Exif --- this is what your digital camera spits out (unless you are only capturing RAW) ---
and JPEG/JFIF --- the file format we will work with.
JFIF&lt;sup&gt;&lt;a href=&quot;#user-content-fn-jpeg-jfif&quot; id=&quot;user-content-fnref-jpeg-jfif&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;3&lt;/a&gt;&lt;/sup&gt; is the most common format for transmitting images on the web&lt;sup&gt;&lt;a href=&quot;#user-content-fn-wiki-copy&quot; id=&quot;user-content-fnref-wiki-copy&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;.
This is great knowledge to pull out when someone says &amp;quot;JPEG file&amp;quot; or something similar:
&amp;quot;Uhmm, actually ... &amp;quot;&lt;sup&gt;&lt;a href=&quot;#user-content-fn-please-dont&quot; id=&quot;user-content-fnref-please-dont&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;5&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;Let&apos;s see how a JFIF file is laid out, and then what the JPEG data format looks like.&lt;/p&gt;
&lt;h1&gt;The JFIF Part&lt;/h1&gt;
&lt;p&gt;Apart from actual image data, the JFIF file contains data such as image dimensions,
comment, different tables, and more. A JFIF file consists of &lt;em&gt;segments&lt;/em&gt;.
Each segment contains a marker, a length, and data.
The marker is two bytes, and is used to identify the segment.
The length is two bytes, and specifies how long the segment is, excluding the marker bytes, but including the length bytes.
The data fills the rest of the segment, according to the length.&lt;/p&gt;
&lt;p&gt;For instance, the marker for &amp;quot;Comment&amp;quot; is &lt;code&gt;0xfffe&lt;/code&gt;, making the bytes for specifying &amp;quot;Hello, World!&amp;quot; as a comment:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;ff fe 00 10 48 65 6c 6c 6f 2c 20 57 6f 72 6c 64 21 00
&lt;/code&gt;&lt;/pre&gt;
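&lt;p&gt;As a sanity check, building that byte sequence is straightforward. Here is a small Python sketch of my own (the trailing &lt;code&gt;00&lt;/code&gt; is there to match the example above; the function name is made up):&lt;/p&gt;

```python
def comment_segment(text):
    # Comment marker 0xfffe, then a big-endian two-byte length that
    # counts the length bytes and the payload, but not the marker.
    payload = text.encode("ascii") + b"\x00"  # the example ends with a NUL
    length = 2 + len(payload)
    return b"\xff\xfe" + length.to_bytes(2, "big") + payload

print(comment_segment("Hello, World!").hex())
```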
&lt;p&gt;The format of the data varies with the different segment types.
When implementing the decoder&lt;sup&gt;&lt;a href=&quot;#user-content-fn-ongoing-impl&quot; id=&quot;user-content-fnref-ongoing-impl&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;6&lt;/a&gt;&lt;/sup&gt;, I simply read parts of the JPEG specification, which is available &lt;a href=&quot;https://www.w3.org/Graphics/JPEG/itu-t81.pdf&quot;&gt;here&lt;/a&gt; (pdf).
See page 38 (marked as page 34) for an overview of the format.
The format of different markers follow for the next 13 pages.&lt;/p&gt;
&lt;p&gt;There is no need to go into too much detail just yet.
We can simply start with saying that an image consists of a &lt;em&gt;frame&lt;/em&gt;, which again consists of one or more &lt;em&gt;scans&lt;/em&gt;.
Each scan contains one or more &lt;em&gt;entropy-coded segments&lt;/em&gt; (ECS).
So far, the images I have tested have contained one frame, one scan, and one ECS,
which is a good starting point.
Complexity does not always have to be paid for up front.&lt;/p&gt;
&lt;h1&gt;The JPEG Part&lt;/h1&gt;
&lt;p&gt;Let&apos;s have a look at the image data --- this is after all the most exciting part.&lt;/p&gt;
&lt;h2&gt;Encoding from 10000 meters&lt;/h2&gt;
&lt;p&gt;In essence, this is how JPEG encoding works:&lt;sup&gt;&lt;a href=&quot;#user-content-fn-encoding-ignores&quot; id=&quot;user-content-fnref-encoding-ignores&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;7&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Split the image into 8x8 blocks. In case the image is not perfectly divided into 8x8 blocks, extend the borders of the image such that it is&lt;/li&gt;
&lt;li&gt;Convert the block to frequency domain, using the Discrete Cosine Transform&lt;/li&gt;
&lt;li&gt;Reorder the block to a &amp;quot;zigzag&amp;quot; ordering&lt;/li&gt;
&lt;li&gt;Quantize frequency coefficients&lt;/li&gt;
&lt;li&gt;Encode the block using Huffman coding&lt;/li&gt;
&lt;/ol&gt;
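&lt;p&gt;Step 3, the zigzag ordering, is easy to generate programmatically. Here is a sketch of my own (not the decoder&apos;s actual code) that walks the anti-diagonals of an 8x8 block:&lt;/p&gt;

```python
def zigzag_order(n=8):
    # Each anti-diagonal has a constant row + col = s; JPEG alternates
    # the direction it walks them, starting from the top-left corner.
    order = []
    for s in range(2 * n - 1):
        diagonal = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        if s % 2 == 0:
            diagonal.reverse()  # even diagonals run bottom-left to top-right
        order.extend(r * n + c for r, c in diagonal)
    return order

print(zigzag_order()[:10])  # 0, 1, 8, 16, 9, 2, 3, 10, 17, 24
```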
&lt;p&gt;In order to understand why we are doing this, we need to take a closer look at frequency transforms and Huffman coding.&lt;/p&gt;
&lt;h2&gt;Discrete Cosine Transform&lt;/h2&gt;
&lt;p&gt;The Discrete Cosine Transform (DCT) creates a representation of a signal as a sum of cosines of different amplitudes and frequencies.
I do not feel I am able to explain this well enough, but you are encouraged to look at the
Wikipedia page for &lt;a href=&quot;https://en.wikipedia.org/wiki/Fourier_series&quot;&gt;Fourier Series&lt;/a&gt;,
which contains some great animations and images;
for our use, DCT is basically the same thing.
If this went over your head, do not worry. Understanding exactly &lt;em&gt;how&lt;/em&gt; it works is not as important as understanding &lt;em&gt;why&lt;/em&gt; we would like to use it.&lt;/p&gt;
&lt;p&gt;So what does DCT have to do with images?
Instead of looking at an image as a grid of pixels, we can interpret the image as a two dimensional signal.
For instance, say we have a grayscale image of size 8x1 px, and that the image data,
where each pixel is a number between &lt;code&gt;0&lt;/code&gt; (black) and &lt;code&gt;255&lt;/code&gt; (white), looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;[0, 32, 64, 96, 128, 160, 192, 224]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The image looks like this (scaled by 3200%):&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;signal-image.jpeg&quot; alt=&quot;image-test&quot; /&gt;&lt;/p&gt;
&lt;p&gt;We can interpret this image as a mathematical function;
in this case it is rather easy: $f(x,y) = 32x$.
So how does the DCT of this signal look? Like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;[896.0, -583.1, 0.0, -61.0, 0.0, -18.2, 0.0, -4.6]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If we go backwards, using the &lt;em&gt;inverse&lt;/em&gt; DCT, we get the exact same image data as we fed into the DCT; no information is lost.
This does not look too impressive; sure --- we got some zeroes here and there, but there
still seems to be some data which needs to be saved.
What if we increase our image to be 8x8, instead of 8x1?&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;signal-image-large.jpeg&quot; alt=&quot;image-test&quot; /&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Signal
   0.0   32.0   64.0   96.0  128.0  160.0  192.0  224.0
   0.0   32.0   64.0   96.0  128.0  160.0  192.0  224.0
   0.0   32.0   64.0   96.0  128.0  160.0  192.0  224.0
   0.0   32.0   64.0   96.0  128.0  160.0  192.0  224.0
   0.0   32.0   64.0   96.0  128.0  160.0  192.0  224.0
   0.0   32.0   64.0   96.0  128.0  160.0  192.0  224.0
   0.0   32.0   64.0   96.0  128.0  160.0  192.0  224.0
   0.0   32.0   64.0   96.0  128.0  160.0  192.0  224.0

After DCT
 896.0 -583.1    0.0  -61.0    0.0  -18.2    0.0   -4.6
  -0.0    0.0    0.0   -0.0    0.0    0.0   -0.0    0.0
   0.0   -0.0   -0.0   -0.0    0.0    0.0    0.0    0.0
  -0.0   -0.0   -0.0   -0.0    0.0    0.0    0.0    0.0
   0.0   -0.0   -0.0    0.0    0.0    0.0    0.0    0.0
  -0.0    0.0   -0.0    0.0    0.0    0.0   -0.0    0.0
   0.0   -0.0   -0.0   -0.0   -0.0   -0.0    0.0    0.0
  -0.0    0.0   -0.0   -0.0    0.0    0.0    0.0   -0.0
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now it is clear that for some images --- or image blocks --- there are great opportunities to minimize the size of the encoded data. There are even more tricks to this, such as quantization, which we will have a look at later.&lt;/p&gt;
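&lt;p&gt;To make the numbers above reproducible, here is a small sketch of the one-dimensional DCT. This is just an illustration, not code from the decoder. Note that there are several scaling conventions for the DCT; the one used here --- a plain sum for the DC coefficient, and a factor $\sqrt{2}$ for the rest --- is the one that matches the coefficients listed above.&lt;/p&gt;

```rust
use std::f64::consts::PI;

/// One-dimensional DCT of an 8-sample signal, scaled so that the DC
/// coefficient is the plain sum of the samples and the remaining
/// coefficients are multiplied by sqrt(2). This matches the numbers
/// in the example above.
fn dct_1d(signal: [f64; 8]) -> [f64; 8] {
    let n = 8.0;
    let mut out = [0.0; 8];
    for k in 0..8 {
        let mut sum = 0.0;
        for (i, x) in signal.iter().enumerate() {
            // cos(pi * (2i + 1) * k / (2N))
            sum += x * (PI * (2.0 * i as f64 + 1.0) * k as f64 / (2.0 * n)).cos();
        }
        out[k] = if k == 0 { sum } else { sum * 2.0_f64.sqrt() };
    }
    out
}

fn main() {
    let signal = [0.0, 32.0, 64.0, 96.0, 128.0, 160.0, 192.0, 224.0];
    for c in dct_1d(signal) {
        print!("{:7.1} ", c); // 896.0 -583.1 0.0 -61.0 0.0 -18.2 0.0 -4.6
    }
    println!();
}
```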
&lt;h2&gt;Quantization&lt;/h2&gt;
&lt;p&gt;Quantization is the only lossy one of our five steps, and so it is the step that controls the level
of compression.
Quantization is the process of mapping a set of values to a smaller set,
and it is done with a quantization matrix, which is the same size as an image block: 8x8.
Predefined matrices are suggested in the JPEG standard&lt;sup&gt;&lt;a href=&quot;#user-content-fn-quantization-suggestions&quot; id=&quot;user-content-fnref-quantization-suggestions&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;8&lt;/a&gt;&lt;/sup&gt;, but each image encodes its own quantization matrices.&lt;/p&gt;
&lt;p&gt;Quantization is applied after DCT, and is used to reduce the information of
the coefficients in every image block.
By making almost equal numbers equal, we decrease the number of unique numbers,
and increase the compression ratio enabled by Huffman coding.&lt;/p&gt;
&lt;p&gt;Let us pretend blocks are 2x2 instead of 8x8.
Say the data we got back from the DCT is&lt;/p&gt;
&lt;p&gt;$$G =
\begin{bmatrix}
230 &amp;amp; 68 \\
99 &amp;amp; 72
\end{bmatrix}
$$&lt;/p&gt;
&lt;p&gt;A quantization matrix could be&lt;/p&gt;
&lt;p&gt;$$Q =
\begin{bmatrix}
10 &amp;amp; 11 \\
12 &amp;amp; 13
\end{bmatrix}
$$&lt;/p&gt;
&lt;p&gt;We take each component of $G$ and divide it by its corresponding component of $Q$, and round the numbers to the nearest integer:&lt;/p&gt;
&lt;p&gt;$$B =
\begin{bmatrix}
\frac{230}{10} &amp;amp; \frac{68}{11}\\
\frac{99}{12} &amp;amp; \frac{72}{13}
\end{bmatrix} =
\begin{bmatrix}
23 &amp;amp; 6 \\
8 &amp;amp; 6
\end{bmatrix}
$$&lt;/p&gt;
&lt;p&gt;And we are done. The data from $B$ is what is passed down to the next step.&lt;/p&gt;
&lt;p&gt;We can also see that if we are going the other way, taking the element-by-element product with the same quantization matrix, we get&lt;/p&gt;
&lt;p&gt;$$G&apos; =
\begin{bmatrix}
230 &amp;amp; 66 \\
96 &amp;amp; 78
\end{bmatrix}
$$&lt;/p&gt;
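&lt;p&gt;The 2x2 example above can be sketched in a few lines. Again, this is an illustration rather than decoder code:&lt;/p&gt;

```rust
/// Quantize a 2x2 block: divide each coefficient by the corresponding
/// entry of the quantization matrix, rounding to the nearest integer.
fn quantize(g: [[f64; 2]; 2], q: [[f64; 2]; 2]) -> [[i32; 2]; 2] {
    let mut b = [[0; 2]; 2];
    for r in 0..2 {
        for c in 0..2 {
            b[r][c] = (g[r][c] / q[r][c]).round() as i32;
        }
    }
    b
}

/// Going the other way: the element-by-element product with the same
/// quantization matrix.
fn dequantize(b: [[i32; 2]; 2], q: [[f64; 2]; 2]) -> [[f64; 2]; 2] {
    let mut g = [[0.0; 2]; 2];
    for r in 0..2 {
        for c in 0..2 {
            g[r][c] = b[r][c] as f64 * q[r][c];
        }
    }
    g
}

fn main() {
    let g = [[230.0, 68.0], [99.0, 72.0]];
    let q = [[10.0, 11.0], [12.0, 13.0]];
    let b = quantize(g, q);
    println!("{:?}", b);                // [[23, 6], [8, 6]]
    println!("{:?}", dequantize(b, q)); // [[230.0, 66.0], [96.0, 78.0]]
}
```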
&lt;h2&gt;Huffman Coding&lt;/h2&gt;
&lt;p&gt;So we found a way to represent our image block with a lot of similar numbers --- &lt;code&gt;0&lt;/code&gt; in our case.
How can we take advantage of this? If we use 32 bit integers, the size of the number is 32 bits, no matter the number! Or is it?&lt;/p&gt;
&lt;p&gt;Huffman coding is a scheme for lossless compression.
The gist of the scheme is to code numbers as bit strings, with the property
that no code can be the prefix of another code (if &lt;code&gt;01&lt;/code&gt; is a code, &lt;code&gt;011&lt;/code&gt; cannot be a code),
and that frequent numbers should have a shorter code than less frequent numbers.&lt;/p&gt;
&lt;p&gt;Say we want to encode this (totally random) list of bytes&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;[2, 7, 1, 8, 2, 8, 1, 8, 2, 8]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We can count the number of occurrences of each number in the list,&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Number | Occurrences
-------------------
   1   |      2
   2   |      3
   7   |      1
   8   |      4
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;and create the codes&lt;sup&gt;&lt;a href=&quot;#user-content-fn-how-to-huffman-code&quot; id=&quot;user-content-fnref-how-to-huffman-code&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;9&lt;/a&gt;&lt;/sup&gt;,&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Number | Code
-------------------
   8   |  0
   2   |  10
   1   |  111
   7   |  110
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;making our data&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;// data
 2   7   1 8  2 8   1 8  2 8
// coded data
10 110 111 0 10 0 111 0 10 0
// squash together
1011011101001110100
// look at each byte
10110111 01001110 100?????
// pad with 1
10110111 01001110 10011111
// which is the same as
183 78 159
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Effectively, we coded 10 bytes as 3 bytes&lt;sup&gt;&lt;a href=&quot;#user-content-fn-huffman-end&quot; id=&quot;user-content-fnref-huffman-end&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;10&lt;/a&gt;&lt;/sup&gt; &lt;sup&gt;&lt;a href=&quot;#user-content-fn-huffman-prefix&quot; id=&quot;user-content-fnref-huffman-prefix&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;11&lt;/a&gt;&lt;/sup&gt;,
by taking advantage of the fact that some numbers are more frequent than others.&lt;/p&gt;
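&lt;p&gt;Here is a sketch of the packing we just did by hand. The code table is hard-coded from the example above; building the table from the frequency counts is the part described on the Wiki page:&lt;/p&gt;

```rust
/// Pack the example data with the code table from above, padding the
/// final byte with 1-bits. The table is hard-coded for this example;
/// a real encoder would build it from the frequency counts.
fn huffman_pack(data: &[u8]) -> Vec<u8> {
    let code = |n: u8| match n {
        8 => "0",
        2 => "10",
        7 => "110",
        1 => "111",
        _ => panic!("number not in code table"),
    };
    // Concatenate the codes into one bit string ...
    let mut bits: String = data.iter().map(|&n| code(n)).collect();
    // ... pad with 1s up to a whole number of bytes ...
    while bits.len() % 8 != 0 {
        bits.push('1');
    }
    // ... and read the bit string back out, byte by byte.
    bits.as_bytes()
        .chunks(8)
        .map(|chunk| chunk.iter().fold(0u8, |byte, &b| (byte << 1) | (b - b'0')))
        .collect()
}

fn main() {
    let packed = huffman_pack(&[2, 7, 1, 8, 2, 8, 1, 8, 2, 8]);
    println!("{:?}", packed); // [183, 78, 159]
}
```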
&lt;h2&gt;Back to the 10000 meter view&lt;/h2&gt;
&lt;p&gt;Now that we have some control over roughly what happens, we can take another look at the whole procedure, with some additional steps.&lt;/p&gt;
&lt;p&gt;So, we split the image into blocks, and each block is more or less processed by itself.
This is done because of the principle of locality: pixels within a block are likely to be somewhat similar.&lt;/p&gt;
&lt;p&gt;Next, we transform the image into frequency domain, so we can make it easier to encode the numbers.
Then, we reorder the coefficients within each block, with the goal of getting long runs of zeroes.
We also take the inner division (element-by-element division) of our frequencies and a quantization matrix, in order to make the coefficients somewhat similar.
This is the lossy part.
Finally, we use Huffman coding to write out the image data, taking advantage
of the fact that we are writing lots of similar numbers.&lt;/p&gt;
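&lt;p&gt;The zigzag reordering has not been spelled out yet, so here is a sketch that generates the traversal order for an 8x8 block, as indices into a row-major block. This is an illustration; an actual codec might well just use a precomputed table:&lt;/p&gt;

```rust
/// Generate the zigzag traversal order for an 8x8 block, as indices
/// into a row-major (flattened) block. We walk the anti-diagonals
/// (constant row + col), alternating direction.
fn zigzag_order() -> Vec<usize> {
    let mut order = Vec::with_capacity(64);
    for s in 0..15 {
        // Rows present on anti-diagonal s = row + col.
        let lo = if s >= 8 { s - 7 } else { 0 };
        let hi = if s < 7 { s } else { 7 };
        if s % 2 == 0 {
            // Even diagonals run bottom-left to top-right.
            for r in (lo..=hi).rev() {
                order.push(r * 8 + (s - r));
            }
        } else {
            // Odd diagonals run top-right to bottom-left.
            for r in lo..=hi {
                order.push(r * 8 + (s - r));
            }
        }
    }
    order
}

fn main() {
    println!("{:?}", zigzag_order()); // starts 0, 1, 8, 16, 9, 2, ...
}
```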
&lt;p&gt;And that is pretty much it --- from a 10000 meters view.&lt;/p&gt;
&lt;h1&gt;Conclusion&lt;/h1&gt;
&lt;p&gt;We have looked at what a JFIF file looks like, and roughly how the image data is encoded.
In Part 2 we will start implementing this, in actual, runnable, (hopefully) working, Rust code.&lt;/p&gt;
&lt;p&gt;If you found some things confusing, or simply want better explanations than I can give, check out the &lt;a href=&quot;https://en.wikipedia.org/wiki/JPEG&quot;&gt;Wiki page for JPEG&lt;/a&gt;;
it explains the encoding process using an 8x8 sample block.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://www.reddit.com/r/programming/comments/4w9z62/writing_a_jpeg_decoder_in_rust_part_1_background/&quot;&gt;/r/programming thread&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.reddit.com/r/rust/comments/4wau7o/writing_a_jpeg_decoder_in_rust_part_1_background/&quot;&gt;/r/rust thread&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;em&gt;Read part 2 &lt;a href=&quot;../jpeg-rust-2&quot;&gt;here&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;Errata&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Quantization was listed before zigzagging the data. This was the wrong way around.&lt;/li&gt;
&lt;/ul&gt;
&lt;section data-footnotes=&quot;&quot; class=&quot;footnotes&quot;&gt;&lt;h2 id=&quot;footnote-label&quot; class=&quot;sr-only&quot;&gt;Footnotes&lt;/h2&gt;
&lt;ol&gt;
&lt;li id=&quot;user-content-fn-project-time&quot;&gt;
&lt;p&gt;Although I have not worked on the project every day --- in fact, as of this writing, there are commits from only 14 distinct days: &lt;code&gt;git log --format=&amp;quot;%cd&amp;quot; --date=short | uniq | wc -l&lt;/code&gt;. &lt;a href=&quot;#user-content-fnref-project-time&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-pure-jpeg&quot;&gt;
&lt;p&gt;As &lt;a href=&quot;https://www.reddit.com/r/programming/comments/4w9z62/writing_a_jpeg_decoder_in_rust_part_1_background/d66nzug&quot;&gt;/u/AlyoshaV&lt;/a&gt; points out, pure JPEG files do exist, but are of limited use, because decoders have to guess how to decode it. I stand corrected! &lt;a href=&quot;#user-content-fnref-pure-jpeg&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-jpeg-jfif&quot;&gt;
&lt;p&gt;Wikipedia writes JPEG/JFIF, but the J in JFIF stands for JPEG. Not sure what to make of this, so I will call it JFIF. &lt;a href=&quot;#user-content-fnref-jpeg-jfif&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-wiki-copy&quot;&gt;
&lt;p&gt;Ok, that sentence was nearly copied from Wikipedia. No matter --- here is &lt;a href=&quot;http://httparchive.org/interesting.php#imageformats&quot;&gt;the source&lt;/a&gt; &lt;a href=&quot;#user-content-fnref-wiki-copy&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-please-dont&quot;&gt;
&lt;p&gt;Please don&apos;t. &lt;a href=&quot;#user-content-fnref-please-dont&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-ongoing-impl&quot;&gt;
&lt;p&gt;Which is still an ongoing process, mind you! &lt;a href=&quot;#user-content-fnref-ongoing-impl&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-encoding-ignores&quot;&gt;
&lt;p&gt;We are taking a little shortcut here, ignoring things such as multiple channels, color conversion, and chroma subsampling. &lt;a href=&quot;#user-content-fnref-encoding-ignores&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-quantization-suggestions&quot;&gt;
&lt;p&gt;See Annex K in the JPEG specification for two suggested quantization matrices. &lt;a href=&quot;#user-content-fnref-quantization-suggestions&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-how-to-huffman-code&quot;&gt;
&lt;p&gt;The algorithm for devising the bit strings can be found on the &lt;a href=&quot;https://en.wikipedia.org/wiki/Huffman_coding#Basic_technique&quot;&gt;Wiki page&lt;/a&gt;&lt;sup&gt;&lt;a href=&quot;#user-content-fn-huffman-different&quot; id=&quot;user-content-fnref-huffman-different&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;12&lt;/a&gt;&lt;/sup&gt;. Check it out! &lt;a href=&quot;#user-content-fnref-how-to-huffman-code&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-huffman-end&quot;&gt;
&lt;p&gt;Note that this introduces a little bit of a challenge: how do you know when you are done reading? Two possible solutions are to know ahead of time how many elements we are to read, or to encode a special byte, such as &lt;code&gt;0xff&lt;/code&gt; or &lt;code&gt;0x00&lt;/code&gt;, to mark &lt;code&gt;end-of-data&lt;/code&gt;. &lt;a href=&quot;#user-content-fnref-huffman-end&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-huffman-prefix&quot;&gt;
&lt;p&gt;Also note how it is possible to decode the bit stream, since no code is a prefix of another code; we can simply check: is &lt;code&gt;1&lt;/code&gt; a code? No. Is &lt;code&gt;10&lt;/code&gt; a code? Yes! Got 2. Now, the bit stream is &lt;code&gt;1101110100111010011111&lt;/code&gt;, and we can start again. Is &lt;code&gt;1&lt;/code&gt; a code? No. Is &lt;code&gt;11&lt;/code&gt; a code? No. And so on. Alternatively, one can somehow know the length of the next code. &lt;a href=&quot;#user-content-fnref-huffman-prefix&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-huffman-different&quot;&gt;
&lt;p&gt;I used online generators to get the codes listed. It is worth noting that different generators gave different codes, so either the generators I found are not, strictly speaking, correct, or there is some ambiguity here. &lt;a href=&quot;#user-content-fnref-huffman-different&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content></entry><entry><title>Expanding TeX&apos;s \newif</title><id>https://mht.wtf/post/tex/</id><updated>2021-06-19T16:29:02+02:00</updated><author><name>Martin Hafskjold Thoresen</name><email>m@mht.wtf</email></author><link href="https://mht.wtf/post/tex/" rel=""/><link href="https://mht.wtf/post/tex/index.html" rel="alternate"/><published>2021-06-19T16:29:02+02:00</published><content type="text/html">&lt;h2&gt;Introduction&lt;/h2&gt;
&lt;p&gt;Like most of my colleagues, I use LaTeX to write papers, reports, notes, or what have you.
In fact, I think all of the places where I regularly write support some subset of LaTeX.
Also like most of my colleagues, I&apos;m not a TeXnician.
I&apos;m not proud to be ignorant in this regard, but there&apos;s only so many hours in a day, and
the gains from properly learning a huge ecosystem like LaTeX seem minuscule compared
to the initial buy-in cost.&lt;/p&gt;
&lt;p&gt;Still, I was curious.&lt;/p&gt;
&lt;p&gt;LaTeX and TeX, tomato tomato?
Here&apos;s how I see it.
If LaTeX is like C++20 --- big, complex, confusing, full of cruft, but still very popular ---
then TeX is like C89 --- small, simpler&lt;sup&gt;&lt;a href=&quot;#user-content-fn-simple&quot; id=&quot;user-content-fnref-simple&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;, confusing, a child of its time, and often neglected.&lt;/p&gt;
&lt;p&gt;There&apos;s a certain pleasure in going far enough down the stack that the systems you are using become simple enough to reason about on a deep level.
It&apos;s the feeling you might get sitting down one afternoon trying to write some assembly after
a long week of debugging consistency errors in your sharded database across multiple kubernetes clusters&lt;sup&gt;&lt;a href=&quot;#user-content-fn-kube&quot; id=&quot;user-content-fnref-kube&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;.
No magic, no need to constantly search StackOverflow for other people who&apos;ve had the same problems you&apos;re dealing with.
It&apos;s just you and the CPU, and likely the Intel Instruction Set Manual or something as big and scary.
I wanted that, but with typesetting.&lt;/p&gt;
&lt;p&gt;This was my romantic motivation to dig into TeX and try to see whether it really is rewarding to
step back a few decades to avoid the complexity of newer and bigger typesetting systems.
I bought the TeXbook, and read it from start to finish.
Well, some paragraphs are marked with &amp;quot;dangerous bends&amp;quot;, signalling that the content covered or the background assumed
for those paragraphs is more advanced. I read the single bends, but skipped the double bends, at least most of the time.&lt;/p&gt;
&lt;p&gt;Somewhere in the book I found the definition of &lt;code&gt;\newif&lt;/code&gt;, a macro that&apos;s used to define conditionals,
which you can later query, and branch on. Booleans, in other words.
I read it, and really didn&apos;t understand a single thing,
and I figured that if I can manage to sit down and figure out what on earth this macro is doing and why, then
I&apos;ve had a good taste of what it&apos;s like digging down this low in the world of TeX.&lt;/p&gt;
&lt;p&gt;This post is the result of that process.&lt;/p&gt;
&lt;h3&gt;How Do I Write TeX?&lt;/h3&gt;
&lt;p&gt;This is not really as obvious as it might sound. After all, TeX produces a document, but when playing with macros
we really want to see what forms expand to, which macros are defined, and so on.
I have to say upfront that the method I used here probably wasn&apos;t ideal,
because I just started &lt;code&gt;tex&lt;/code&gt; (or sometimes &lt;code&gt;pdftex&lt;/code&gt;; for the purposes of this post they seem to be exactly the same),
and started writing. The REPL doesn&apos;t support &lt;code&gt;readline&lt;/code&gt; bindings, arrow keys, or clicking to move the cursor,
so if I wanted to add something in the middle of a line, I had to hold backspace all the way back to where I wanted to go
and write out the rest of the expression. Sometimes I pasted back and forth from a text editor, which worked okay.&lt;/p&gt;
&lt;p&gt;Here&apos;s exactly how I got started&lt;sup&gt;&lt;a href=&quot;#user-content-fn-arch&quot; id=&quot;user-content-fnref-arch&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-tex&quot;&gt;/h/martin$ tex
This is TeX, Version 3.141592653 (TeX Live 2021/Arch Linux) (preloaded format=tex)
**\relax  % don&apos;t read input from a file

*\tracingall=1                 % Give us lots of output
{vertical mode: \tracingstats}
{\tracingpages}
{\tracingoutput}
{\tracinglostchars}
{\tracingmacros}
{\tracingparagraphs}
{\tracingrestores}
{\showboxbreadth}
{\showboxdepth}
{the character =}
{horizontal mode: the character =}
{blank space  }

*\message{This will show somewhere}    % some sample message
{\message}
This will show somewhere               % here&apos;s the things you wrote above
{blank space  }

*\def\mymacro{from the macro}  % Make a new macro
{\def}
{blank space  }

*\message{\mymacro}         % \message will expand the macro
{\message}

\mymacro -&amp;gt;from the macro   % \mymacro is expanded to `from the macro`
from the macro              % ... and we get the fully expanded form out.
{blank space  }

*
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Input lines start with a &lt;code&gt;*&lt;/code&gt;.
It&apos;s very useful to set &lt;code&gt;\tracingall=1&lt;/code&gt;, which makes TeX output a bunch of things, some of which you care about.
Note that I&apos;ve changed up the formatting of the output throughout this post so that it&apos;s easier to see what&apos;s going on.&lt;/p&gt;
&lt;p&gt;Another quick note: I didn&apos;t want to spend hours writing an intro to TeX as well as whatever this is, so
if you have never written a line of TeX or LaTeX, this might be difficult to follow. If you&apos;ve written
some LaTeX, and maybe defined your own simple macros, I think you&apos;ll be fine.&lt;/p&gt;
&lt;h2&gt;The Goal&lt;/h2&gt;
&lt;p&gt;This is the definition we&apos;ll unravel, copied verbatim from The TeXbook.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-tex&quot;&gt;\outer\def\newif#1{\count@=\escapechar \escapechar=-1
  \expandafter\expandafter\expandafter
   \def\@if#1{true}{\let#1=\iftrue}%
  \expandafter\expandafter\expandafter
   \def\@if#1{false}{\let#1=\iffalse}%
  \@if#1{false}\escapechar=\count@} % the condition starts out false
\def\@if#1#2{\csname\expandafter\if@\string#1#2\endcsname}
{\uccode`1=`i \uccode`2=`f \uppercase{\gdef\if@12{}}} % `if` is required
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Don&apos;t despair if this is nonsense:
the whole point of this post is to explain what&apos;s going on, and to get some
better idea of how real and (somewhat) involved TeX macros work.&lt;/p&gt;
&lt;h2&gt;How TeX Reads Tokens&lt;/h2&gt;
&lt;p&gt;To start on the right foot, let&apos;s make sure that we properly understand how TeX reads tokens.
A token is the input &amp;quot;unit&amp;quot; that TeX reads when it reads a document.
For instance if you were to write &lt;code&gt;Let $n=\numb$ be a number.&lt;/code&gt; then this will be transformed into a queue of tokens from which
we will read one at a time. Exactly how the tokens are split up is not crucial to understanding, but in this example
it looks something like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;tokens = [&apos;L&apos;, &apos;e&apos;, &apos;t&apos;, &apos; &apos;, $, &apos;n&apos;, &apos;=&apos;, \numb, $, ...]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Notice three things.
First, a letter is a token in and of itself; we do not have one &amp;quot;word&amp;quot; be a token.
Second, &lt;code&gt;$&lt;/code&gt; is not the character &lt;code&gt;&apos;$&apos;&lt;/code&gt;, but the special begin/end math mode token.
If we were to write &lt;code&gt;\$&lt;/code&gt; we would get the character token &lt;code&gt;&apos;$&apos;&lt;/code&gt;.
Third, the whole macro &lt;code&gt;\numb&lt;/code&gt; is one single token.
When you hear &amp;quot;token&amp;quot;, think &amp;quot;input unit&amp;quot;.&lt;/p&gt;
&lt;p&gt;So how does TeX read the tokens? One mental model is like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;while tokens is not empty
    t &amp;lt;- pop_front(tokens)
    if shouldexpand(t)
        exp &amp;lt;- expand(t)
        push_front(tokens, exp)
    else
        process(t)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Some tokens, like the &lt;code&gt;\newif&lt;/code&gt; token we will figure out in this post, expand,
and the expansion is another list of tokens, some of which might be regular character tokens,
and some of which might be other tokens that also expand. Therefore when we expand a token
we will push the result back onto the front of the queue.&lt;/p&gt;
&lt;p&gt;Note that when we expand a macro that takes arguments, like &lt;code&gt;\def\paren#1{(#1)}&lt;/code&gt;, the expansion of &lt;code&gt;\paren&lt;/code&gt; will
pop more tokens from the queue, and then push the tokens of the expanded form back onto the queue.&lt;/p&gt;
&lt;p&gt;What does it mean to &amp;quot;process&amp;quot; a token? For a character, this basically means to write that character
at the current position on the page&lt;sup&gt;&lt;a href=&quot;#user-content-fn-writechar&quot; id=&quot;user-content-fnref-writechar&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;.
For a macro definition like &lt;code&gt;\def\bob{123}&lt;/code&gt; it means to make the definition and store it somewhere in memory,
so that if you ever encounter a &lt;code&gt;\bob&lt;/code&gt; token you know that it expands to the three tokens &lt;code&gt;1&lt;/code&gt;,&lt;code&gt;2&lt;/code&gt;,&lt;code&gt;3&lt;/code&gt;.&lt;/p&gt;
&lt;h3&gt;A Short Example&lt;/h3&gt;
&lt;p&gt;Let &lt;code&gt;\def\A{a}  \def\B{\A b}  \def\C{\B\B}&lt;/code&gt; and the input token queue be &lt;code&gt;[\C]&lt;/code&gt;.
To make sure we understand how this works, let&apos;s manually expand this whole thing.
The left column is the token queue, and the left side of the queue is the front, which is the place at which
we will be working.
The right column explains what we&apos;re about to do.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th align=&quot;left&quot;&gt;Tokens&lt;/th&gt;
&lt;th align=&quot;left&quot;&gt;Current action&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td align=&quot;left&quot;&gt;&lt;code&gt;[\C]&lt;/code&gt;&lt;/td&gt;&lt;td align=&quot;left&quot;&gt;take &lt;code&gt;\C&lt;/code&gt; out of the front of the queue&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td align=&quot;left&quot;&gt;&lt;code&gt;[]&lt;/code&gt;&lt;/td&gt;&lt;td align=&quot;left&quot;&gt;&lt;code&gt;\C&lt;/code&gt; expands to &lt;code&gt;\B\B&lt;/code&gt;, which we push back&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td align=&quot;left&quot;&gt;&lt;code&gt;[\B  \B]&lt;/code&gt;&lt;/td&gt;&lt;td align=&quot;left&quot;&gt;take the first &lt;code&gt;\B&lt;/code&gt; out&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td align=&quot;left&quot;&gt;&lt;code&gt;[\B]&lt;/code&gt;&lt;/td&gt;&lt;td align=&quot;left&quot;&gt;&lt;code&gt;\B&lt;/code&gt; expands to &lt;code&gt;\A b&lt;/code&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td align=&quot;left&quot;&gt;&lt;code&gt;[\A   b  \B]&lt;/code&gt;&lt;/td&gt;&lt;td align=&quot;left&quot;&gt;&lt;code&gt;\A&lt;/code&gt; is taken out, expanded to &lt;code&gt;a&lt;/code&gt; and pushed back&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td align=&quot;left&quot;&gt;&lt;code&gt;[ a   b  \B]&lt;/code&gt;&lt;/td&gt;&lt;td align=&quot;left&quot;&gt;&lt;code&gt;a&lt;/code&gt; is taken out and processed, because it doesn&apos;t expand&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td align=&quot;left&quot;&gt;&lt;code&gt;[ b  \B]&lt;/code&gt;&lt;/td&gt;&lt;td align=&quot;left&quot;&gt;&lt;code&gt;b&lt;/code&gt; is taken out and processed&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td align=&quot;left&quot;&gt;&lt;code&gt;[\B]&lt;/code&gt;&lt;/td&gt;&lt;td align=&quot;left&quot;&gt;you get the idea...&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td align=&quot;left&quot;&gt;&lt;code&gt;[\A   b]&lt;/code&gt;&lt;/td&gt;&lt;td align=&quot;left&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td align=&quot;left&quot;&gt;&lt;code&gt;[ a   b]&lt;/code&gt;&lt;/td&gt;&lt;td align=&quot;left&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td align=&quot;left&quot;&gt;&lt;code&gt;[ b]&lt;/code&gt;&lt;/td&gt;&lt;td align=&quot;left&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td align=&quot;left&quot;&gt;&lt;code&gt;[]&lt;/code&gt;&lt;/td&gt;&lt;td align=&quot;left&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The end result of this execution is that we have sent the tokens &lt;code&gt;a&lt;/code&gt;, &lt;code&gt;b&lt;/code&gt;, &lt;code&gt;a&lt;/code&gt;, &lt;code&gt;b&lt;/code&gt; to the processing part of TeX.&lt;/p&gt;
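&lt;p&gt;The mental model above is simple enough that we can sketch it in a few lines of Rust. This is a toy model: macros are just names mapping to token lists, and arguments, catcodes, and everything else are ignored:&lt;/p&gt;

```rust
use std::collections::{HashMap, VecDeque};

/// A grossly simplified model of TeX's main loop: tokens that name a
/// macro are expanded, and the expansion is pushed back onto the
/// front of the queue; all other tokens are "processed" (here: just
/// collected as output).
fn expand_all(
    macros: &HashMap<&'static str, Vec<&'static str>>,
    input: &[&'static str],
) -> Vec<&'static str> {
    let mut tokens: VecDeque<&'static str> = input.iter().copied().collect();
    let mut processed = Vec::new();
    while let Some(t) = tokens.pop_front() {
        if let Some(expansion) = macros.get(t) {
            // Push the expansion back onto the front, keeping its order.
            for &tok in expansion.iter().rev() {
                tokens.push_front(tok);
            }
        } else {
            processed.push(t);
        }
    }
    processed
}

fn main() {
    // \def\A{a}  \def\B{\A b}  \def\C{\B\B}
    let mut macros = HashMap::new();
    macros.insert("\\A", vec!["a"]);
    macros.insert("\\B", vec!["\\A", "b"]);
    macros.insert("\\C", vec!["\\B", "\\B"]);
    println!("{:?}", expand_all(&macros, &["\\C"])); // ["a", "b", "a", "b"]
}
```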
&lt;h2&gt;A Primer on Catcodes&lt;/h2&gt;
&lt;p&gt;We need to know one more thing about tokens, or rather how the characters of your input are split into them.
Each character has a &lt;em&gt;category code&lt;/em&gt;, or catcode for short. Catcodes decide how to group and split characters into
tokens. There is a category code for letters (11), one for space (10), and one for math shift (3), among others.
This way TeX knows that the input &lt;code&gt;let $&lt;/code&gt; consists of three letters, one space, and one &amp;quot;math shift&amp;quot;.
This is also how TeX figures out when the name of a macro ends and new tokens begin, as in &lt;code&gt;\hey3&lt;/code&gt;:
here we have one token with catcode 0 (the escape character &lt;code&gt;\&lt;/code&gt;), three of catcode 11, and one of catcode 12 (&amp;quot;others&amp;quot;, which include numbers).
The name of a macro is only letters, so this way TeX knows that &lt;code&gt;\hey&lt;/code&gt; is a macro and &lt;code&gt;3&lt;/code&gt; is just the next token in the queue.&lt;/p&gt;
&lt;p&gt;But catcodes can be changed. Why is this useful? Well, if we would like to make some macros that another user wouldn&apos;t accidentally
redefine, we can include a character that, by default, isn&apos;t allowed to be in their names, like &lt;code&gt;@&lt;/code&gt;.
The catcode of &lt;code&gt;@&lt;/code&gt; is 12, and so the input &lt;code&gt;\h@&lt;/code&gt; will be read as two tokens &lt;code&gt;\h&lt;/code&gt; and &lt;code&gt;&apos;@&apos;&lt;/code&gt;. However, if we change the catcode of &lt;code&gt;@&lt;/code&gt; to 11
it&apos;s as if &lt;code&gt;@&lt;/code&gt; is just a regular letter, and &lt;code&gt;\h@&lt;/code&gt; will be read as a single token &lt;code&gt;\h@&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;This is how we change the catcode of &lt;code&gt;@&lt;/code&gt; to 11 and then back to 12:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-tex&quot;&gt;*\catcode`\@=11  % Category 11 consists of regular letters
*\catcode`\@=12  % Category 12 consists of &amp;quot;other characters&amp;quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Some Not So Bad Macros&lt;/h2&gt;
&lt;p&gt;We need to know about a few other macros that &lt;code&gt;\newif&lt;/code&gt; uses internally. Most of these are
pretty straight forward.&lt;/p&gt;
&lt;h3&gt;&lt;code&gt;\string&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;Takes an argument and replaces it by the non-expanded token list.
&lt;code&gt;\string\foo&lt;/code&gt; expands to the four tokens &lt;code&gt;\ f o o&lt;/code&gt;, no matter what the macro &lt;code&gt;\foo&lt;/code&gt; would expand to.
A crucial detail which we will come back to is that the tokens &lt;code&gt;\string&lt;/code&gt; produces will get catcode 12 (unless it&apos;s a space).&lt;/p&gt;
&lt;h3&gt;&lt;code&gt;\escapechar&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;The character which is used when a control sequence is outputted as text. Normally set to &lt;code&gt;\&lt;/code&gt;.
If this is set to for instance &lt;code&gt;@&lt;/code&gt;, then &lt;code&gt;\string\foo&lt;/code&gt; would expand to the four tokens &lt;code&gt;@ f o o&lt;/code&gt; instead.&lt;/p&gt;
&lt;h3&gt;&lt;code&gt;\uccode&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;Short for uppercase code. This allows one to set the uppercase character code for another letter.
Usually this would be &lt;code&gt;\uccode`x=`X  \uccode`X=`X&lt;/code&gt; and so on, but this, like most things in TeX, can be changed,
and changes, like most things in TeX, are local to the current group.&lt;/p&gt;
&lt;h3&gt;&lt;code&gt;\csname&lt;/code&gt; and &lt;code&gt;\endcsname&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;Read and expand everything up until the matching &lt;code&gt;\endcsname&lt;/code&gt;.
The expansion result should be a list of character tokens,
and this list will be made into a single control sequence token.
If the resulting control sequence is not already defined, it will be made equal to &lt;code&gt;\relax&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;For instance &lt;code&gt;\csname hello\endcsname&lt;/code&gt; will expand to the single token &lt;code&gt;\hello&lt;/code&gt; and make the macro &lt;code&gt;\hello&lt;/code&gt; expand to &lt;code&gt;\relax&lt;/code&gt;.
More interestingly, &lt;code&gt;\def\inner{hello}\csname\inner\endcsname&lt;/code&gt; will do the same:
Here the &lt;code&gt;inner&lt;/code&gt; macro expands to the list of tokens &lt;code&gt;h e l l o&lt;/code&gt;, and the &lt;code&gt;csname&lt;/code&gt; pair of macros
expand this macro, effectively replacing it with &lt;code&gt;\csname hello\endcsname&lt;/code&gt;.&lt;/p&gt;
&lt;h3&gt;&lt;code&gt;\gdef&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;Normally definitions made with &lt;code&gt;\def&lt;/code&gt; are local to your scope, just like in most programming languages.
However, sometimes we want to define global macros, and &lt;code&gt;\gdef&lt;/code&gt; does exactly this.
When a macro is defined with &lt;code&gt;\gdef&lt;/code&gt; it is as if it was defined in the top level scope.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-tex&quot;&gt;{ 
    \def\inner{hello}
    \inner  % expands to  h e l l o
}
\inner   % this doesn&apos;t work, because \inner is no longer defined

{
    \gdef\inner{hello}
    \inner  % expands to  h e l l o
}
\inner % also expands to  h e l l o
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;&lt;code&gt;\outer&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;This is a safety measure that you put before a &lt;code&gt;\def&lt;/code&gt; which ensures that the macro
is not allowed to appear in an argument, in the parameter text, or in the replacement text of another macro.&lt;/p&gt;
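&lt;p&gt;A sketch of what &lt;code&gt;\outer&lt;/code&gt; forbids (macro names made up; the exact error message may vary):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-tex&quot;&gt;\outer\def\fragile{x}
\fragile                % fine: used at top level
% \def\bad{\fragile}    % error: an \outer macro may not appear in a replacement text
&lt;/code&gt;&lt;/pre&gt;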
&lt;h2&gt;The &lt;code&gt;\expandafter&lt;/code&gt; Macro&lt;/h2&gt;
&lt;p&gt;Now that we&apos;ve seen a few simple macros we turn to one that is slightly less simple.
The &lt;code&gt;\expandafter&lt;/code&gt; macro first reads the very next token in the queue without expanding it.
Then, it&apos;ll read &lt;em&gt;and expand&lt;/em&gt; the next token after that.
Last, it will put the first token back in front, without expanding it.
Here&apos;s a small example of how it runs:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-tex&quot;&gt;*\def\first{first}
*\def\second{second}
*\expandafter\first\second
{\expandafter}

\second -&amp;gt;SECOND

\first -&amp;gt;FIRST
{the letter F}
*
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here the output shows that &lt;code&gt;\second&lt;/code&gt; is expanded before &lt;code&gt;\first&lt;/code&gt;, and that the first token that we process is &lt;code&gt;F&lt;/code&gt;.
Note that the second form is only &lt;em&gt;expanded&lt;/em&gt; and not actually processed, so the following
does &lt;strong&gt;not&lt;/strong&gt; work:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-tex&quot;&gt;*\expandafter\first\def\first{another first!}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The second token, the &lt;code&gt;\def&lt;/code&gt;, is only &lt;em&gt;expanded&lt;/em&gt;, not actually &amp;quot;run&amp;quot;, so when
&lt;code&gt;\first&lt;/code&gt; is later processed it will still have the same meaning as before,
which might even be undefined.&lt;/p&gt;
&lt;p&gt;Due to how TeX expansion rules work, a macro doesn&apos;t have to have all of
its arguments in place when you use it; currying&lt;sup&gt;&lt;a href=&quot;#user-content-fn-curry&quot; id=&quot;user-content-fnref-curry&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;5&lt;/a&gt;&lt;/sup&gt; is in a sense possible.
We can use &lt;code&gt;\expandafter&lt;/code&gt; to use this fact if the first token expands to a
curried macro, and the first token in the &lt;em&gt;expansion&lt;/em&gt; of the second token is
the argument we want to give to the curried form.&lt;/p&gt;
&lt;p&gt;Here&apos;s an example. Say we have a macro &lt;code&gt;\twoarray&lt;/code&gt; that takes two things and wraps them in square
brackets divided by a comma, as well as a macro &lt;code&gt;\tuple&lt;/code&gt; that expands to two tokens &lt;code&gt;4&lt;/code&gt; and &lt;code&gt;5&lt;/code&gt;.
If we want to have &lt;code&gt;\twoarray&lt;/code&gt; wrap the two tokens from &lt;code&gt;\tuple&lt;/code&gt;, it doesn&apos;t work out of the box:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-tex&quot;&gt;*\def\twoarray#1#2{[ #1 , #2 ]}
*\def\tuple{4 5}
*\twoarray\tuple X  % X is just a placeholder for whatever&apos;s next; we don&apos;t want it.
[ 4 5 , X ]
% This does not work because `\twoarray` will read two tokens, `\tuple` and `X`

*\expandafter\twoarray\tuple X
[ 4 , 5 ] X
% This does work because `\tuple` is expanded before `\twoarray`, and so the token
% queue when we process `\twoarray` is  `4 5 X`
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;Chaining&lt;/h3&gt;
&lt;p&gt;So what happens when we chain multiple &lt;code&gt;\expandafter&lt;/code&gt;s together?
Let&apos;s work it out with some notation:
dashes under a token mean &lt;code&gt;\expandafter&lt;/code&gt; is skipping it,
and the token above the hat &lt;code&gt;^&lt;/code&gt; is the one being expanded.
A primed letter like &lt;code&gt;a&apos;&lt;/code&gt; has been expanded once.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-tex&quot;&gt;*\expandafter a  b  c  d ...
%             -  ^
% token list: a  b&apos; c  d
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With two &lt;code&gt;\expandafter&lt;/code&gt;s this becomes&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-tex&quot;&gt;*\expandafter \expandafter a  b  c  d ...
%             ------------ ^
% token list:  \expandafter a&apos; b  c  d
*\expandafter  a&apos; b  c  d ...
%              -  ^
% token list:  a&apos; b&apos; c  d
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It undid itself! The expansion order was &lt;code&gt;a&lt;/code&gt; and then &lt;code&gt;b&lt;/code&gt;.
Let&apos;s try three expands in a row. Now we&apos;re getting somewhere, because when expanding the second token that &lt;code&gt;\expandafter&lt;/code&gt; finds,
we might end up reading &lt;em&gt;additional&lt;/em&gt; tokens, &lt;em&gt;if&lt;/em&gt; that token takes arguments. In this
case this token is &lt;code&gt;\expandafter&lt;/code&gt;, which does indeed take two arguments!&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-tex&quot;&gt;*\expandafter \expandafter \expandafter a  b  c  d ...
%             ------------     ^^^
%                          [eat 2 arguments]
*             \expandafter       a  b&apos;        c  d ...
% This is just the first example again.
% token list:  a  b&apos;&apos; c  d ...
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;and we&apos;re again back to having the expansion order of &lt;code&gt;a&lt;/code&gt; and &lt;code&gt;b&lt;/code&gt; flipped.
Despite this, the two forms are not identical, because &lt;code&gt;\expandafter&lt;/code&gt; expands a token only &lt;em&gt;once&lt;/em&gt;, not repeatedly until it expands to itself.
We can think of regular expansion as taking the next token out of the queue,
and, if it is expandable, pushing its expansion back onto the queue.&lt;/p&gt;
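&lt;p&gt;The fact that &lt;code&gt;\expandafter&lt;/code&gt; expands only once shows up in the classic idiom below (the macro name &lt;code&gt;\once&lt;/code&gt; is made up for this sketch):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-tex&quot;&gt;\def\A{a}  \def\AA{\A}
\expandafter\def\expandafter\once\expandafter{\AA}
% The last \expandafter expands \AA exactly once, so this is  \def\once{\A}:
% the body of \once is the token  \A , not the letter  a .
&lt;/code&gt;&lt;/pre&gt;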
&lt;p&gt;Let&apos;s get concrete.
As a warm up, here is the easy case where the two forms &lt;em&gt;are&lt;/em&gt; identical, namely when a single expansion already fully expands each macro.
The list of &lt;code&gt;\A -&amp;gt;a&lt;/code&gt; beneath each input line is the evaluation sequence such that the macro &lt;code&gt;\A&lt;/code&gt; expands to the token &lt;code&gt;a&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-tex&quot;&gt;*\def\A{a}\def\B{b}\def\C{c}

*\A\B\C
\A -&amp;gt;a   \B -&amp;gt;b   \C -&amp;gt;c   
*\expandafter\A\B\C
\B -&amp;gt;b   \A -&amp;gt;a   \C -&amp;gt;c   
*\expandafter\expandafter\A\B\C
\A -&amp;gt;a   \B -&amp;gt;b   \C -&amp;gt;c   
*\expandafter\expandafter\expandafter\A\B\C
\B -&amp;gt;b   \A -&amp;gt;a   \C -&amp;gt;c   
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Note that just like we said above, the first and third lines are the same, and the second and fourth are the same.&lt;/p&gt;
&lt;p&gt;Next we make it slightly more interesting by expanding macros whose body is another macro:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-tex&quot;&gt;*\def\AA{\A}\def\BB{\B}\def\CC{\C}

*\AA\BB\CC
\AA -&amp;gt;\A   \A -&amp;gt;a     \BB -&amp;gt;\B    \B -&amp;gt;b   \CC -&amp;gt;\C   \C -&amp;gt;c   
*\expandafter\AA\BB\CC
\BB -&amp;gt;\B   \AA -&amp;gt;\A   \A -&amp;gt;a      \B -&amp;gt;b   \CC -&amp;gt;\C   \C -&amp;gt;c   
*\expandafter\expandafter\AA\BB\CC
\AA -&amp;gt;\A   \BB -&amp;gt;\B   \A -&amp;gt;a      \B -&amp;gt;b   \CC -&amp;gt;\C   \C -&amp;gt;c   
*\expandafter\expandafter\expandafter\AA\BB\CC
\BB -&amp;gt;\B   \B -&amp;gt;b     \AA -&amp;gt;\A    \A -&amp;gt;a   \CC -&amp;gt;\C   \C -&amp;gt;c   
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The four lines all have distinct expansion orders, in contrast with the last example.
With four &lt;code&gt;\expandafter&lt;/code&gt;s we are back to as if we had none.&lt;/p&gt;
&lt;details&gt;
&lt;summary&gt;What if we had &lt;code&gt;\AAA&lt;/code&gt; and friends?&lt;/summary&gt;
&lt;p&gt;The TeX tracing output is getting pretty big, so I&apos;ve compressed it down to the following table,
where the left column is the number of &lt;code&gt;\expandafter&lt;/code&gt;s before &lt;code&gt;\AAA\BBB\CCC&lt;/code&gt;,
and each row is the order in which macros were expanded.
For instance, in the first row we first expanded &lt;code&gt;\AAA&lt;/code&gt;, then &lt;code&gt;\AA&lt;/code&gt;, then &lt;code&gt;\A&lt;/code&gt; and so on.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;0     AAA     AA      A    BBB     BB      B    CCC     CC      C
1     BBB    AAA     AA      A     BB      B    CCC     CC      C 
2     AAA    BBB     AA      A     BB      B    CCC     CC      C 
3     BBB     BB    AAA     AA      A      B    CCC     CC      C 
4     AAA     AA    BBB      A     BB      B    CCC     CC      C
5     BBB    AAA     BB     AA      A      B    CCC     CC      C 
6     AAA    BBB     BB     AA      A      B    CCC     CC      C 
7     BBB     BB      B    AAA     AA      A    CCC     CC      C 
8     AAA     AA      A    BBB     BB      B    CCC     CC      C 
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;After 8 of them we are back to where we started. Also note that the &lt;code&gt;CCC&lt;/code&gt;s never change.&lt;/p&gt;
&lt;/details&gt;
&lt;h2&gt;Start Actually Expanding &lt;code&gt;\newif&lt;/code&gt;&lt;/h2&gt;
&lt;p&gt;If you&apos;ve made it this far, good job! I realize this is a fair amount of prerequisites before
getting to the point of the post.&lt;/p&gt;
&lt;p&gt;Here&apos;s the definition of &lt;code&gt;\newif&lt;/code&gt; again, but formatted a little differently:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-tex&quot;&gt;\outer\def\newif#1{
    \count@=\escapechar
    \escapechar=-1
    \expandafter\expandafter\expandafter \def\@if#1{true}{\let#1=\iftrue}%
    \expandafter\expandafter\expandafter \def\@if#1{false}{\let#1=\iffalse}%
    \@if#1{false} % the condition starts out false
    \escapechar=\count@
}
\def\@if#1#2{\csname\expandafter\if@\string#1#2\endcsname}
{
    \uccode`1=`i
    \uccode`2=`f
    \uppercase{\gdef\if@12{}}
} % `if` is required
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Let&apos;s do this in parts, starting with the bottom group, then the middle &lt;code&gt;\def&lt;/code&gt;, and then move on to the actual &lt;code&gt;\newif&lt;/code&gt;.
Note that only the first form is the actual body of &lt;code&gt;\newif&lt;/code&gt; and that the bottom group and the &lt;code&gt;\def&lt;/code&gt; in the middle
are just part of the one-time setup.
We&apos;ll start with the bottom group.&lt;/p&gt;
&lt;h3&gt;The Bottom Group&lt;/h3&gt;
&lt;pre&gt;&lt;code class=&quot;language-tex&quot;&gt;{
    \uccode`1=`i
    \uccode`2=`f
    \uppercase{\gdef\if@12{}}
} % `if` is required
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Recall from before that the &lt;code&gt;\uccode&lt;/code&gt; macro sets the character code of the uppercase version of a character,
so we can for instance change the uppercase of &lt;code&gt;g&lt;/code&gt; to be &lt;code&gt;H&lt;/code&gt; by writing &lt;code&gt;\uccode`g=`H&lt;/code&gt;.
In our snippet we are setting the uppercase version of the numbers &lt;code&gt;1&lt;/code&gt; and &lt;code&gt;2&lt;/code&gt; to be &lt;code&gt;i&lt;/code&gt; and &lt;code&gt;f&lt;/code&gt;. Yes really.
Also recall that the change is local to the current group, so this change will be undone after the third macro.&lt;/p&gt;
&lt;p&gt;So we&apos;ve changed the uppercase of &lt;code&gt;1&lt;/code&gt; and &lt;code&gt;2&lt;/code&gt;, and next we&apos;re uppercasing a &lt;code&gt;\gdef&lt;/code&gt; whose name is &lt;code&gt;if@12&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Let&apos;s make this slightly easier by only having one character to uppercase:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-tex&quot;&gt;*{\uccode`1=`M \uppercase{\gdef\bob1{bob}}}
*\bob
\bob M-&amp;gt;BOB
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Notice that the name of the macro is just &lt;code&gt;\bob&lt;/code&gt;, not &lt;code&gt;\bob1&lt;/code&gt; or &lt;code&gt;\bobM&lt;/code&gt;.&lt;/p&gt;
&lt;h4&gt;A note about more advanced parameter texts&lt;/h4&gt;
&lt;p&gt;TeX allows us to ensure that there are other tokens in the argument list of a macro expansion, or that the arguments are delimited by
certain tokens.
For instance consider the following:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-tex&quot;&gt;*\def\commasep#1,#2{(#1, #2)}
*\message{\commasep 1 2 3 , 9 8 7}
(1 2 3 ,9) 8 7
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We see that the first argument was not in fact just the first token, but all tokens up until we hit &lt;code&gt;,&lt;/code&gt; which
we had after the &lt;code&gt;#1&lt;/code&gt; in the parameter text.
The last argument however, was just the next token.&lt;/p&gt;
&lt;p&gt;We can also do this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-tex&quot;&gt;*\def\mfirst m#1{(#1)}
*\message{\mfirst a a}
! Use of \mfirst doesn&apos;t match its definition.
&amp;lt;*&amp;gt; \mfirst a
              a
*\message{\mfirst m a}
(a)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here we&apos;ve said that we need an &lt;code&gt;m&lt;/code&gt; before we get the next token as the first argument to the macro.
If the next token is not an &lt;code&gt;m&lt;/code&gt;, like in the first attempt, we error.
It is basically a very simple version of pattern matching.&lt;/p&gt;
&lt;h4&gt;Back to Bob&lt;/h4&gt;
&lt;p&gt;In our definition of &lt;code&gt;\bob&lt;/code&gt; we have ensured that the parameter text ends with the uppercase of &lt;code&gt;1&lt;/code&gt;, which was &lt;code&gt;M&lt;/code&gt;.
There is a problem though:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-tex&quot;&gt;*\bob M
! Use of \bob doesn&apos;t match its definition.
&amp;lt;*&amp;gt; \bob M

?
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The reason this doesn&apos;t work is that while the uppercase of &lt;code&gt;1&lt;/code&gt; is temporarily set to &lt;code&gt;M&lt;/code&gt;
and the macro really does expect to be called as &lt;code&gt;\bob M&lt;/code&gt;, the &lt;code&gt;M&lt;/code&gt; we send in now has
the wrong category code: it&apos;s a letter (catcode 11), while the &lt;code&gt;M&lt;/code&gt; in the parameter text kept the digit&apos;s catcode 12.
We can temporarily change this in a group, and it will work.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-tex&quot;&gt;*{\catcode`M=12 \bob M}
{begin-group character {}
{entering simple group (level 1)}
{\catcode}
{changing \catcode77=11}
{into \catcode77=12}

\bob M-&amp;gt;BOB
{the letter B}
{end-group character }}
{restoring \catcode77=11}
{leaving simple group (level 1)}
{blank space  }

*
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now we are ready to understand the current snippet&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-tex&quot;&gt;{\uccode`1=`i \uccode`2=`f \uppercase{\gdef\if@12{}}} % `if` is required
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This will define a macro &lt;code&gt;\if@&lt;/code&gt; that ensures that the first two tokens after it are &lt;code&gt;i&lt;/code&gt; and &lt;code&gt;f&lt;/code&gt; with category code &lt;code&gt;12&lt;/code&gt;.
Also note that it will expand to nothing, but it will eat the matched tokens in the parameter list.
In other words:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-tex&quot;&gt;*\def\eat h{H} \message{\eat hello}
Hello
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;h&lt;/code&gt; is eaten and replaced with the body of the macro, &lt;code&gt;H&lt;/code&gt;, and the rest of the tokens &lt;code&gt;ello&lt;/code&gt; are just
characters so nothing is done to them, and the result is &lt;code&gt;Hello&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;To summarize, we&apos;ve now globally defined a macro &lt;code&gt;\if@&lt;/code&gt; which, when applied, ensures that the next two tokens in the
token list are &lt;code&gt;i&lt;/code&gt; and &lt;code&gt;f&lt;/code&gt; with catcode 12, and eats those two tokens.&lt;/p&gt;
&lt;h3&gt;The Middle &lt;code&gt;\def&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;Moving on to this part:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-tex&quot;&gt;\def\@if#1#2{\csname\expandafter\if@\string#1#2\endcsname}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Let&apos;s peel the onion. We&apos;ve got a &lt;code&gt;csname&lt;/code&gt;/&lt;code&gt;endcsname&lt;/code&gt; pair, so the output of the function
will be a control sequence name, which will, unless already defined, be defined to expand to &lt;code&gt;\relax&lt;/code&gt;.
The name will be the result of &lt;code&gt;\expandafter\if@\string#1#2&lt;/code&gt;;
the arguments passed to &lt;code&gt;\@if&lt;/code&gt; (the &lt;code&gt;def&lt;/code&gt; we&apos;re looking at) will thus be sent to &lt;code&gt;\if@&lt;/code&gt;,
but the first argument will be eaten by &lt;code&gt;\string&lt;/code&gt; first.
We just learned that the only thing that &lt;code&gt;\if@&lt;/code&gt; does is to ensure that the first two tokens given
are &lt;code&gt;i f&lt;/code&gt; of catcode 12. And it just so happens that the tokens we get from expanding &lt;code&gt;\string&lt;/code&gt;
are exactly of catcode 12!&lt;/p&gt;
&lt;p&gt;Let&apos;s try to expand &lt;code&gt;\@if{ifeven}{true}&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-tex&quot;&gt;\@if{ifeven}{true}
\csname \expandafter\if@\string{i f e v e n}{t r u e}\endcsname
\csname \if@ i f e v e n {t r u e}\endcsname
\csname e v e n {t r u e}\endcsname
\csname e v e n t r u e\endcsname   % csname doesn&apos;t care about grouping
eventrue
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The result is a single control sequence token with the name &lt;code&gt;eventrue&lt;/code&gt;.
That&apos;s it! As long as the &lt;code&gt;\string&lt;/code&gt; expansion of the first argument starts with &lt;code&gt;i f&lt;/code&gt;
we will get a control sequence token that is the concatenation of the two arguments.&lt;/p&gt;
&lt;h3&gt;The First &lt;code&gt;\def&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;Phew, back at the top. Here it is, once more:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-tex&quot;&gt;\outer\def\newif#1{
    \count@=\escapechar
    \escapechar=-1
    \expandafter\expandafter\expandafter \def\@if#1{true}{\let#1=\iftrue}%
    \expandafter\expandafter\expandafter \def\@if#1{false}{\let#1=\iffalse}%
    \@if#1{false} % the condition starts out false
    \escapechar=\count@
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We&apos;re almost there; it&apos;s just a matter of piecing together some of the parts that we&apos;ve already
unravelled.
First we can note that we are temporarily setting &lt;code&gt;\escapechar&lt;/code&gt; to be &lt;code&gt;-1&lt;/code&gt; and then restoring it
at the end. There are two questions we can answer here: (1) why do we set it, and (2) why can&apos;t we group it instead?&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;We want the argument to &lt;code&gt;\newif&lt;/code&gt; to be a control sequence, like &lt;code&gt;\newif\ifred&lt;/code&gt;,
and we also need to check that the given control sequence starts with &lt;code&gt;if&lt;/code&gt;,
which we do in &lt;code&gt;\if@&lt;/code&gt; through the &lt;code&gt;\string&lt;/code&gt; macro. If naively applied, &lt;code&gt;\string\ifred&lt;/code&gt; would
expand to &lt;code&gt;\ i f r e d&lt;/code&gt;, but we need it to be &lt;code&gt;i f r e d&lt;/code&gt;. By setting &lt;code&gt;\escapechar=-1&lt;/code&gt;
we make &lt;code&gt;\string&lt;/code&gt; output nothing for &lt;code&gt;\&lt;/code&gt;, and we are good.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Had we used grouping, the &lt;code&gt;\def&lt;/code&gt;s inside would be local to the group and effectively destroyed
by the time we are done expanding &lt;code&gt;\newif&lt;/code&gt;. If we were to use &lt;code&gt;\gdef&lt;/code&gt; instead, then all macros defined with &lt;code&gt;\newif&lt;/code&gt; would have
to be global. This way the user can define &lt;code&gt;\newif&lt;/code&gt;s that are local to their own groups.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
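&lt;p&gt;The scoping described in the second point can be sketched like this (plain TeX; using &lt;code&gt;\ifred&lt;/code&gt; after the group would give an undefined control sequence error):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-tex&quot;&gt;{
    \newif\ifred
    \redtrue
    \ifred red\fi   % the conditional is usable inside the group
}
% out here, \ifred, \redtrue, and \redfalse are all undefined again
&lt;/code&gt;&lt;/pre&gt;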
&lt;p&gt;That only leaves three lines in the macro body, and two of them are of the same form.
From earlier we remember that three &lt;code&gt;\expandafter&lt;/code&gt;s expand the second token in the token list twice.
Let&apos;s assume &lt;code&gt;#1 = \ifred&lt;/code&gt;. With the total form&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-tex&quot;&gt;\expandafter\expandafter\expandafter \def \@if \ifred {true} {\let \ifred = \iftrue}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;we would first expand &lt;code&gt;\@if&lt;/code&gt;, which will eat two tokens, &lt;code&gt;#1&lt;/code&gt; and &lt;code&gt;{true}&lt;/code&gt; and be replaced with the
body of the macro, as seen above. Then we need a second expansion to expand the &lt;code&gt;csname&lt;/code&gt; pair,
and this will expand to the control sequence token &lt;code&gt;redtrue&lt;/code&gt;. This would be put back in the token queue,&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-tex&quot;&gt;\expandafter \def \csname \expandafter\if@\string\ifred{true}\endcsname{\let \ifred = \iftrue}
\def \redtrue{\let \ifred = \iftrue}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;and at the end we have a familiar form. The same happens with the &lt;code&gt;false&lt;/code&gt; variant.
The next line is then run:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-tex&quot;&gt;\@if\ifred{false} % expand:
\csname \expandafter\if@\string\ifred{false}\endcsname  % eval the csname pair
\redfalse  % we just defined this macro
\let\ifred=\iffalse  % run this
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;At last, we restore &lt;code&gt;\escapechar&lt;/code&gt; to whatever it was initially.&lt;/p&gt;
&lt;h2&gt;In Conclusion&lt;/h2&gt;
&lt;p&gt;Taking it all together, running &lt;code&gt;\newif\ifred&lt;/code&gt; expands to this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-tex&quot;&gt;% In the preamble we have the forms
\def\@if#1#2{\csname\expandafter\if@\string#1#2\endcsname}
{\uccode`1=`i \uccode`2=`f \uppercase{\gdef\if@12{}}} % `if` is required

% The user writes
\newif\ifred
% .. which expands to
\count@=\escapechar
\escapechar=-1
\expandafter\expandafter\expandafter \def\@if\ifred{true}{\let\ifred=\iftrue}
\expandafter\expandafter\expandafter \def\@if\ifred{false}{\let\ifred=\iffalse}
\@if\ifred{false}
\escapechar=\count@
% ... which is basically the same as
\def\redtrue{\let\ifred=\iftrue}
\def\redfalse{\let\ifred=\iffalse}
\redfalse
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;and that&apos;s it!
So hey, we had to peel a few onions&lt;sup&gt;&lt;a href=&quot;#user-content-fn-onion&quot; id=&quot;user-content-fnref-onion&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;6&lt;/a&gt;&lt;/sup&gt;, but in the end we managed to unravel the mystery and
really understand what&apos;s going on in &lt;code&gt;\newif&lt;/code&gt;; it turns out it&apos;s quite a lot, though the main
benefit seems to be that we don&apos;t have to write those three lines every time we want to define a new conditional:
a single &lt;code&gt;\newif&lt;/code&gt; suffices.&lt;/p&gt;
&lt;p&gt;If you want to see more of the &amp;quot;real&amp;quot; definition and its edge cases, check out &lt;a href=&quot;https://www.tug.org/utilities/plain/cseq.html&quot;&gt;this site&lt;/a&gt;;
I went back and forth between that and the TeXbook when writing this post, and having a searchable index of basically
the entire language is, well, indispensable. Of course, if you don&apos;t know much about TeX from before,
I can only assume that the reference will be hard to dig into.&lt;/p&gt;
&lt;p&gt;Notes, comments, questions, and tomatoes can be sent to my &lt;a href=&quot;https://lists.sr.ht/~mht/public-inbox&quot;&gt;public inbox&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Hope you learned something, and thanks for reading.&lt;/p&gt;
&lt;section data-footnotes=&quot;&quot; class=&quot;footnotes&quot;&gt;&lt;h2 id=&quot;footnote-label&quot; class=&quot;sr-only&quot;&gt;Footnotes&lt;/h2&gt;
&lt;ol&gt;
&lt;li id=&quot;user-content-fn-simple&quot;&gt;
&lt;p&gt;I couldn&apos;t call C89 or TeX simple in good faith. &lt;a href=&quot;#user-content-fnref-simple&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-kube&quot;&gt;
&lt;p&gt;I don&apos;t know what I&apos;m talking about here; can you tell? &lt;a href=&quot;#user-content-fnref-kube&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-arch&quot;&gt;
&lt;p&gt;btw I use arch &lt;a href=&quot;#user-content-fnref-arch&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-writechar&quot;&gt;
&lt;p&gt;This isn&apos;t really how it works, but for the purposes of this post we might as well pretend it is. &lt;a href=&quot;#user-content-fnref-writechar&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-curry&quot;&gt;
&lt;p&gt;This example is more close to destructuring, but I didn&apos;t want to get in the weeds of constructing an example that looked more like currying. Here&apos;s a sketch: you can have a macro in the body of another macro &lt;code&gt;\func #1 x y&lt;/code&gt; such that &lt;code&gt;#1&lt;/code&gt; expands to another macro. If we &lt;code&gt;\expandafter&lt;/code&gt; the &lt;code&gt;#1&lt;/code&gt; here we might get something like &lt;code&gt;\func u v w x y&lt;/code&gt; and so we&apos;ve effectively constructed a function &lt;code&gt;f(g) = h(g(), x, y)&lt;/code&gt;. &lt;a href=&quot;#user-content-fnref-curry&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-onion&quot;&gt;
&lt;p&gt;Something something crying when peeling an onion. &lt;a href=&quot;#user-content-fnref-onion&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content></entry><entry><title>LLMs are useful now</title><id>https://mht.wtf/post/ai26/</id><updated>2026-02-27T21:08:40+01:00</updated><author><name>Martin Hafskjold Thoresen</name><email>m@mht.wtf</email></author><link href="https://mht.wtf/post/ai26/" rel=""/><link href="https://mht.wtf/post/ai26/index.html" rel="alternate"/><published>2026-02-27T21:08:40+01:00</published><content type="text/html">&lt;p&gt;My &lt;a href=&quot;/post/static-site/&quot;&gt;static site generator&lt;/a&gt; has a quirk: there&apos;s no &amp;quot;draft&amp;quot; marker for posts.
Everything that&apos;s in the right folder gets published.
This means I have a separate &lt;code&gt;draft&lt;/code&gt; directory where I place unfinished scraps and ideas.
Creating a new draft has meant copying an existing draft, deleting the markdown contents, and replacing the frontmatter fields manually.&lt;/p&gt;
&lt;p&gt;It&apos;s annoying to do, and it&apos;s dumb to do because it should be easily automatable.
I guesstimated what the entry to the &lt;code&gt;justfile&lt;/code&gt; should be:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-justfile&quot;&gt;draft name:
  slug=$(echo &amp;quot;{{name}}&amp;quot; | sed -E &apos;s/\W+/-/g&apos;)
  cat &amp;lt;&amp;lt;EOF &amp;gt; draft/{{slug}}.md 
  lkj
EOF
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;But it was wrong.
Wrong syntax altogether.
I looked at some of my other &lt;code&gt;justfile&lt;/code&gt;s to see what that bash multi-line string literal syntax is, but I couldn&apos;t find anything.
Oh that&apos;s right, the &lt;code&gt;justfile&lt;/code&gt; syntax isn&apos;t quite bash, but isn&apos;t there some syntax to
write bash inside a rule?
Let&apos;s see, where would I have used that?
My success rate on the online &lt;code&gt;just&lt;/code&gt; docs is less than 50%, so I&apos;m avoiding it.&lt;/p&gt;
&lt;p&gt;Fuck it, &lt;code&gt;claude&lt;/code&gt; can do it.&lt;/p&gt;
&lt;p&gt;My prompt was &lt;code&gt;I&apos;m trying to write a just draft rule in justfile, help me out&lt;/code&gt;.
It edited the &lt;code&gt;justfile&lt;/code&gt; to look like this instead:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-justfile&quot;&gt;# Start a new draft
draft name:
  #!/usr/bin/env bash
  set -euo pipefail
  slug=$(echo &amp;quot;{{name}}&amp;quot; | sed -E &apos;s/\W+/-/g&apos; | tr &apos;[:upper:]&apos; &apos;[:lower:]&apos;)
  filename=&amp;quot;draft/$slug.md&amp;quot;
  if [[ -f &amp;quot;$filename&amp;quot; ]]; then
    echo &amp;quot;Draft already exists: $filename&amp;quot;
  else
    touch &amp;quot;$filename&amp;quot;
  fi
  &amp;quot;$EDITOR&amp;quot; &amp;quot;$filename&amp;quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Okay, bash is here, but frontmatter is missing. I guess I didn&apos;t say that I wanted that.
Another prompt specifying this, I now have this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Start a new draft
draft name:
  #!/usr/bin/env bash
  set -euo pipefail
  slug=$(echo &amp;quot;{{name}}&amp;quot; | sed -E &apos;s/\W+/-/g&apos; | tr &apos;[:upper:]&apos; &apos;[:lower:]&apos;)
  filename=&amp;quot;draft/$slug.md&amp;quot;
  if [[ -f &amp;quot;$filename&amp;quot; ]]; then
    echo &amp;quot;Draft already exists: $filename&amp;quot;
  else
    cat &amp;lt;&amp;lt;EOF &amp;gt; &amp;quot;$filename&amp;quot;
---
title: &amp;quot;{{name}}&amp;quot;
date: $(date -Iseconds)
template: &apos;blog-post.html&apos;
---

EOF
  fi
  &amp;quot;$EDITOR&amp;quot; &amp;quot;$filename&amp;quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Okay, so this doesn&apos;t quite work because &lt;code&gt;just&lt;/code&gt; parses the &lt;code&gt;-&lt;/code&gt; as something else,
so this is just wrong.
One more iteration -- this time explicitly telling it to test &lt;code&gt;just draft asd&lt;/code&gt; -- and we&apos;re at this caveman solution:
(Also, I had to manually edit in the &lt;code&gt;exit 1&lt;/code&gt; line)&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-justfile&quot;&gt;# Start a new draft
draft name:
  #!/usr/bin/env bash
  set -euo pipefail
  slug=$(echo &amp;quot;{{name}}&amp;quot; | sed -E &apos;s/\W+/-/g&apos; | tr &apos;[:upper:]&apos; &apos;[:lower:]&apos;)
  filename=&amp;quot;draft/$slug.md&amp;quot;
  if [[ -f &amp;quot;$filename&amp;quot; ]]; then
    echo &amp;quot;Draft already exists: $filename&amp;quot;
    exit 1
  else
    printf &apos;%s\n&apos; &apos;---&apos; &apos;title: &amp;quot;{{name}}&amp;quot;&apos; &amp;quot;date: $(date -Iseconds)&amp;quot; &amp;quot;template: &apos;blog-post.html&apos;&amp;quot; &apos;---&apos; &apos;&apos; &amp;gt; &amp;quot;$filename&amp;quot;
  fi
  &amp;quot;$EDITOR&amp;quot; &amp;quot;$filename&amp;quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This took three prompts, but
I could have done the entire process while holding my breath.
In that time I couldn&apos;t even have figured out that &lt;code&gt;date -Iseconds&lt;/code&gt; is how to print that time format,
let alone the correct sequence of letters for &lt;code&gt;set -euo pipefail&lt;/code&gt;, or how to write bash in the &lt;code&gt;justfile&lt;/code&gt;, or basically anything relating to bash.
Have you ever looked at &lt;code&gt;man bash&lt;/code&gt;??&lt;/p&gt;
&lt;p&gt;LLMs have finally become useful to me.&lt;/p&gt;
&lt;h2&gt;First Impressions&lt;/h2&gt;
&lt;p&gt;I don&apos;t remember the first time trying ChatGPT, so I cannot have been very impressed.
Sure, it was better than previous similar systems, but once I got used to its output
I found it was mostly smoke and mirrors.&lt;/p&gt;
&lt;p&gt;I do remember trying to have it generate research ideas.
I&apos;d been out of academia for a few months so I was still in that headspace,
and I had it generate titles and abstracts for papers as a brainstorming exercise.
At that time, writing the prompts by starting the abstract yourself and having
the LLM pick up mid-sentence where you left off seemed to generate
okay results.
That is, it wasn&apos;t a conversation, it was more like trying the prompt &lt;code&gt;&amp;quot;The best cs paper of the year award goes to &amp;quot;&lt;/code&gt;
and seeing what it would generate for the paper title.&lt;/p&gt;
&lt;p&gt;At some point, I decided to try to hint it towards &lt;a href=&quot;/proposal-wire-puzzles.pdf&quot;&gt;a project&lt;/a&gt; that I had already researched quite a bit in order to tell noise from signal.
I was &lt;em&gt;very&lt;/em&gt; impressed.&lt;/p&gt;
&lt;details&gt;
&lt;summary&gt;Prompt and response&lt;/summary&gt;
&lt;code&gt;prompt:&lt;/code&gt;&lt;br/&gt;  
In the past decase graphics researchers have seen a wide variety of computational design systems for puzzles, including interlocking puzzles, burr puzzles, twisty puzzles, and jigsaw puzzles. Many of these approaches aim to decompose a 3D shape into puzzle pieces using voxelization. In this project we want to design wire puzzles, which is a kind of entanglement puzzles consisting of rigid curves that are intertwined. The key observation is 
&lt;br /&gt;
&lt;br /&gt;
&lt;code&gt;response:&lt;/code&gt;&lt;br/&gt;  
that the problem of designing such a puzzle can be translated into the problem of designing a knot. Knot theory is a branch of mathematics that studies mathematical knots. 
&lt;p&gt;A mathematical knot is a way of embedding a circle in 3-dimensional Euclidean space, R3, considered up to continuous deformations. Two mathematical knots are equivalent if one can be transformed into the other via a continuous deformation of R3 upon itself (known as an ambient isotopy); these transformations correspond to manipulations of a knotted piece of string that do not involve cutting the string or passing the string through itself. The simplest knot is the unknot, which can be smoothly deformed into a circle, and the study of knots began with attempts to understand when two knots were equivalent.&lt;/p&gt;
&lt;p&gt;The simplest nontrivial knot is the trefoil knot, which is obtained by joining together two circle arcs so that they cross three times (a chiral trefoil knot is one where the three crossings are all right-handed, or all left-handed). Other knots can be described in a similar way, with more crossings. The number of crossings is called the crossing number and is an important invariant in the study of knots. The issue of deciding when two given knots are equivalent is called the knot equivalence problem. It is known to be computationally difficult, in the sense that there is no known algorithm that, given two descriptions of knots, can always determine whether or not they are equivalent. Nevertheless, there are some partial results and some special cases that are known. For example, it is known that the unknot is not equivalent to the trefoil knot, but it is not known whether the converse is true, i.e. whether every knot is equivalent to the unknot.&lt;/p&gt;
&lt;p&gt;In this project we want to design a method to generate wire puzzles. Given a 3D model, our method should be able to automatically generate a wire puzzle that can be assembled from the 3D model. There are two main challenges in this project. First, we need to find a way to represent a 3D shape as a knot. Second, we need to design an algorithm that can generate a wire puzzle from a given knot. We propose an abstract representation of a knot, which we call a string diagram. A string diagram is a graph that encodes the topology of a knot. We then design an algorithm that can generate a wire puzzle from a given string diagram.&lt;/p&gt;
&lt;p&gt;The first challenge is to find a way to represent a 3D shape as a knot. We propose an abstract representation of a knot, which we call a string diagram. A string diagram is a graph that encodes the topology of a knot. The idea is to represent a 3D shape as a collection of points in space, and then to connect these points together with curves. The resulting curves will form a knot.&lt;/p&gt;
&lt;p&gt;The second challenge is to design an algorithm that can generate a wire puzzle from a given knot. We design an algorithm that can generate a wire puzzle from a given string diagram. The algorithm works by first finding a set of non-intersecting curves that approximate the given string diagram. These curves are then used to generate a set of puzzle pieces. The algorithm then tries to find a way to assemble the puzzle pieces so that they form the desired knot. We have implemented our algorithm and have generated several wire puzzles. Our results show that our algorithm is able to generate wire puzzles that are topologically equivalent to the given knot.&lt;/p&gt;
&lt;/details&gt;
&lt;p&gt;This was the only example of llms really impressing me this year, because it wasn&apos;t obvious to me that
it was &amp;quot;just text generation&amp;quot;.
I mean, it is, but the generated text carried meaning that wasn&apos;t obvious from the prompt, and that meaning &lt;em&gt;also&lt;/em&gt; aligned with
what I had come up with using my human brain.
It was still repetitive and had the smell of llms at that time, but there was &lt;em&gt;something&lt;/em&gt; there.&lt;/p&gt;
&lt;h2&gt;My LLM Winter&lt;/h2&gt;
&lt;p&gt;Fast forward a year, and in the fall of 2024 I tried Cursor.
It had been out for a while I think, and while coworkers had adopted it, I hadn&apos;t found it very helpful.
The autocomplete could be useful sometimes, but it was more distracting than helpful, and it could definitely not be let loose on its own.
I had it generate some serverless yaml file for deploying a service consisting of a few pipeline stages, and it suggested using AWS StepFunctions.
It took the better part of a week to get the whole thing working,
because I needed to build a docker image of a python service with some weird dependencies, and to deploy the lambdas, get the StepFunction config right,
and then iron out all of the small bugs relating to data formats in between the stages, and so on.&lt;/p&gt;
&lt;p&gt;It sucked, and cursor didn&apos;t really help.
I stopped using cursor and moved to Zed at some point afterwards.
Not because of AI (Zed barely had any llm-features at the time, if I recall correctly), but because of human factors&lt;sup&gt;&lt;a href=&quot;#user-content-fn-zed&quot; id=&quot;user-content-fnref-zed&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;.
Meanwhile, people seemed to really like cursor, and
the talk about 10X engineers really took off.
I started joking that the only reason engineers using cursor ship
10 times faster is that they ship 10 times more code.&lt;/p&gt;
&lt;p&gt;I think the company still uses those StepFunctions.&lt;/p&gt;
&lt;h2&gt;SVGs on Vibes&lt;/h2&gt;
&lt;p&gt;In August &apos;25 while working on &lt;a href=&quot;/post/navigate&quot;&gt;&amp;quot;Navigate Gates&amp;quot;&lt;/a&gt;
I was annoyed that creating &lt;code&gt;svg&lt;/code&gt;s that looked good both in dark- and light-mode was hard.
I ended up with gray-on-transparent so that at least it&apos;d be visible in both,
but &lt;code&gt;svg&lt;/code&gt;s can contain &lt;code&gt;css&lt;/code&gt;, and it can conditionally render based on the user&apos;s preferred theme,
so I &lt;em&gt;should&lt;/em&gt; have good looking &lt;code&gt;svg&lt;/code&gt;s.
I just don&apos;t know how to create them like that.&lt;/p&gt;
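&lt;p&gt;The trick, for the record, is a &lt;code&gt;style&lt;/code&gt; element with a &lt;code&gt;prefers-color-scheme&lt;/code&gt; media query inside the &lt;code&gt;svg&lt;/code&gt; itself. A minimal hand-written sketch (the colors and the circle are made up):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;cat &amp;gt; icon.svg &amp;lt;&amp;lt;&apos;EOF&apos;
&amp;lt;svg xmlns=&amp;quot;http://www.w3.org/2000/svg&amp;quot; viewBox=&amp;quot;0 0 10 10&amp;quot;&amp;gt;
  &amp;lt;style&amp;gt;
    circle { stroke: #333; }
    /* picked up when the user prefers a dark theme */
    @media (prefers-color-scheme: dark) {
      circle { stroke: #ccc; }
    }
  &amp;lt;/style&amp;gt;
  &amp;lt;circle cx=&amp;quot;5&amp;quot; cy=&amp;quot;5&amp;quot; r=&amp;quot;4&amp;quot; fill=&amp;quot;none&amp;quot;/&amp;gt;
&amp;lt;/svg&amp;gt;
EOF
&lt;/code&gt;&lt;/pre&gt;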
&lt;p&gt;I figured an &lt;code&gt;svg&lt;/code&gt; is just XML nodes like the DOM, so how hard should it be
to create a simple &lt;code&gt;svg&lt;/code&gt; editor in a webapp?
The browser already has all kinds of APIs for interacting with the DOM.
I used Zed&apos;s llm support and leaned heavily on the llm to write the code.
Not all vibes, but close to.
It was pretty good until the codebase reached around 1500 lines,
at which point each prompt that fixed a bug introduced another.&lt;/p&gt;
&lt;p&gt;I spent some time cleaning it up and making things a little nicer,
and allowed myself to get sidetracked on other features, like a
configurable background grid and node snapping.
I never got to actually outputting CSS with different light and dark colors.
Maybe the next time I need such an &lt;code&gt;svg&lt;/code&gt; I&apos;ll finish it.&lt;/p&gt;
&lt;p&gt;I continued to roll my eyes at 10x-engineer-with-cursor memes,
because I had seen the very frequent failure modes of the tech.
Also, where was the output of these 10Xers?
Where were the products?&lt;sup&gt;&lt;a href=&quot;#user-content-fn-products&quot; id=&quot;user-content-fnref-products&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;h2&gt;January&apos;26&lt;/h2&gt;
&lt;p&gt;I didn&apos;t realize half of programmer-internet played around with Claude Code over the Christmas break of &apos;25,
but in mid January I decided to try it. Since then, in those ~5 weeks, using claude, I have:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;made a slackbot for my friends, Hypeman, that reacts to good news.&lt;/li&gt;
&lt;li&gt;flashed the firmware on my bluetooth speakers.&lt;/li&gt;
&lt;li&gt;created a &lt;a href=&quot;/post/ark/&quot;&gt;markdown-hosting&lt;/a&gt; service for myself.&lt;/li&gt;
&lt;li&gt;largely vibecoded a launcher (think Raycast) with a calculator, color picker, clipboard history, volume management, brightness controls, bluetooth handling, todo list, train schedule, and more. I use this every day.&lt;/li&gt;
&lt;li&gt;set up a data scraper for my local gym and an air purifier, storing data in influxdb.&lt;/li&gt;
&lt;li&gt;configured &lt;a href=&quot;https://anubis.techaro.lol/&quot;&gt;anubis&lt;/a&gt; for most of my internet-facing endpoints.&lt;/li&gt;
&lt;li&gt;tons of small improvements on my other running code, like my &lt;a href=&quot;/post/rss&quot;&gt;rss&lt;/a&gt; reader, which I also use every day.&lt;/li&gt;
&lt;li&gt;integrated with various random APIs, like local weather, train schedules, or wine prices and stock in the local wine store.&lt;/li&gt;
&lt;li&gt;migrated all of my stuff to another vps, going all-in on &lt;code&gt;docker compose&lt;/code&gt; to avoid making a mess.&lt;/li&gt;
&lt;li&gt;maybe even more things I can&apos;t think of.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It&apos;s mostly smaller things (apart from the launcher), but it&apos;s things that are very useful to me.
It&apos;s also &amp;quot;easy&amp;quot; things in the sense that they don&apos;t require research, deep knowledge, or any actual hard problems.
Most of this work is what I would classify as programmer bullshit:
things that are required to make the computer do the thing,
and that require tribal programming knowledge because of reasons other programmers have made up.&lt;/p&gt;
&lt;p&gt;Take Hypeman.
It&apos;s a slackbot that tries to react to positive or celebratory messages with emojis.
If someone writes &lt;code&gt;&amp;quot;Payday today, who&apos;s up for beers?&amp;quot;&lt;/code&gt;, it might react to the message with 💰🍺.
Yes, it&apos;s stupid, but it&apos;s also funny.
I completely vibe-coded it, and short of opening a &lt;code&gt;fly.yaml&lt;/code&gt; or something similar,
I have basically not looked at any of the code.
I still don&apos;t know what the slack API looks like, and I don&apos;t really care either.
I have a rough idea of what needs to be in &lt;code&gt;fly.yaml&lt;/code&gt;, but if you gave me pen and paper there&apos;s no way I could write
anything close to a valid config file.
Again, I don&apos;t really care.&lt;/p&gt;
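&lt;p&gt;I have no idea what the generated code actually looks like, but the core idea fits in a few lines. A hypothetical keyword-matching sketch (the real bot is Slack-connected and presumably smarter):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;# hypothetical sketch, not Hypeman&apos;s actual code
hype_emojis() {
  msg=$(printf &apos;%s&apos; &amp;quot;$1&amp;quot; | tr &apos;[:upper:]&apos; &apos;[:lower:]&apos;)
  out=&apos;&apos;
  case &amp;quot;$msg&amp;quot; in *payday*|*bonus*) out=&amp;quot;${out}💰&amp;quot;;; esac
  case &amp;quot;$msg&amp;quot; in *beer*|*cheers*) out=&amp;quot;${out}🍺&amp;quot;;; esac
  printf &apos;%s\n&apos; &amp;quot;$out&amp;quot;
}

hype_emojis &amp;quot;Payday today, who&apos;s up for beers?&amp;quot;  # prints 💰🍺
&lt;/code&gt;&lt;/pre&gt;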
&lt;p&gt;Or what about the launcher.
The linux desktop experience is pretty bad, but doing anything about it is a nightmare.
I&apos;ve tried to take a stab at a bluetooth manager, but the startup cost of doing anything useful
is just too high for me.
How the hell does DBus work? Can all programs just listen to any message on the bus?
What about sensitive data?&lt;/p&gt;
&lt;p&gt;Anyways, I&apos;ve started fixing these papercuts for myself.
My launcher does 95% of my bluetooth handling now, which is connecting to my headset or my speakers.
Writing &lt;code&gt;bt&lt;/code&gt; brings up the list of known devices, and pressing &lt;code&gt;enter&lt;/code&gt; tries to connect to it:&lt;/p&gt;
&lt;figure style=&quot;display: flex; justify-content: center&quot;&gt;
  &lt;div style=&quot;max-width: 400px&quot;&gt;
    &lt;img src=&quot;./bt.png&quot; style=&quot;flex: 1; width: 100%&quot;&gt;
  &lt;/div&gt;
&lt;/figure&gt;
&lt;p&gt;It&apos;s pretty simple under the hood, because it just shells out to &lt;code&gt;bluetoothctl&lt;/code&gt;.
That meant I got hit by &lt;a href=&quot;https://github.com/bluez/bluez/issues/1896&quot;&gt;bluez#1896&lt;/a&gt;, which I,
of course, attributed to the agent at first.
No problem, I had the agent fix it too, so now it spawns &lt;code&gt;bluetoothctl&lt;/code&gt; with redirected io
and writes commands into the process from the parent process.
Classic programmer bullshit, but that&apos;s okay -- the agent did it.&lt;/p&gt;
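&lt;p&gt;The workaround boils down to a pattern like this (sketched in shell here, although the launcher itself isn&apos;t shell, and the device address is fake):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;# keep one bluetoothctl process and write commands into its stdin,
# instead of one-shot invocations that trip over bluez#1896
bt() {
  printf &apos;%s\n&apos; &amp;quot;$@&amp;quot; | &amp;quot;${BT:-bluetoothctl}&amp;quot;
}

bt &apos;connect AA:BB:CC:DD:EE:FF&apos;
&lt;/code&gt;&lt;/pre&gt;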
&lt;p&gt;Oh, emojis are annoying to get on linux because there&apos;s, of course, no built-in or universally convenient way of
getting an emoji.
Okay, there is now:&lt;/p&gt;
&lt;figure style=&quot;display: flex; justify-content: center&quot;&gt;
  &lt;div style=&quot;max-width: 400px&quot;&gt;
    &lt;img src=&quot;./emoji.png&quot; style=&quot;flex: 1; width: 100%&quot;&gt;
  &lt;/div&gt;
&lt;/figure&gt;
&lt;p&gt;Do note that it&apos;s not actually &lt;em&gt;good&lt;/em&gt;; it&apos;s hard-coded to wrap at 8 emojis because claude couldn&apos;t (easily)
figure out how to get it to properly wrap, even though the width of the frame is also hardcoded.
It doesn&apos;t matter though, because it&apos;s useful. Take a guess at how I inserted the moneybag and beer emoji earlier in this post.&lt;/p&gt;
&lt;p&gt;It&apos;s also personalized. It only uses &lt;code&gt;wl-copy&lt;/code&gt; to interact with the clipboard, because that&apos;s what I use.
The agent tried some &lt;code&gt;iced&lt;/code&gt; clipboard thing that didn&apos;t work; fuck it, shell out to &lt;code&gt;wl-copy&lt;/code&gt; instead.
It&apos;ll never work on windows, or osx, or probably other computers without any code changes.
That&apos;s okay, because I don&apos;t need it to.
When the cost of creating software drops it becomes less important to reuse&lt;sup&gt;&lt;a href=&quot;#user-content-fn-reuse&quot; id=&quot;user-content-fnref-reuse&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;3&lt;/a&gt;&lt;/sup&gt; it.&lt;/p&gt;
&lt;p&gt;My mental model used to be that llm output is about as trustworthy as some random blog post:
maybe it works, maybe it&apos;s broken;
maybe it&apos;s good, maybe it&apos;s bad.
I&apos;m heading towards treating it like a random package on npm.
It probably at least kinda works, but it&apos;s probably worse than what I would have done myself.
However, it&apos;s easy to get, and it&apos;s already dealt with the programmer bullshit that I would have to
deal with, had I written it myself.&lt;/p&gt;
&lt;p&gt;This is a huge difference:
I&apos;ve basically never randomly copied code from blogs in projects that I care about,
but I &lt;em&gt;have&lt;/em&gt; installed third-party packages of questionable quality in &lt;em&gt;a lot&lt;/em&gt; of the projects I&apos;ve done.
When the cost of generating and editing this code drops, it becomes increasingly viable
not to use code that was written by other people with other constraints solving other problems.
That is a future I want.&lt;/p&gt;
&lt;h2&gt;What&apos;s Next?&lt;/h2&gt;
&lt;p&gt;I&apos;m curious what the future will bring.
Here&apos;s my top open questions in this space, in no particular order:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Legality. How do we square the copyright infringement involved in training llms? Who owns the copyright for llm output?&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Sustainability. Can we make llms sustainable? Can we be a net-positive contributor to society without leaning on the promise of a brighter future?&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Availability. Is the era of open computing - where anyone, anywhere, with any computer can learn to program and participate in computing - over? Have we paywalled computing?&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Progress. Will the progress continue? If programming is solved, is CS solved? Is math solved? We don&apos;t have any guarantee that progress will continue - the Wright brothers flew in 1903, and we still don&apos;t have flying cars.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Despite the hurdles, I&apos;m optimistic.
It&apos;s fun to use llms to create stuff, because it so happens that current llms are great
at doing the things that I probably like the least about programming.
It&apos;s not yet a magical button that automates &lt;em&gt;all&lt;/em&gt; of the work,
and there&apos;s still plenty of things I need to be in the loop of.
Still, &lt;em&gt;if&lt;/em&gt; the pace continues, I think software engineers will have to adjust pretty drastically.
Here&apos;s some theories, in the order I expect them to happen.&lt;/p&gt;
&lt;p&gt;First, much of open source will be abandoned.
I think we&apos;re starting to see this already.
Why contribute to a library if I can generate the very small subset of it that I need for my use-case?
Hardened projects, like linux, nginx, or firefox, aren&apos;t going away soon,
but small utilities, api clients, wrappers, and helpers are going to go away.
The value of large projects will be extensive testing and verification.
Integrating and maintaining third-party libraries
will be more costly than generating the parts of them that you need.
We&apos;ll see more in-tree code and less out-of-tree code.&lt;/p&gt;
&lt;p&gt;Second, there will simply be fewer programmers.
There will be a stronger divide between code that &amp;quot;kinda needs to work&amp;quot;
and code that absolutely &amp;quot;needs to work&amp;quot;, and there&apos;ll be little code in between.
The &amp;quot;kinda needs to work&amp;quot; kind will be generated code, and
the &amp;quot;needs to work&amp;quot; kind will be (mostly) written by hand.
Companies will figure out which is which, automate the former, and outsource the latter.
They&apos;ll still need people to do the automation and handle it when it doesn&apos;t work,
and they&apos;ll need people to figure out what to build in the first place.
They won&apos;t need people who remember what the flags for &lt;code&gt;grep&lt;/code&gt; do&lt;sup&gt;&lt;a href=&quot;#user-content-fn-grep&quot; id=&quot;user-content-fnref-grep&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;Over time, llms will git gud and they will generate more and more code that &amp;quot;needs to work&amp;quot;.
Eventually, the amount of code that cannot be generated will be too small to be meaningfully spoken of.
Writing code by hand will be a hobby.&lt;/p&gt;
&lt;p&gt;So what is third?
Maybe we finally get software that automatically adapts to our needs.
Current software is limited by its economics:
it needs to make sense for a lot of people to come together to build a product in order for that product to exist.
Thus, a lot of people need to have the same problem (or a few rich ones), so that the people creating the product can buy food and shelter.&lt;/p&gt;
&lt;p&gt;Problems that are rare aren&apos;t supported in this economy.
Today, I was at a second-hand store and they had a few meters of shelves of books for free.
I spent a minute with my head tilted sideways, looking for something interesting.
Computers could have helped me: they can take pictures, do ocr, they know what books I&apos;ve read, and which of those I&apos;ve liked.
The individual parts are solved, but I didn&apos;t have access to the entire pipeline.
Where&apos;s Uber for &amp;quot;I&apos;m in front of a bookshelf looking for books I want to read, all books are free, I want one or two&amp;quot;?
The economics of this problem don&apos;t make sense.&lt;/p&gt;
&lt;p&gt;Computers didn&apos;t help me, and I left with no books and a slight pain in my neck.&lt;/p&gt;
&lt;h2&gt;Ethics&lt;/h2&gt;
&lt;p&gt;A thorny topic, but I want to include it because of its importance.
I&apos;ve written about &lt;a href=&quot;/post/aicohol&quot;&gt;my ethics around llms&lt;/a&gt; before. Today, I feel the same,
although llms have definitely crossed the &amp;quot;saves me time&amp;quot; line, if only for a certain type of work.
The social and legal problems are absolutely still here, and I don&apos;t know how to fix those.&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;If llms don&apos;t scale further (no matter the reason), that&apos;s okay.
They&apos;re already good enough to be useful.&lt;/p&gt;
&lt;p&gt;Still, it makes me wonder what programming looks like in five years.
Will we finally get better software?
Or did we just sell all of our flash memory to nvidia, in perpetuity?&lt;/p&gt;
&lt;p&gt;I guess we&apos;ll find out.&lt;/p&gt;
&lt;p&gt;Thanks for reading.&lt;/p&gt;
&lt;hr /&gt;
&lt;section data-footnotes=&quot;&quot; class=&quot;footnotes&quot;&gt;&lt;h2 id=&quot;footnote-label&quot; class=&quot;sr-only&quot;&gt;Footnotes&lt;/h2&gt;
&lt;ol&gt;
&lt;li id=&quot;user-content-fn-zed&quot;&gt;
&lt;p&gt;At the time, cursor was a very janky VS Code clone with terrible performance, flashing, and input lag. Its only feature was the auto-complete, which, again, I was lukewarm on. &lt;a href=&quot;#user-content-fnref-zed&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-products&quot;&gt;
&lt;p&gt;I have a blog post draft from May&apos;25 complaining about this: if llms are so great, how come the market isn&apos;t flooded with high quality products? I still haven&apos;t seen this, but who knows; maybe in six months it will be? &lt;a href=&quot;#user-content-fnref-products&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-reuse&quot;&gt;
&lt;p&gt;Is software like clothes? Is it not sustainable to create new software at a whim, and do we &lt;em&gt;need&lt;/em&gt; rules around reuse of software, like fabric? This is not obvious to me. &lt;a href=&quot;#user-content-fnref-reuse&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-grep&quot;&gt;
&lt;p&gt;Employers aren&apos;t paying people to know &lt;code&gt;grep&lt;/code&gt; flags, but knowing how to use &lt;code&gt;grep&lt;/code&gt; effectively allows engineers to be more effective at the jobs that their employer &lt;em&gt;is&lt;/em&gt; paying them to do. I think the importance of this knowledge is shrinking. &lt;a href=&quot;#user-content-fnref-grep&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content></entry><entry><title>How Much Abstraction Is Too Much?</title><id>https://mht.wtf/post/rust-indirection/</id><updated>2017-06-21T12:37:40+02:00</updated><author><name>Martin Hafskjold Thoresen</name><email>m@mht.wtf</email></author><link href="https://mht.wtf/post/rust-indirection/" rel=""/><link href="https://mht.wtf/post/rust-indirection/index.html" rel="alternate"/><published>2017-06-21T12:37:40+02:00</published><content type="text/html">&lt;p&gt;Let&apos;s talk about abstraction.
As we know from &lt;a href=&quot;https://tools.ietf.org/html/rfc1925&quot;&gt;RFC 1925&lt;/a&gt; it is easier to move a problem around than it is to solve it.
This directly suggests &lt;em&gt;Abstraction Based Development&lt;/em&gt;.
It goes like this:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Have a problem&lt;/li&gt;
&lt;li&gt;Solve 5% of the problem&lt;/li&gt;
&lt;li&gt;Invent an abstraction to solve the remaining 95%&lt;/li&gt;
&lt;li&gt;Recurse on the abstraction&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;After only 91 steps (more or less) we have reduced the problem to only 1% of its original size,
making the problem trivial to solve, since it can be solved with a one-liner in Python
(this is the unit of problem hardness in CS&lt;sup&gt;&lt;a href=&quot;#user-content-fn-npc&quot; id=&quot;user-content-fnref-npc&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;).&lt;/p&gt;
&lt;p&gt;On a more serious note, abstractions are useful.
Abstractions are everywhere.
Dynamically-sized arrays are an abstraction.
Iterators are an abstraction.
Abstractions are all about hiding the stuff we do not care about, reducing a problem to the stuff we &lt;em&gt;do&lt;/em&gt; care about.
We don&apos;t &lt;em&gt;really&lt;/em&gt; care about the fact that arrays are of fixed size in memory, and that we have
to resize them when we need more; we just want to &lt;code&gt;push&lt;/code&gt; stuff onto our &lt;code&gt;Vec&lt;/code&gt;.
It is easy to infer from this that abstractions are about simplification:
we do not want the details, but only the big picture.&lt;/p&gt;
&lt;p&gt;What often follows from abstraction is &lt;em&gt;indirection&lt;/em&gt;.
Dynamic dispatch. Compile time generics.
The magic stuff the compiler does for us when we only &lt;em&gt;kinda&lt;/em&gt; say what we want.
Calling &lt;code&gt;iterator.next()&lt;/code&gt;? Ah compiler, you understand what I mean.
But to someone reading the code it is not always obvious what happens.
In the conceptual sense, we understand that the iterator will produce the next value
in the set of values it is iterating over. That part is alright.
But what &lt;em&gt;exactly&lt;/em&gt; is happening?
&lt;em&gt;Where&lt;/em&gt; is that code?
Is this operation cache friendly?
Is it computationally complex?
How confident can we be that the implementation is correct?
What can we do if we suspect something is wrong?
These questions are not always simple to answer&lt;sup&gt;&lt;a href=&quot;#user-content-fn-docs&quot; id=&quot;user-content-fnref-docs&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;I will try to argue that abstractions have a very real and very serious downside,
that (seemingly) is often overlooked: complexity.&lt;/p&gt;
&lt;p&gt;But first of all, the code in this post is real code written by real people.
This should go without saying, but just to be absolutely clear:
I do not mean to talk down on either the code or the authors of the code, and
I do not think this is bad code.
It is just a good example.&lt;/p&gt;
&lt;h1&gt;A Motivating Example: &lt;a href=&quot;https://doc.rust-lang.org/std/string/struct.String.html#method.contains&quot;&gt;&lt;code&gt;String::contains&lt;/code&gt;&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;Maybe you have just learned about string searching algorithms.
&lt;em&gt;Aho-Corasick&lt;/em&gt;, &lt;em&gt;Boyer-Moore&lt;/em&gt;, &lt;em&gt;Knuth-Morris-Pratt&lt;/em&gt;, you name it.
You, a curious person and a Rust programmer, start to wonder.
How is &lt;code&gt;String::contains&lt;/code&gt; implemented in the Rust standard library?
Let us take a look. First we need to find the method:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ rg &amp;quot;struct String&amp;quot;
src/liballoc/string.rs
262:pub struct String {
...
...
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Okay, &lt;code&gt;String&lt;/code&gt; is defined in &lt;code&gt;liballoc&lt;/code&gt;, which is kind of weird? But alright.&lt;/p&gt;
&lt;p&gt;We enter the file and search for &lt;code&gt;fn contains&lt;/code&gt;, but nothing shows up.
Strange, isn&apos;t it listed under &lt;code&gt;String&lt;/code&gt; in the docs?
After scrolling up 21 methods in the docs, we can find our issue:
&lt;code&gt;Methods from Deref&amp;lt;Target = str&amp;gt;&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Yes, of course. The function is not a &lt;code&gt;String&lt;/code&gt; method, but a &lt;code&gt;str&lt;/code&gt; method
(we know the difference between &lt;code&gt;String&lt;/code&gt; and &lt;code&gt;&amp;amp;str&lt;/code&gt;, but what was &lt;code&gt;str&lt;/code&gt; again?
Oh, maybe &lt;code&gt;str&lt;/code&gt; becomes &lt;code&gt;&amp;amp;str&lt;/code&gt; in &lt;code&gt;(&amp;amp;self)&lt;/code&gt; methods).
&lt;code&gt;rg &amp;quot;struct str&amp;quot;&lt;/code&gt; and &lt;code&gt;rg &amp;quot;struct Str &amp;quot;&lt;/code&gt; gives us nothing.
No worries, we have fuzzy file search in our editor.
Besides, &lt;code&gt;str&lt;/code&gt; sounds fundamental enough that it should be in &lt;code&gt;libcore&lt;/code&gt;.
And we do find &lt;code&gt;libcore/str/mod.rs&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Again we search for &lt;code&gt;fn contains&lt;/code&gt;, and now we get three matches:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;fn contains_nonascii(x: usize) -&amp;gt; bool {

...

/// Methods for string slices
pub trait StrExt {
    // NB there are no docs here are they&apos;re all located on the StrExt trait in
    // liballoc, not here.

    #[stable(feature = &amp;quot;core&amp;quot;, since = &amp;quot;1.6.0&amp;quot;)]
    fn contains&amp;lt;&apos;a, P: Pattern&amp;lt;&apos;a&amp;gt;&amp;gt;(&amp;amp;&apos;a self, pat: P) -&amp;gt; bool;

...

#[stable(feature = &amp;quot;core&amp;quot;, since = &amp;quot;1.6.0&amp;quot;)]
impl StrExt for str {
    #[inline]
    fn contains&amp;lt;&apos;a, P: Pattern&amp;lt;&apos;a&amp;gt;&amp;gt;(&amp;amp;&apos;a self, pat: P) -&amp;gt; bool {
        pat.is_contained_in(self)
    }
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here they are. The function is generic over &lt;code&gt;Pattern&lt;/code&gt;.
What is a &lt;code&gt;Pattern&lt;/code&gt; anyways? If we read the docs of &lt;code&gt;contains&lt;/code&gt;, we clearly see that &lt;code&gt;Pattern&lt;/code&gt; is the argument,
even though the examples would suggest that the argument is a &lt;code&gt;&amp;amp;str&lt;/code&gt;. So we click on &lt;a href=&quot;https://doc.rust-lang.org/std/str/pattern/trait.Pattern.html&quot;&gt;&lt;code&gt;Pattern&lt;/code&gt;&lt;/a&gt;.
Aha, it is just an abstraction that allows us to use different types as the pattern -
both &lt;code&gt;char&lt;/code&gt;, &lt;code&gt;String&lt;/code&gt;, &lt;code&gt;&amp;amp;str&lt;/code&gt;, and more&lt;sup&gt;&lt;a href=&quot;#user-content-fn-pattern-f&quot; id=&quot;user-content-fnref-pattern-f&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;.
Personally I would think that searching for a &lt;code&gt;char&lt;/code&gt; and matching a &lt;code&gt;&amp;amp;str&lt;/code&gt; are rather different problems&lt;sup&gt;&lt;a href=&quot;#user-content-fn-char-str-diff&quot; id=&quot;user-content-fnref-char-str-diff&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;,
but maybe this turned out to be a convenient way to handle Rust&apos;s lack of function overloading.&lt;/p&gt;
&lt;p&gt;Back to &lt;code&gt;contains&lt;/code&gt;. &lt;code&gt;Pattern::is_contained_in&lt;/code&gt; is called.
Maybe this is used to allow the types that implement &lt;code&gt;Pattern&lt;/code&gt; to choose how they want to search themselves.
Sounds reasonable, since we then are in the same situation as if we had function overloading.
We are mostly concerned about &lt;code&gt;&amp;amp;str&lt;/code&gt; (or &lt;code&gt;String&lt;/code&gt;, or &lt;code&gt;str&lt;/code&gt;?).&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;pub trait Pattern&amp;lt;&apos;a&amp;gt;: Sized {
    ...
    /// Checks whether the pattern matches anywhere in the haystack
    #[inline]
    fn is_contained_in(self, haystack: &amp;amp;&apos;a str) -&amp;gt; bool {
        self.into_searcher(haystack).next_match().is_some()
    }
    ...
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;So we make the pattern into a &lt;code&gt;Searcher&lt;/code&gt;, and the haystack is the text we are searching in.
The searcher seemingly iterates over all matches, but we are only interested in whether there is a match at all,
so we take the first and see if we got something.&lt;/p&gt;
&lt;p&gt;We search for &lt;code&gt;Searcher&lt;/code&gt; (a little ironic, don&apos;t you think?), and get &lt;a href=&quot;https://doc.rust-lang.org/std/str/pattern/trait.Searcher.html&quot;&gt;this&lt;/a&gt;, in code form.
We are getting closer.
&lt;code&gt;next_match&lt;/code&gt; seems alright: if we get a match between &lt;code&gt;a&lt;/code&gt; and &lt;code&gt;b&lt;/code&gt;, we got a match. If we are done without getting a match,
we didn&apos;t get a match. Otherwise, we continue to call &lt;code&gt;next&lt;/code&gt; (it is not really clear what the remaining case is,
but maybe we get some information about mismatches).
So what does &lt;code&gt;next&lt;/code&gt; do? And where is its implementation for &lt;code&gt;&amp;amp;str&lt;/code&gt;?
Let&apos;s search for &lt;code&gt;&amp;amp;str&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;/////////////////////////////////////////////////////////////////////////////
// Impl for &amp;amp;str
/////////////////////////////////////////////////////////////////////////////

/// Non-allocating substring search.
///
/// Will handle the pattern `&amp;quot;&amp;quot;` as returning empty matches at each character
/// boundary.
impl&amp;lt;&apos;a, &apos;b&amp;gt; Pattern&amp;lt;&apos;a&amp;gt; for &amp;amp;&apos;b str {
    type Searcher = StrSearcher&amp;lt;&apos;a, &apos;b&amp;gt;;

    #[inline]
    fn into_searcher(self, haystack: &amp;amp;&apos;a str) -&amp;gt; StrSearcher&amp;lt;&apos;a, &apos;b&amp;gt; {
        StrSearcher::new(haystack, self)
    }
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Oh, okay so this is how we got the implementor of &lt;code&gt;Searcher&lt;/code&gt; in the first place.
&lt;a href=&quot;https://doc.rust-lang.org/src/core/str/pattern.rs.html#542&quot;&gt;Here&lt;/a&gt; we find &lt;code&gt;StrSearcher&lt;/code&gt;, which sounds promising.
The struct has members, there is an &lt;code&gt;enum&lt;/code&gt; here, and another struct with something &lt;code&gt;fw&lt;/code&gt; and &lt;code&gt;bw&lt;/code&gt; (forwards and backwards?).
No need to worry, we can try to understand all this stuff when it comes up.
Let us look at the &lt;code&gt;Searcher&lt;/code&gt; implementation.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;unsafe impl&amp;lt;&apos;a, &apos;b&amp;gt; Searcher&amp;lt;&apos;a&amp;gt; for StrSearcher&amp;lt;&apos;a, &apos;b&amp;gt; {
    ...
    fn next(&amp;amp;mut self) -&amp;gt; SearchStep {
        match self.searcher {
            StrSearcherImpl::Empty(ref mut searcher) =&amp;gt; {
                // empty needle rejects every char and matches every empty string between them
                let is_match = searcher.is_match_fw;
                searcher.is_match_fw = !searcher.is_match_fw;
                let pos = searcher.position;
                match self.haystack[pos..].chars().next() {
                    _ if is_match =&amp;gt; SearchStep::Match(pos, pos),
                    None =&amp;gt; SearchStep::Done,
                    Some(ch) =&amp;gt; {
                        searcher.position += ch.len_utf8();
                        SearchStep::Reject(pos, searcher.position)
                    }
                }
            }
            StrSearcherImpl::TwoWay(ref mut searcher) =&amp;gt; {
                // TwoWaySearcher produces valid *Match* indices that split at char boundaries
                // as long as it does correct matching and that haystack and needle are
                // valid UTF-8
                // *Rejects* from the algorithm can fall on any indices, but we will walk them
                // manually to the next character boundary, so that they are utf-8 safe.
                if searcher.position == self.haystack.len() {
                    return SearchStep::Done;
                }
                let is_long = searcher.memory == usize::MAX;
                match searcher.next::&amp;lt;RejectAndMatch&amp;gt;(self.haystack.as_bytes(),
                                                      self.needle.as_bytes(),
                                                      is_long)
                {
                    SearchStep::Reject(a, mut b) =&amp;gt; {
                        // skip to next char boundary
                        while !self.haystack.is_char_boundary(b) {
                            b += 1;
                        }
                        searcher.position = cmp::max(b, searcher.position);
                        SearchStep::Reject(a, b)
                    }
                    otherwise =&amp;gt; otherwise,
                }
            }
        }
    }
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We start off by matching on &lt;code&gt;self.searcher&lt;/code&gt;, which is either &lt;code&gt;Empty&lt;/code&gt; or &lt;code&gt;TwoWay&lt;/code&gt; (what about &lt;code&gt;OneWay&lt;/code&gt;?),
and by the comment in the first case, we understand what is happening: &lt;code&gt;StrSearcherImpl::Empty&lt;/code&gt; is actually an
empty pattern (this is confirmed by &lt;code&gt;StrSearcher::new&lt;/code&gt; above).
Not a very interesting case for us, so we move on to &lt;code&gt;TwoWay&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;First we check if we are &lt;code&gt;Done&lt;/code&gt;. If so, we return &lt;code&gt;Done&lt;/code&gt;.
Then we check something about &lt;code&gt;searcher.memory&lt;/code&gt;, but it is not clear what &lt;code&gt;memory&lt;/code&gt; is,
so maybe we should check that out. We find &lt;code&gt;struct TwoWaySearcher&lt;/code&gt; (which is the type that &lt;code&gt;TwoWay&lt;/code&gt; contains),
and lo and behold: &lt;a href=&quot;https://doc.rust-lang.org/src/core/str/pattern.rs.html#764&quot;&gt;A comment&lt;/a&gt; describing the algorithm!
Well, some background information anyways, but the code in &lt;code&gt;TwoWaySearcher&lt;/code&gt;, which turns out to
be the place where the &lt;em&gt;real&lt;/em&gt; stuff happens, is well documented.
Naturally, the algorithm is rather convoluted (hard problems are hard --- who knew?), but we found out what we wanted to know.&lt;/p&gt;
&lt;p&gt;Let us try to sum up our journey.
We wanted to know which string searching algorithm &lt;code&gt;String::contains&lt;/code&gt; uses.
This method is from &lt;code&gt;str&lt;/code&gt;, as &lt;code&gt;String&lt;/code&gt; &lt;code&gt;Deref&lt;/code&gt;s to &lt;code&gt;str&lt;/code&gt;, and we would like to call &lt;code&gt;contains&lt;/code&gt; on strings we don&apos;t own.
Then our search string becomes a &lt;code&gt;Pattern&lt;/code&gt; which we transform into a &lt;code&gt;Searcher&lt;/code&gt;, which takes a haystack,
which is our original text, and this searcher does the searching.
Simple, right?&lt;/p&gt;
&lt;h1&gt;So what is the point?&lt;/h1&gt;
&lt;p&gt;I think that this journey was not trivial.
We have skimmed &lt;em&gt;a lot&lt;/em&gt; of code (I have, anyways), jumped between files and modules,
read inline comments and markdown docs, and finally, at the bottom of the rabbit hole,
we actually found out what we wanted.
Why does something seemingly so simple have to be behind so many layers&lt;sup&gt;&lt;a href=&quot;#user-content-fn-ogre&quot; id=&quot;user-content-fnref-ogre&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;5&lt;/a&gt;&lt;/sup&gt;?&lt;/p&gt;
&lt;p&gt;Some of these layers have benefits --- there is no doubt about that.
Maybe we &lt;em&gt;would&lt;/em&gt; like to write &lt;code&gt;s.contains(&apos;a&apos;)&lt;/code&gt;, &lt;code&gt;s.contains(&amp;quot;ab&amp;quot;)&lt;/code&gt; or &lt;code&gt;|s: &amp;amp;[char]| &amp;quot;ayyy&amp;quot;.contains(s)&lt;/code&gt;,
and since we don&apos;t have function overloading, we need &lt;code&gt;Pattern&lt;/code&gt; to abstract over the variations.
Maybe we would like to have a common argument type for &lt;code&gt;&amp;amp;str&lt;/code&gt; methods: &lt;code&gt;&amp;amp;str&lt;/code&gt; has 20 methods that take a &lt;code&gt;Pattern&lt;/code&gt;,
including &lt;code&gt;contains&lt;/code&gt;, &lt;code&gt;find&lt;/code&gt;, &lt;code&gt;split&lt;/code&gt;, and &lt;code&gt;replace&lt;/code&gt;.&lt;/p&gt;
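As a quick sketch of that uniformity (again my own example), several `&str` methods accept the same kind of pattern argument:

```rust
fn main() {
    let s = "a,b,,c";

    // The same pattern argument works across many &str methods:
    assert_eq!(s.find(','), Some(1));
    assert_eq!(s.split(',').collect::<Vec<_>>(), ["a", "b", "", "c"]);
    assert_eq!(s.replace(',', ";"), "a;b;;c");
    assert!(s.contains(','));
    println!("ok");
}
```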
&lt;p&gt;But all of the layers? If we list all &lt;code&gt;struct&lt;/code&gt;s, &lt;code&gt;enum&lt;/code&gt;s, and &lt;code&gt;trait&lt;/code&gt;s one needs to understand in order to dig through something
rather simple like this&lt;sup&gt;&lt;a href=&quot;#user-content-fn-simple&quot; id=&quot;user-content-fnref-simple&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;6&lt;/a&gt;&lt;/sup&gt;, can we explain why they all have to be there?
Are some of them there because we &lt;em&gt;might&lt;/em&gt; need them in the future&lt;sup&gt;&lt;a href=&quot;#user-content-fn-future-proof&quot; id=&quot;user-content-fnref-future-proof&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;7&lt;/a&gt;&lt;/sup&gt;?
Is it possible that we made this more convoluted than strictly needed?
I don&apos;t claim to have an answer to any of these questions, but I think
these are important questions, and I do not think they are asked often enough.&lt;/p&gt;
&lt;p&gt;Of course, this is not to say that the &lt;code&gt;str&lt;/code&gt; module is overengineered, or that I think there is anything wrong with this implementation.
The only reason I brought it up as an example is because &lt;em&gt;I did&lt;/em&gt; try to find out how &lt;code&gt;contains&lt;/code&gt; works some months ago,
but I gave up because I could not understand the whole system (admittedly I didn&apos;t spend too much time on it).
There was simply too much stuff!
It is ironic that we can simplify a system with abstractions so far as to end up with an even more complex system.&lt;/p&gt;
&lt;p&gt;We like short functions&lt;sup&gt;&lt;a href=&quot;#user-content-fn-why-short&quot; id=&quot;user-content-fnref-why-short&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;8&lt;/a&gt;&lt;/sup&gt;, and we like to introduce types to ensure type safety&lt;sup&gt;&lt;a href=&quot;#user-content-fn-stringify-api&quot; id=&quot;user-content-fnref-stringify-api&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;9&lt;/a&gt;&lt;/sup&gt;.
We like flexible solutions, and generalized interfaces.
But it is easy to overlook what we are giving up by building a tower of abstractions, namely simplicity.&lt;/p&gt;
&lt;p&gt;I think simplicity is something we, as developers, should strive for, and I think it is often something that is forgotten.
I &lt;em&gt;don&apos;t&lt;/em&gt; think that there is an inherent trade-off between complex-but-flexible and simple-but-inflexible.
I think we can get both.
But, as always, the best solution is the hardest to find.&lt;/p&gt;
&lt;hr /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://www.reddit.com/r/programming/comments/6il0w7/how_much_abstraction_is_too_much/&quot;&gt;/r/programming thread&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.reddit.com/r/rust/comments/6il0vx/how_much_abstraction_is_too_much/&quot;&gt;/r/rust thread&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://news.ycombinator.com/item?id=14602567&quot;&gt;HN thread&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;section data-footnotes=&quot;&quot; class=&quot;footnotes&quot;&gt;&lt;h2 id=&quot;footnote-label&quot; class=&quot;sr-only&quot;&gt;Footnotes&lt;/h2&gt;
&lt;ol&gt;
&lt;li id=&quot;user-content-fn-npc&quot;&gt;
&lt;p&gt;And don&apos;t let your algorithm professor/friend/family relative tell you otherwise! &lt;a href=&quot;#user-content-fnref-npc&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-docs&quot;&gt;
&lt;p&gt;Documentation is a great tool here, but docs might be (1) misleading, (2) out of date, (3) lacking, or (4) non-existent. &lt;a href=&quot;#user-content-fnref-docs&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-pattern-f&quot;&gt;
&lt;p&gt;... and &lt;code&gt;impl&amp;lt;&apos;a, F&amp;gt; Pattern&amp;lt;&apos;a&amp;gt; for F where F: FnMut(char) -&amp;gt; bool&lt;/code&gt;? This is seemingly the same as &lt;code&gt;s.chars().any(f)&lt;/code&gt;. &lt;a href=&quot;#user-content-fnref-pattern-f&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-char-str-diff&quot;&gt;
&lt;p&gt;In the implementation sense, not the conceptual sense. &lt;a href=&quot;#user-content-fnref-char-str-diff&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-ogre&quot;&gt;
&lt;p&gt;Is &lt;code&gt;str::contains&lt;/code&gt; actually &lt;a href=&quot;https://www.youtube.com/watch?v=_bMcXVe8zIs&quot;&gt;an ogre&lt;/a&gt;? &lt;a href=&quot;#user-content-fnref-ogre&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-simple&quot;&gt;
&lt;p&gt;You might think &amp;quot;Aha, but this &lt;em&gt;isn&apos;t&lt;/em&gt; rather simple, because such and such&amp;quot;, but I do think that something like this &lt;em&gt;should&lt;/em&gt; be simple. I did not want to understand &lt;em&gt;how&lt;/em&gt; the algorithm works, I just wanted to &lt;em&gt;find&lt;/em&gt; it! &lt;a href=&quot;#user-content-fnref-simple&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-future-proof&quot;&gt;
&lt;p&gt;In which case why are they there &lt;em&gt;today&lt;/em&gt;? &lt;a href=&quot;#user-content-fnref-future-proof&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-why-short&quot;&gt;
&lt;p&gt;Although &lt;em&gt;why&lt;/em&gt; functions and methods should be short is not always explained. &lt;a href=&quot;#user-content-fnref-why-short&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-stringify-api&quot;&gt;
&lt;p&gt;For instance, see Pascal Hertleif&apos;s talk &lt;a href=&quot;https://www.youtube.com/watch?v=0zOg8_B71gE&amp;amp;t=408s&quot;&gt;Writing Idiomatic Libraries in Rust&lt;/a&gt; [10:38] &lt;a href=&quot;#user-content-fnref-stringify-api&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content></entry><entry><title>Recursion</title><id>https://mht.wtf/post/recursion/</id><updated>2016-04-10T14:50:41+01:00</updated><author><name>Martin Hafskjold Thoresen</name><email>m@mht.wtf</email></author><link href="https://mht.wtf/post/recursion/" rel=""/><link href="https://mht.wtf/post/recursion/index.html" rel="alternate"/><published>2016-04-10T14:50:41+01:00</published><content type="text/html">&lt;p&gt;So, what is recursion&lt;sup&gt;&lt;a href=&quot;#user-content-fn-recursion-fail&quot; id=&quot;user-content-fnref-recursion-fail&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; anyways?&lt;/p&gt;
&lt;h2&gt;Mathematical Induction&lt;/h2&gt;
&lt;p&gt;Explaining recursion might be easier if we understand mathematical &lt;em&gt;induction&lt;/em&gt;.
Induction is a proof technique, which works somewhat like this&lt;sup&gt;&lt;a href=&quot;#user-content-fn-ghetto-induction&quot; id=&quot;user-content-fnref-ghetto-induction&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Show that we can get from one state to the next state&lt;/li&gt;
&lt;li&gt;Show that we have an initial state&lt;/li&gt;
&lt;li&gt;We can now get to any state, after the initial state.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The concept can be visualized as climbing a staircase. If we are on an arbitrary step, we know how to get to the next step.
We also know how to get to the 0th step, which (I guess) is the ground.
From this, we conclude that we can get to step &lt;code&gt;n&lt;/code&gt;, for any positive &lt;code&gt;n&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;What does this have to do with recursion? Well, recursion is kind of the opposite.
With recursion, we need the following:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The problem can be solved by first solving a smaller instance of the same problem&lt;/li&gt;
&lt;li&gt;When the input is small enough, it is trivial to solve&lt;/li&gt;
&lt;li&gt;We can now solve an instance of any size, simply by making the problem smaller enough times.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Recursion is heavily used in functional programming, so if you are familiar with, say a &lt;code&gt;Lisp&lt;/code&gt;, or &lt;code&gt;Haskell&lt;/code&gt;,
this might be familiar.&lt;/p&gt;
&lt;p&gt;For instance, we can find the length of a list using recursion:
the length of a list is one more than the length of the same list with the first element removed,
and the length of an empty list is 0.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-py&quot;&gt;num list_length(List)
    if List == []
        return 0
    let head be first element
    let tail be rest of list
    return 1 + list_length(tail)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Or, in actual python:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-py&quot;&gt;def list_len(lst):
  if lst == []:
    return 0
  return 1 + list_len(lst[1:])
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;A Not So Straight Forward Recursion&lt;/h2&gt;
&lt;p&gt;Let&apos;s say we&apos;re trying to sort a list. We could do a lot of different things, like keeping a sorted list and inserting one new element at a time, or building a data structure from which extracting a sorted list is trivial&lt;sup&gt;&lt;a href=&quot;#user-content-fn-heapsort&quot; id=&quot;user-content-fnref-heapsort&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;.
But we&apos;ll try something else. We begin with the observation that if we split the list in two, and somehow manage to sort the two lists separately,
we can merge them pretty easily: repeatedly take the smaller of the two front elements and push it onto the end of a new list.
Then the new list will consist of all of the elements in sorted order, because we always took the smaller element.&lt;/p&gt;
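The merge routine described above could look like this in Python (a sketch of the idea; the function name `merge` is my own, and the post itself keeps to pseudocode):

```python
def merge(a, b):
    """Merge two sorted lists into one sorted list."""
    out = []
    i = j = 0
    # Repeatedly take the smaller of the two front elements.
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    # At most one of these slices is non-empty.
    out.extend(a[i:])
    out.extend(b[j:])
    return out
```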
&lt;p&gt;So great, we just found a way to sort a list... except we didn&apos;t, because in order for our algorithm to work, we need an additional algorithm to sort the two lists.
But do we really? The algorithm we just made is itself a sorting algorithm! What happens if we try to use the algorithm itself?&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-py&quot;&gt;list merge_sort(List)
    split List at middle into A, B
    A = merge_sort(A)
    B = merge_sort(B)
    return merge(A, B)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Does this work? What happens if &lt;code&gt;List&lt;/code&gt; is empty? Or contains one element? Actually, when the length of the list is less than two, the
list is already sorted (if we can say that an empty list is sorted).&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-py&quot;&gt;list merge_sort(List)
    if List.length &amp;lt; 2
        return List
    split List at middle into A, B
    A = merge_sort(A)
    B = merge_sort(B)
    return merge(A, B)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Ok, we are (perhaps) starting to build up confidence in our new, albeit weird, algorithm.
But, this call to the function itself looks a little dangerous. How can we know if it will end?&lt;/p&gt;
&lt;p&gt;We first observe that if the length of the list is less than two, we return the list, so the algorithm returns from that call.
If the length of the list is larger or equal to two, we split the list at the middle, and call the parts &lt;code&gt;A&lt;/code&gt; and &lt;code&gt;B&lt;/code&gt;.
But the length of both &lt;code&gt;A&lt;/code&gt; and &lt;code&gt;B&lt;/code&gt; is half the length of &lt;code&gt;List&lt;/code&gt; (or something very close to half).
Neither of them can possibly be larger, or even of equal length.
Hence, we know that the algorithm calls itself, but with a smaller input.
Eventually, when the input is small enough --- containing less than two elements --- we return.
Hence, no matter what size the input is, the algorithm always terminates!&lt;/p&gt;
&lt;p&gt;Now, this result is kind of amazing.
We have just constructed a sorting algorithm, kind of without even thinking about the problem of sorting!
By assuming our own algorithm actually works, we gained confidence that the same algorithm works!&lt;/p&gt;
&lt;p&gt;The merge step looks a bit scary, though&lt;sup&gt;&lt;a href=&quot;#user-content-fn-merge-step&quot; id=&quot;user-content-fnref-merge-step&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;. Maybe we can try something similar, but without having to merge the lists.&lt;/p&gt;
&lt;p&gt;Ok, what if we sort the list just a little bit, so that we can make the recursion work?
For instance, we can take an element from the list and then split, or &lt;em&gt;partition&lt;/em&gt;, the list in two, based on that one element,
such that all elements in the first list are less than or equal to our selected element, and all elements in the last list are greater than it.
At the end, we simply put them together, and put our special element, the &lt;em&gt;pivot&lt;/em&gt;, in between.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-py&quot;&gt;list sort(List)
    if List.length &amp;lt; 2
        return List
    select e from List
    make A such that a in A --&amp;gt; a &amp;lt;= e
    A = sort(A)
    make B such that b in B --&amp;gt; b &amp;gt; e
    B = sort(B)
    return A + [e] + B
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This might look more like sorting, because we partition the list into two, based on the pivot.
But one would think there is still lots to do. There is not.
This &lt;code&gt;python&lt;/code&gt; function implements this algorithm, &lt;code&gt;quick-sort&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-py&quot;&gt;def sort(lst):
  if len(lst) &amp;lt; 2:
    return lst
  pivot = lst.pop()
  A = [a for a in lst if a &amp;lt;= pivot]
  B = [b for b in lst if b &amp;gt; pivot]
  return sort(A) + [pivot] + sort(B)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;and this is the &lt;code&gt;haskell&lt;/code&gt; version:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-hs&quot;&gt;sort (x:xs) = l ++ [x] ++ g
            where l = sort [a | a&amp;lt;-xs, a &amp;lt;= x]
                  g = sort [b | b&amp;lt;-xs, b &amp;gt; x]
sort l = l
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Recursion Gone Wrong&lt;/h2&gt;
&lt;p&gt;Recursion is an amazing tool, and sometimes it works out just right.
However, there are multiple things that can go wrong&lt;sup&gt;&lt;a href=&quot;#user-content-fn-can-go-wrong&quot; id=&quot;user-content-fnref-can-go-wrong&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;5&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;h3&gt;Make Sure It Ends&lt;/h3&gt;
&lt;p&gt;It is very easy to get burnt on this, either by looping forever&lt;sup&gt;&lt;a href=&quot;#user-content-fn-negative-fac&quot; id=&quot;user-content-fnref-negative-fac&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;6&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-py&quot;&gt;def fac(n):
    if n == 1:
        return 1
    return n * fac(n - 1)

# fac(-2) loops infinitely
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;or incorrectly handling, or even forgetting, the base case&lt;sup&gt;&lt;a href=&quot;#user-content-fn-sort-fail&quot; id=&quot;user-content-fnref-sort-fail&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;7&lt;/a&gt;&lt;/sup&gt;, and result in a crash:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-hs&quot;&gt;sort [x] = [x]
sort (x:xs) = l ++ [x] ++ g
            where l = sort [a | a&amp;lt;-xs, a &amp;lt;= x]
                  g = sort [b | b&amp;lt;-xs, b &amp;gt; x]
-- Non-exhaustive patterns:
-- the empty list case is not handled
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As mentioned here&lt;sup&gt;&lt;a href=&quot;#user-content-fn-recursion-fail&quot; id=&quot;user-content-fnref-recursion-fail-2&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;, making sure the recursion ends is tricky.&lt;/p&gt;
&lt;p&gt;If we have multiple recursive calls, it is also easy to end up doing the same work over and over.
The classical example here is the recursive &lt;code&gt;fibonacci&lt;/code&gt; function&lt;sup&gt;&lt;a href=&quot;#user-content-fn-wikinacci&quot; id=&quot;user-content-fnref-wikinacci&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;8&lt;/a&gt;&lt;/sup&gt;, which runs in exponential time.
Try to run &lt;code&gt;fib(35)&lt;/code&gt;&lt;sup&gt;&lt;a href=&quot;#user-content-fn-fib-40&quot; id=&quot;user-content-fnref-fib-40&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;9&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-py&quot;&gt;def fib(n):
    if n &amp;lt; 3:
        return 1
    return fib(n-1) + fib(n-2)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A solution to this is &lt;em&gt;memoization&lt;/em&gt;&lt;sup&gt;&lt;a href=&quot;#user-content-fn-memoization&quot; id=&quot;user-content-fnref-memoization&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;10&lt;/a&gt;&lt;/sup&gt;, where we save intermediate results, and look up values instead of recomputing them.
Of course, in the case of calculating fibonacci numbers, this is still worse than an iterative solution, because we are using more memory.
This is not to say that it is impossible to write an equally good implementation while still using recursion; we just have to go upwards, instead of downwards:
&lt;pre&gt;&lt;code class=&quot;language-py&quot;&gt;def fib(n):
    def go(current, previous, a):
        if a == n:
            return current
        return go(current + previous, current, a + 1)
    if n &amp;lt; 3:
        return 1
    return go(1, 0, 1)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;At this point, one could argue that this is basically a loop&lt;sup&gt;&lt;a href=&quot;#user-content-fn-tail-recursive&quot; id=&quot;user-content-fnref-tail-recursive&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;11&lt;/a&gt;&lt;/sup&gt;, so there is no point in using recursion; we might be better off with simply writing&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-py&quot;&gt;def fib(n):
    current = 1
    previous = 0
    for _ in range(n - 1):
      current, previous = current + previous, current
    return current
&lt;/code&gt;&lt;/pre&gt;
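For completeness, the memoization mentioned earlier is nearly free in Python via `functools.lru_cache`; a sketch applying it to the naive definition:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    # Same naive definition, but each value is computed only once
    # and then served from the cache on later calls.
    if n < 3:
        return 1
    return fib(n - 1) + fib(n - 2)
```

With the cache, `fib(35)` returns immediately instead of taking seconds.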
&lt;hr /&gt;
&lt;p&gt;Hopefully, we have gained a little insight into recursion --- both its magic and its dangers.
While it might be tricky to get recursion right, actually getting it right is too much fun to miss out on.&lt;/p&gt;
&lt;section data-footnotes=&quot;&quot; class=&quot;footnotes&quot;&gt;&lt;h2 id=&quot;footnote-label&quot; class=&quot;sr-only&quot;&gt;Footnotes&lt;/h2&gt;
&lt;ol&gt;
&lt;li id=&quot;user-content-fn-recursion-fail&quot;&gt;
&lt;p&gt;While trying to make the &lt;em&gt;Recursion: See Recursion&lt;/em&gt; joke, the static site generator ended up crashing due to a never ending recursion so that the call stack became too large. &lt;a href=&quot;#user-content-fnref-recursion-fail&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt; &lt;a href=&quot;#user-content-fnref-recursion-fail-2&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-ghetto-induction&quot;&gt;
&lt;p&gt;This is not a rigid, and probably not a very good, intro to induction --- but that is fine, because this post is about recursion. &lt;a href=&quot;#user-content-fnref-ghetto-induction&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-heapsort&quot;&gt;
&lt;p&gt;This is my personal favorite. &lt;a href=&quot;#user-content-fnref-heapsort&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-merge-step&quot;&gt;
&lt;p&gt;It&apos;s not, but it is easy to mess up the merge routine. &lt;a href=&quot;#user-content-fnref-merge-step&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-can-go-wrong&quot;&gt;
&lt;p&gt;These problems aren&apos;t unique to recursion --- both infinite loops and horrible running times are naturally possible without using recursion. &lt;a href=&quot;#user-content-fnref-can-go-wrong&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-negative-fac&quot;&gt;
&lt;p&gt;You could argue that the factorial of a negative number is not well defined, so that calling &lt;code&gt;fac(-2)&lt;/code&gt; does not make any sense. However, defining the factorial of a negative number is useful in many situations. &lt;a href=&quot;#user-content-fnref-negative-fac&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-sort-fail&quot;&gt;
&lt;p&gt;This actually was the version I wrote when trying to write the correct version above. &lt;a href=&quot;#user-content-fnref-sort-fail&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-wikinacci&quot;&gt;
&lt;p&gt;We will be using the same definition as on &lt;a href=&quot;https://en.wikipedia.org/wiki/Fibonacci_number&quot;&gt;wikipedia&lt;/a&gt;, where &lt;code&gt;fib(1) = fib(2) = 1&lt;/code&gt;. &lt;a href=&quot;#user-content-fnref-wikinacci&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-fib-40&quot;&gt;
&lt;p&gt;... or &lt;code&gt;fib(50)&lt;/code&gt;, if you have a lot of time on your hands. &lt;a href=&quot;#user-content-fnref-fib-40&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-memoization&quot;&gt;
&lt;p&gt;Note that there is no r in memoization. &lt;a href=&quot;#user-content-fnref-memoization&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-tail-recursive&quot;&gt;
&lt;p&gt;The function is &lt;em&gt;tail-recursive&lt;/em&gt;, so in many languages this would be transformed by the compiler into a loop. However, Python is not one of those languages. &lt;a href=&quot;#user-content-fnref-tail-recursive&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content></entry><entry><title>Advent of Common Lisp, Day 5-9</title><id>https://mht.wtf/post/advent-2018-2/</id><updated>2018-12-07T19:33:42+01:00</updated><author><name>Martin Hafskjold Thoresen</name><email>m@mht.wtf</email></author><link href="https://mht.wtf/post/advent-2018-2/" rel=""/><link href="https://mht.wtf/post/advent-2018-2/index.html" rel="alternate"/><published>2018-12-07T19:33:42+01:00</published><content type="text/html">&lt;p&gt;We continue our Common Lisp adventure!&lt;/p&gt;
&lt;h2&gt;Day 5&lt;/h2&gt;
&lt;h3&gt;Part 1&lt;/h3&gt;
&lt;p&gt;It feels natural to first figure out how to find and reduce a pair of chars.  A
solution based on arrays might be easier to implement without having to loop
through the input line multiple times, since we risk the situation where
removing one pair of chars makes the new neighbors reducible, like so:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;abcdDCBA
 abcCBA
  abBA
   aA
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With the array based solution we can just decrement the current index after
reducing a char pair. This makes for plenty of problems: we first want to
destructively remove a part of a string given two indices, and we also want a
loop in which we can alter the iteration variable; neither of which seems
straightforward to do in CL. My first, non-working attempt looked like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;(defun reducable (a b)
  (and (eq (char-downcase a) (char-downcase b))
       (not (eq a b))))

(defun reduce-chars (chars)
  (let ((end (- (length chars) 1)))
    (loop for i from 0 below (- (length chars) 1)
          when (or (&amp;gt; 0 i) (&amp;lt; end i)) return &amp;quot;&amp;quot;
          when (reducable (char chars i) (char chars (+ i 1)))
            do (progn
                 (setf chars (remove-if #&apos;(lambda (_x) t) chars :start i :end (+ i 2)))
                 (setf i (- i 2)))
          finally (return chars)))))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Having &lt;code&gt;#&apos;(lambda (_x) t)&lt;/code&gt; as the removal predicate is... not great.  This was
the first attempt in which I figured &amp;quot;OK, this might just work&amp;quot;.  However, it
turns out that the termination check is not run on each iteration in &lt;code&gt;loop&lt;/code&gt;: it
seems to be the case that &lt;code&gt;(- (length chars) 1)&lt;/code&gt; is only evaluated once before
the first iteration of the loop, as opposed to how &lt;code&gt;for (int i = 0; i &amp;lt; length(foo); i++)&lt;/code&gt; works in e.g. C. This makes for out-of-bounds indexing in the
body.&lt;/p&gt;
&lt;p&gt;As a note, there is a destructive version of &lt;code&gt;remove&lt;/code&gt; called &lt;code&gt;delete&lt;/code&gt;, which
would seemingly allow me to skip the &lt;code&gt;(setf &lt;/code&gt; stuff. This did not work, since
&lt;code&gt;delete&lt;/code&gt; for some reason would not update the length of the string, and I would
end up repeating the last two chars.  That is, this happened:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;abcCBA
abBABA
aABABA
BABABA
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I eventually figured out a solution, and ended up with this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;(defun reduce-chars (chars)
  (loop for i from 0
        for end = (- (length chars) 2)
        when (or (&amp;gt; 0 i) (&amp;lt; end i)) return chars
        when (reducable (char chars i) (char chars (+ i 1)))
            do (progn
                 (setf chars (remove-if #&apos;(lambda (_x) t) chars :start i :end (+ i 2)))
                 (setf i (- i 2)))
        finally (return chars)))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We run it on the input and ... we get 49998. Strange. After printing out &lt;code&gt;i&lt;/code&gt; in
the &lt;code&gt;progn&lt;/code&gt; we see that the first index matches, and so &lt;code&gt;i&lt;/code&gt; gets subtracted to
&lt;code&gt;-2&lt;/code&gt;, causing the loop to terminate on the next iteration. Ok, we fix this by
adding a &lt;code&gt;(max 0 ..)&lt;/code&gt; after subtracting &lt;code&gt;2&lt;/code&gt;. We run again, aaand .... wrong
answer. The resulting string is slightly above 10000 chars, so looking at the
output manually  will probably take too much time. We try to run it twice on
the same input, which should not change anything, and yet we get&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;* (length (reduce-chars (reduce-chars *input-5*)))
10768
* (length (reduce-chars (reduce-chars *input-5*)))
10766
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A little thinking reveals the bug: since &lt;code&gt;i&lt;/code&gt; is incremented after the loop body
we should not &lt;code&gt;max&lt;/code&gt; it to &lt;code&gt;0&lt;/code&gt;, but  to &lt;code&gt;-1&lt;/code&gt;. This fixes the bug, and solves
part 1.&lt;/p&gt;
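&lt;p&gt;For reference, the fixed loop would look something like this (a sketch; the post only describes the fix, so the exact final version is assumed):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;;; Sketch of the fixed version: after removing a pair at i, clamp i to
;; -1 (not 0), since loop increments i before the next iteration.
(defun reduce-chars (chars)
  (loop for i from 0
        for end = (- (length chars) 2)
        when (or (&amp;gt; 0 i) (&amp;lt; end i)) return chars
        when (reducable (char chars i) (char chars (+ i 1)))
          do (progn
               (setf chars (remove-if #&apos;(lambda (_x) t) chars :start i :end (+ i 2)))
               (setf i (max -1 (- i 2))))
        finally (return chars)))
&lt;/code&gt;&lt;/pre&gt;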
&lt;h3&gt;Part 2&lt;/h3&gt;
&lt;p&gt;The second part asks us to remove all occurrences of one letter such that the reduced string is as short as possible.
The simplest way to do this is just to try out all possible letter choices:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;(defun day-5/2 (input)
  (let ((all-chars (remove-duplicates input :key #&apos;char-downcase )))
    (loop for c across all-chars
          for inp = (remove-if #&apos;(lambda (ch) (or (eq c ch) (eq (char-upcase c) ch))) input)
          minimizing (length (reduce-chars inp)) into l
          finally (return l))))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;There&apos;s definitely room for optimization here:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt; * (time (day-5/2 *input-5*))
Evaluation took:
  104.868 seconds of real time
  104.697084 seconds of total run time (104.137960 user, 0.559124 system)
  [ Run times consist of 2.633 seconds GC time, and 102.065 seconds non-GC time. ]
  99.84% CPU
  304,535,002,262 processor cycles
  57,568,946,208 bytes consed
  
6538
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;By using a list instead of an array, and swinging &lt;code&gt;cdr&lt;/code&gt; to &lt;code&gt;cdddr&lt;/code&gt; when we want
to remove a pair of chars, we significantly cut down on the running time (this is
part 1 again):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;* (time (length (reduce-chars *input-5*)))
Evaluation took:
  4.307 seconds of real time
  4.305834 seconds of total run time (4.289201 user, 0.016633 system)
  [ Run times consist of 0.112 seconds GC time, and 4.194 seconds non-GC time. ]
  99.98% CPU
  12,506,949,589 processor cycles
  2,383,071,584 bytes consed
  
10766
* (time (day-5/1-list *input-5*))
Evaluation took:
  0.106 seconds of real time
  0.106221 seconds of total run time (0.106221 user, 0.000000 system)
  100.00% CPU
  309,274,661 processor cycles
  819,200 bytes consed
  
10766
&lt;/code&gt;&lt;/pre&gt;
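&lt;p&gt;The list-based version itself isn&apos;t shown here. An equivalent linear-time sketch (my code, not the actual &lt;code&gt;day-5/1-list&lt;/code&gt;, whose cdr-splicing details are only described above) can be written with a stack of surviving chars:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;;; Sketch: push chars onto a stack; whenever the incoming char reduces
;; with the top of the stack, pop instead of pushing.  This has the same
;; effect as splicing pairs out of a linked list with cdr/cdddr.
(defun reduce-chars-list (input)
  (let ((stack &apos;()))
    (loop for c across input
          if (and stack (reducable (car stack) c))
            do (pop stack)
          else do (push c stack))
    (length stack)))
&lt;/code&gt;&lt;/pre&gt;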
&lt;p&gt;If we now use this solution for the second part, we get the following running time:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;* (time (day-5/2-list *input-5*))
Evaluation took:
  2.496 seconds of real time
  2.494232 seconds of total run time (2.494232 user, 0.000000 system)
  99.92% CPU
  7,248,068,477 processor cycles
  28,657,200 bytes consed
  
6538
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Not bad!&lt;/p&gt;
&lt;h2&gt;Day 6&lt;/h2&gt;
&lt;p&gt;There might be fancy tricks to this task, but we&apos;ll try the simplest
approach first: we make the grid (and hope that the input isn&apos;t too big!), loop
over each cell, find the closest point, and store that in the cell. Afterwards
we go through the borders and find all of the areas that touch them, because
these areas will have infinite area.  At last we just sum up the number of
cells for each area, and choose the maximum, excluding the infinite ones.&lt;/p&gt;
&lt;p&gt;This time I would like to try out proper top-down programming. This will be
our final function:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;(defun day-6/1 (input)
  (let* ((points (parse-points input))
         (grid (make-grid points)))
    (mark-closest grid points)
    (let* ((infinites (get-border-areas grid))
           (area-sizes (count-area-sizes grid))
           (valids (exclude-infinities area-sizes infinites))
           (second (car (sort valids #&apos;second)))))))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now it&apos;s just a matter of filling in the blanks.&lt;/p&gt;
&lt;p&gt;First with input parsing. The lines are in the format &amp;quot;&amp;lt;x&amp;gt;, &amp;lt;y&amp;gt;&amp;quot;, but regex
seems like overkill, so I figured I&apos;d try out &lt;code&gt;split-sequence&lt;/code&gt;. I could,
however, not get it to install, so instead I went with the simpler solution:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;(defstruct point x y id)
(defparameter *point-count* 0)
(defun pt (x y) (make-point :x x :y y :id (incf *point-count*)))

(defun parse-points (lines)
  (loop for line in lines
        for i = (search &amp;quot;, &amp;quot; line)
        collect (pt (parse-integer (subseq line 0 i))
                    (parse-integer (subseq line (+ i 2))))))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With a flash of clairvoyance we realize that we could use an &lt;code&gt;id&lt;/code&gt; for all
areas, in addition to their coordinates.&lt;/p&gt;
&lt;p&gt;Creating the grid was slightly worse, since &lt;code&gt;make-array&lt;/code&gt; didn&apos;t want to take my
dynamic sizes as dimension arguments.&lt;/p&gt;
&lt;p&gt;Quick sidenote: after entering a function name wrong, Slime prompted me to
enter another expression as the function. I tried this, and Emacs froze.
Having spent 6 years in Vim, I cannot recall any specific time it has crashed
(I know it has, but I can&apos;t remember it). After restarting I had to install
&lt;code&gt;Slime&lt;/code&gt; again (I have yet to find out how to properly install stuff with Emacs),
and after installing it again, I ran into problems with &lt;code&gt;No Lisp subprocess; see variable &apos;inferior-lisp-buffer&apos;&lt;/code&gt;, despite Slime and Swank and whatever
running just fine.  Restarting Emacs, yet again, and installing Slime again,
seemed to fix it.&lt;/p&gt;
&lt;p&gt;After not figuring out how to make arrays without a fixed size, since I
couldn&apos;t make a 2d array of a dynamic size, I realized I could make it work
using &lt;code&gt;make-array&lt;/code&gt; and &lt;code&gt;loop&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;(defun make-grid (points)
  (let* ((max-x (+ 1 (reduce #&apos;max (mapcar #&apos;point-x points))))
         (max-y (+ 1 (reduce #&apos;max (mapcar #&apos;point-y points))))
         (grid (make-array max-y)))
    (loop for y from 0 below max-y do
      (setf (aref grid y) (make-array max-x)))
    grid))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We&apos;re adding &lt;code&gt;1&lt;/code&gt; to &lt;code&gt;max-{x,y}&lt;/code&gt; so that we can index with all coordinates in
the input list.&lt;/p&gt;
&lt;p&gt;Calculating the closest point for each cell in the grid is done with nested
&lt;code&gt;loop&lt;/code&gt;s. The logic for finding the best got somewhat messy, but it should work.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;(defun manhattan (a b)
  (+ (abs (- (point-x a) (point-x b)))
     (abs (- (point-y a) (point-y b)))))

(defun mark-closest (grid points)
  (let ((mx (length (aref grid 0)))
        (my (length grid)))
    (loop for y from 0 below my do
      (loop for x from 0 below mx
        for c = (make-point :x x :y y)
            do (let* ((dists (loop for p in points collect (list (manhattan c p) p)))
                      (sorted (sort dists #&apos;&amp;lt; :key #&apos;car ))
                      (best (car sorted))
                      (tie (eq (first (first sorted)) (first (second sorted)))))
                 (setf (aref (aref grid y) x)
                       (if tie 0 (point-id (second best)))))))
    grid))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In order to get the areas touching the border, we just loop through the four borders,
collect the numbers we see, and dedup at the end.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;(defun get-border-areas (grid)
  (let ((mx (length (aref grid 0)))
        (my (length grid)))
    (remove-duplicates (append
                        (loop for y from 0 below my collect (aref (aref grid y) 0))
                        (loop for y from 0 below my collect (aref (aref grid y) (- mx 1)))
                        (loop for x from 0 below mx collect (aref (aref grid 0) x))
                        (loop for x from 0 below mx collect (aref (aref grid (- my 1)) x))))))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;For counting the sizes of the areas we could have used a hashmap, but we might
as well use the fact that all areas are numbered between &lt;code&gt;0&lt;/code&gt; and the number of
areas. Then we can make an array of counts, loop over the grid, and count up.
When done we return &lt;code&gt;(id, count)&lt;/code&gt; pairs for all areas that were non-null.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;(defun count-area-sizes (grid num-areas)
  (let ((arr (make-array num-areas))
        (mx (length (aref grid 0)))
        (my (length grid)))
    (loop for y from 0 below my do
      (loop for x from 0 below mx
        for area = (aref (aref grid y) x)
        do (incf (aref arr area))))
    (loop for i from 0 below num-areas
          when (&amp;lt; 0 (aref arr i)) collect (list i (aref arr i)))))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Removing the infinite areas from the list of &lt;code&gt;(id, area)&lt;/code&gt; tuples didn&apos;t
have to be its own function, but we&apos;ve come so far with the top-down mindset,
so let&apos;s overuse it a little.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;(defun exclude-infinities (area-sizes infinities)
  (remove-if #&apos;(lambda (l) (find (car l) infinities)) area-sizes))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is the last function we needed to implement &lt;code&gt;day-6/1&lt;/code&gt;. Having all the
helper functions, we just need to make some small adjustments to the function,
and we&apos;re good to go.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;(defun day-6/1 (input)
  (setf *point-count* 0)
  (let* ((points (parse-points input))
         (num-areas (1+ (length points)))
         (grid (make-grid points)))
    (mark-closest grid points)
    (let* ((infinites (get-border-areas grid))
           (area-sizes (count-area-sizes grid num-areas))
           (valids (exclude-infinities area-sizes infinites))
           (max-area (car (sort valids #&apos;&amp;gt; :key #&apos;second))))
      (second max-area))))
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;Part 2&lt;/h3&gt;
&lt;p&gt;The second part requires very little code in comparison. We simply do the same thing:
find the size of the grid, loop over the grid, measure the sum of the distances to
all points, and count the number of cells with a sufficiently low distance.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;(defun day-6/2 (input)
  (setf *point-count* 0)
  (let* ((points (parse-points input))
         (max-x (+ 1 (reduce #&apos;max (mapcar #&apos;point-x points))))
         (max-y (+ 1 (reduce #&apos;max (mapcar #&apos;point-y points))))
         (count 0))
    (loop for y from 0 below max-y do
      (loop for x from 0 below max-x
            for point = (pt x y)
            when (&amp;lt; (reduce #&apos;+ (mapcar #&apos;(lambda (p) (manhattan p point)) points)) 10000)
              do (incf count)))
    count))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;and that&apos;s it!&lt;/p&gt;
&lt;h2&gt;Day 7&lt;/h2&gt;
&lt;h3&gt;Part 1&lt;/h3&gt;
&lt;p&gt;We start out by parsing each input line into a pair, so that we can more easily handle the dependency edges.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;(defun line-to-pair (line)
  (let ((a (subseq line 5 6))
        (b (subseq line 36 37)))
    (list a b)))
&lt;/code&gt;&lt;/pre&gt;
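&lt;p&gt;For reference, on a line in the puzzle&apos;s format (the magic indices 5 and 36 pick out the two step letters):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;* (line-to-pair &amp;quot;Step C must be finished before step A can begin.&amp;quot;)
(&amp;quot;C&amp;quot; &amp;quot;A&amp;quot;)
&lt;/code&gt;&lt;/pre&gt;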
&lt;p&gt;One approach we can take is to continuously find all nodes that do not depend on any other node,
and select the first alphabetically. The most straightforward way of doing this is to
look through the list of edges, and count the number of times each node appears as the second element of a pair.
Then we look through the counts and choose the first node with a count of 0.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;(defun get-next (nodes edges)
  (defun zero-keys (hm)
    (loop for k being the hash-keys of hm
          when (eq 0 (gethash k hm)) collect k))
  (let ((hm (make-hash-table :test #&apos;equalp)))
    (loop for node in nodes do (setf (gethash node hm) 0))
    (let ((available (loop for e in edges
                           do (print (second e))
                           do (incf (gethash (second e) hm))
                           finally (return (zero-keys hm)))))
    (reduce #&apos;(lambda (a e) (if (string&amp;lt; a e) a e)) available))))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The outer loop is mostly keeping track of the nodes and edges we have left,
and removing the elements that we no longer use after outputting a node.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;(defun day-7/1 (input)
  (let* ((output)
         (edges (mapcar #&apos;line-to-pair input))
         (nodes (remove-duplicates (flatten edges) :test #&apos;string=)))
    (loop when (not (car nodes)) return output
            do (let ((next (get-next nodes edges)))
                 (setf edges (delete-if #&apos;(lambda (edge) (string= (first edge) next)) edges))
                 (setf nodes (delete next nodes))
                 (setf output (cons next output))))
    (reduce #&apos;(lambda (a b) (concatenate &apos;string a b)) (reverse output))))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We also used a &lt;code&gt;flatten&lt;/code&gt; function stolen from Rosetta Code.&lt;/p&gt;
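&lt;p&gt;The &lt;code&gt;flatten&lt;/code&gt; itself isn&apos;t reproduced here; a typical recursive version looks something like this (a sketch, not necessarily the exact one used):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;;; Sketch: flatten an arbitrarily nested list into a flat list of atoms.
(defun flatten (tree)
  (cond ((null tree) nil)
        ((atom tree) (list tree))
        (t (append (flatten (car tree)) (flatten (cdr tree))))))
&lt;/code&gt;&lt;/pre&gt;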
&lt;h3&gt;Part 2&lt;/h3&gt;
&lt;p&gt;Today&apos;s second part seems very different from the first. We are asked to
schedule variable-length tasks with 5 workers.&lt;/p&gt;
&lt;p&gt;First off, it does not matter which worker does which task. Second off, we
probably want to prioritize starting with longer tasks, if possible.
We still have task dependencies, which we need to remember.&lt;/p&gt;
&lt;p&gt;One approach to solving this is to have a queue of all tasks that are currently
being processed. Then at each step we would find the next task, assign a worker to
it, and find the time at which the task is done. If there are multiple nodes
without any dependencies we would choose as many as we have workers.  In
addition, we probably want to choose the longest tasks first; that is, the
&lt;em&gt;largest&lt;/em&gt; lexicographically, as opposed to the smallest, as in part 1.&lt;/p&gt;
&lt;p&gt;This is the &lt;code&gt;task&lt;/code&gt; data that we work with: &lt;code&gt;id&lt;/code&gt; is the task name, &lt;code&gt;done&lt;/code&gt; is the
time at which the task is done, and &lt;code&gt;worker&lt;/code&gt; is a worker id.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;(defstruct task id done worker)
(defun task-cost (id)
  (+ 60 (- (char-int (char id 0)) 64)))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;get-next/2&lt;/code&gt; is just like &lt;code&gt;get-next&lt;/code&gt;, except that we choose the largest instead
of the smallest alphabetically, since this has the largest cost.
Now our main function looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;(defun day-7/2 (input)
  (let* ((edges (mapcar #&apos;line-to-pair input))
         (nodes (remove-duplicates (flatten edges) :test #&apos;string=))
         (available-workers (loop for i from 1 to 5 collect i))
         (in-flight-tasks))
    (loop for time from 0
          when available-workers do
            (let ((next (get-next/2 nodes edges)))
              (when next
                (setf nodes (delete next nodes))
                (push (make-task :id next
                                 :done (+ time 1 (task-cost next))
                                 :worker (pop available-workers))
                      in-flight-tasks)
                (setf in-flight-tasks (sort in-flight-tasks #&apos;&amp;lt; :key #&apos;task-done))))
          when in-flight-tasks do
              (loop while in-flight-tasks
                when (&amp;lt; time (task-done (car in-flight-tasks))) return nil
                do (let ((task (pop in-flight-tasks)))
                     (setf edges (delete-if #&apos;(lambda (edge) (string= (first edge) (task-id task))) edges))
                     (push (task-worker task) available-workers)))
          when (not nodes) return time)))
&lt;/code&gt;&lt;/pre&gt;
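&lt;p&gt;The &lt;code&gt;get-next/2&lt;/code&gt; helper isn&apos;t shown; presumably (an assumption on my part) it only differs from &lt;code&gt;get-next&lt;/code&gt; in the final reduce:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;;; Assumed sketch: the same dependency counting as get-next, but ties go
;; to the lexicographically largest node (string&amp;gt;) instead of the smallest.
(defun get-next/2 (nodes edges)
  (let ((hm (make-hash-table :test #&apos;equalp)))
    (loop for node in nodes do (setf (gethash node hm) 0))
    (loop for e in edges do (incf (gethash (second e) hm 0)))
    (let ((available (loop for k being the hash-keys of hm
                           when (eq 0 (gethash k hm)) collect k)))
      (when available
        (reduce #&apos;(lambda (a e) (if (string&amp;gt; a e) a e)) available)))))
&lt;/code&gt;&lt;/pre&gt;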
&lt;p&gt;Using this we pass the test input, but on the real input our output is wrong.
A little &lt;code&gt;format&lt;/code&gt; debugging shows us two things: 1. we should not add &lt;code&gt;1&lt;/code&gt; to
the task cost when constructing new tasks, and 2. we need to remove tasks that
are done before trying to add new tasks this round. Without this a task that
takes only one cycle would spend two: the one in which it gets dispatched, and
the one in which it completes. In addition, we&apos;re not dispatching multiple
tasks at a time, which we should. The end condition was also wrong, as it terminated
as soon as the last task was dispatched, but not completed. Somehow all these errors
canceled out when run on the test input.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;(defun day-7/2 (input num-workers)
  (let* ((edges (mapcar #&apos;line-to-pair input))
         (nodes (remove-duplicates (flatten edges) :test #&apos;string=))
         (available-workers (loop for i from 1 to num-workers collect i))
         (in-flight-tasks))
    (loop for time from 0
          when in-flight-tasks do
            (loop while in-flight-tasks
                  when (&amp;lt; time (task-done (car in-flight-tasks))) return nil
                    do (let ((task (pop in-flight-tasks)))
                         (setf edges (delete-if #&apos;(lambda (edge) (string= (first edge) (task-id task))) edges))
                         (push (task-worker task) available-workers)))
          when available-workers do
            (loop for next = (get-next nodes edges)
                  when (not available-workers) return nil
                  if next do (progn
                               (setf nodes (delete next nodes))
                               (push (make-task :id next
                                                :done (+ time (task-cost next))
                                                :worker (pop available-workers))
                                     in-flight-tasks)
                               (setf in-flight-tasks (sort in-flight-tasks #&apos;&amp;lt; :key #&apos;task-done)))
                  else return nil)
          when (and (not nodes) (not in-flight-tasks)) return time)))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;After about 90 minutes of debugging, &lt;code&gt;format&lt;/code&gt;ing, and asking around about how people
resolved ties, I ended up with this, which gives me the correct answer.&lt;/p&gt;
&lt;p&gt;Regarding ties, I was confused since the task did not explicitly say that ties
should still be resolved alphabetically, and I suspect it does matter (although
I haven&apos;t come up with an example).  In order to see whether this actually was
the error in my code, I resolved ties randomly, and ran the function on the
input 100 times; they all gave me the same answer.&lt;/p&gt;
&lt;p&gt;I&apos;m still not sure what the bug was, since I ended up not making any
meaningful edits in the last hour of debugging. There might have been stale
function definitions or something as well, or maybe I just forgot to switch
back the cost function or the number of workers between testing the function
on the test input and the real input.&lt;/p&gt;
&lt;p&gt;In any case, day 7 is complete.&lt;/p&gt;
&lt;h2&gt;Day 8&lt;/h2&gt;
&lt;p&gt;Today we have good news and bad news.  The good news is that today&apos;s data
structure is the tree!  The bad news is that the input is a single line of
space-separated digits, so we&apos;ll have to make &lt;code&gt;split-sequence&lt;/code&gt; work.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;split-sequence&lt;/code&gt; is not in the standard library, so we cannot just use it.
It is apparently a part of the Common Lisp Utilities, although that tells me nothing;
browsing through the homepage of the utilities doesn&apos;t give me much information about
how to actually use this.
The examples given for &lt;code&gt;split-sequence&lt;/code&gt; seem to have already loaded a package called
&lt;code&gt;split-sequence&lt;/code&gt;. I guess that it is installable using &lt;code&gt;quicklisp&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;* (ql:quickload &amp;quot;split-sequence&amp;quot;)
To load &amp;quot;split-sequence&amp;quot;:
  Load 1 ASDF system:
    asdf
  Install 1 Quicklisp release:
    split-sequence
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;(rant warning)&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;... but nothing happens after this is run, and I need to abort it with &lt;code&gt;C-c C-c&lt;/code&gt;.  Trying &lt;code&gt;&amp;quot;cl-utilities&amp;quot;&lt;/code&gt; and &lt;code&gt;&amp;quot;utilities&amp;quot;&lt;/code&gt; instead did not help out: I
got &lt;code&gt;ETIMEDOUT&lt;/code&gt; from the former in the debugger, and the latter did apparently
not do anything. I figured that the &lt;code&gt;ETIMEDOUT&lt;/code&gt; might be due to me having
outdated stuff in my quicklisp installation, so I ran &lt;code&gt;(ql:update-dist &amp;quot;quicklisp&amp;quot;)&lt;/code&gt;, which I found at the quicklisp website. After (I&apos;m guessing) 30
seconds without any feedback as to whether something actually happened when typing
that in the repl, I get yet another &lt;code&gt;ETIMEDOUT&lt;/code&gt;. Maybe the quicklisp client is
outdated?  (this seems very unlikely, since I installed it about eight days
ago, but at this point I have no idea what&apos;s going on) Running &lt;code&gt;(ql:update-client)&lt;/code&gt;
gets me nowhere: yet another &lt;code&gt;ETIMEDOUT&lt;/code&gt;. I suppose the quicklisp site could be down?
Following the install instructions I followed about a week ago I run&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;curl -O https://beta.quicklisp.org/quicklisp.lisp.asc
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;and... nothing happens! Great! .. or is it? Investigating further I tried to
check &lt;code&gt;isitdownrightnow.com&lt;/code&gt; to confirm that the site was indeed down, but I
failed to connect to &lt;code&gt;isitdownrightnow.com&lt;/code&gt;! It seems unlikely that both of
these sites are down, so I check &lt;code&gt;downforeveryoneorjustme.com&lt;/code&gt;, which claims
that both &lt;code&gt;beta.quicklisp.org&lt;/code&gt; and &lt;code&gt;isitdownrightnow.com&lt;/code&gt; are in fact down for
just me. QuickLisp just times out, but isitdownrightnow gives me a cloudflare
page, so presumably the problem is not in my house, which means that there&apos;s
probably not much I can do.&lt;/p&gt;
&lt;p&gt;(Update the 10th: the network is more or less back to normal, and installing
&lt;code&gt;split-sequence&lt;/code&gt; was as simple as &lt;code&gt;(ql:quickload &amp;quot;split-sequence&amp;quot;)&lt;/code&gt;; so much
for getting annoyed :)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;(rant warning end)&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Luckily, &lt;code&gt;cl-ppcre&lt;/code&gt; offers the same functionality with &lt;code&gt;(ppcre:split delim string)&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;(defun line-to-numbers (line)
  (mapcar #&apos;parse-integer (ppcre:split &amp;quot; &amp;quot; line)))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then we make a function to parse the line into a tree. We first find the number
of children and the number of metadata entries, then for each child we recursively
call the parse function on the list without the two numbers we&apos;ve already read.
This is slightly awkward since we have to both collect the child nodes, as well
as keep track of where in the input list we are. For this we use
&lt;code&gt;multiple-value-bind&lt;/code&gt;, and have the function return a pair &lt;code&gt;node, remaining-input&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;(defun parse-tree (items)
  (let* ((num-children (first items))
         (num-metadata (second items))
         (rest (cddr items))
         (children
           (loop for i from 0 below num-children
                 collect
                 (multiple-value-bind (node new-rest) (parse-tree rest)
                   (setf rest new-rest)
                   node)))
         (metadata (subseq rest 0 num-metadata)))
    (values (list children metadata) (subseq rest num-metadata))))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Running this on the test input gives us this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;(((NIL (10 11 12)) (((NIL (99))) (2))) (1 1 2))
; formatted, and with labels:
node: (
  children: (
    node: (children: NIL metadata: (10 11 12))
    node: (
      children: (
        node: (children: NIL metadata: (99))
      metadata: (2)))
  metadata: (1 1 2)))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;So the tree looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;      (1 1 2)
       /  \
      /    \
     /      \
(10 11 12)  (2)
             |
             |
            (99)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Which looks right, compared to the description on the task page.&lt;/p&gt;
&lt;p&gt;Now we just need to sum up all metadata entries. We will again go for a
recursive solution:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;(defun metadata-sum (tree)
  (+ (reduce #&apos;+ (second tree))
     (reduce #&apos;+ (mapcar #&apos;metadata-sum (first tree)))))

* (metadata-sum (parse-tree (line-to-numbers *test-input-8*)))
138
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Yay! This completes the first part.&lt;/p&gt;
&lt;h3&gt;Part 2&lt;/h3&gt;
&lt;p&gt;Now we&apos;re asked to sum up the values of the children whose indices appear in
the node&apos;s metadata, with a note that if a number appears multiple times in the list,
it should be counted multiple times. This makes it possible to construct inputs for which
the running time becomes exponential, but we&apos;ll try the naive thing anyway.&lt;/p&gt;
&lt;p&gt;The function is almost a straight mapping from the description of the value scoring.
If the node has children, the metadata entries are the indices (note: 1-indexed) of the children
we count. If not, the sum of the metadata is the value.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;(defun node-value (node)
  (if (first node)
    (loop for data in (second node)
          summing (node-value (nth (- data 1) (first node))) into sum
          finally (return sum))
    (reduce #&apos;+ (second node))))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Tada! The running time is also pretty good!&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;* (time (day-8/2 *input-8*))
Evaluation took:
  0.113 seconds of real time
  0.113493 seconds of total run time (0.113443 user, 0.000050 system)
  [ Run times consist of 0.025 seconds GC time, and 0.089 seconds non-GC time. ]
  100.00% CPU
  328,615,301 processor cycles
  302,886,128 bytes consed
  
30063
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Day 9&lt;/h2&gt;
&lt;p&gt;Today I want to try out something a little different. The marbles in the task
are in a circle, so I want to try out having a circular list.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;(defun make-circular (e)
  (let* ((l (list e)))
    (setf (cdr l) l)
    l))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Trying to print this out loops forever, so this seems to work.  One
downside is that we need to be able to move in both directions around
the circle; since the list is singly linked, going &lt;code&gt;k&lt;/code&gt; steps backwards means first finding
the length &lt;code&gt;n&lt;/code&gt; of the list, and then going &lt;code&gt;n-k&lt;/code&gt; steps in the forward direction.
This might take some time, but we can try to do it this way first, in case it
works out.&lt;/p&gt;
&lt;p&gt;Next up is doing stuff with the list. Naturally we cannot just &lt;code&gt;mapcar&lt;/code&gt; over
our circular list (we cannot even &lt;code&gt;subseq&lt;/code&gt; it - I exhausted my heap attempting
to do so), so we need to write our own map:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;(defun map-circular (f circ)
  (let* ((head (first circ))
        (result (list (funcall f head))))
    (loop for e in (cdr circ)
          when (eq e head) return (reverse result)
          do (push (funcall f e) result))))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now &lt;code&gt;(map-circular #&apos;print (make-circular 1))&lt;/code&gt; prints &lt;code&gt;1&lt;/code&gt; and gives me back
&lt;code&gt;(1)&lt;/code&gt;. Next we want to insert things, so we&apos;ll write a function for that.
We return &lt;code&gt;t&lt;/code&gt; here so that the REPL doesn&apos;t try to print out the entire list
every time we add something:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;(defun insert-circular (e circ)
  (setf (cdr circ) (cons e (cdr circ)))
  t)

* (defparameter nums (make-circular 1))
NUMS
* (insert-circular 2 nums)
T
* (insert-circular 3 nums)
T
* (insert-circular 4 nums)
T
* (map-circular #&apos;print nums)
1 
4 
3 
2 
(1 4 3 2)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Good.&lt;/p&gt;
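&lt;p&gt;To make the pointer juggling concrete, here&apos;s the same structure sketched in Python (a hypothetical translation, not part of the original solution): a one-cell circle that points to itself, insertion after a cell, and the &lt;code&gt;n-k&lt;/code&gt; trick for stepping backwards.&lt;/p&gt;

```python
class Cell:
    # A circular singly linked list, mirroring make-circular: a single
    # cell whose next pointer points back at itself.
    def __init__(self, value):
        self.value = value
        self.next = self  # a circle of one

def insert_after(cell, value):
    # Splice a new cell in right after `cell`, like insert-circular.
    new = Cell(value)
    new.next = cell.next
    cell.next = new

def length(circ):
    # Walk forwards until we are back at the starting cell.
    n, cur = 1, circ.next
    while cur is not circ:
        n, cur = n + 1, cur.next
    return n

def back(circ, node, k):
    # Going k steps backwards == going n-k steps forwards.
    for _ in range(length(circ) - k):
        node = node.next
    return node
```

&lt;p&gt;Inserting 2, 3, and 4 after the head of a one-element circle gives the order 1, 4, 3, 2, matching the output above.&lt;/p&gt;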
&lt;p&gt;Now implementing the game is not too difficult.  We keep track of the current
node, the player, and the player scores.  For each round of the game we
increment the current player, and check if the marble is special or not. If it
is, we count the list, and go forward &lt;code&gt;n-8&lt;/code&gt; steps (that is, backwards &lt;code&gt;8&lt;/code&gt; steps), so we
end up with the node &lt;em&gt;before&lt;/em&gt; the one we want to remove. Then we remove it and
increment the score for the current player.  If the marble is not special we
just insert it after the next marble in the circle.  Lastly, we get the maximum
of the scores.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;(defun play-game (num-marbles players)
  (let* ((circle (make-circular 0))
         (current circle)
         (player 0)
         (scores (make-array players)))
    (insert-circular 1 circle)
    (setf current (cdr circle))
    (loop for marble from 2 to num-marbles do
      (progn
        (setf player (mod (1+ player) players))
        (if (zerop (mod marble 23))
            (let* ((len (length-circular circle))
                   (to-remove (nthcdr (- len 8) current)))
              (incf (aref scores player) (+ marble (second to-remove)))
              (remove-circular to-remove)
              (setf current (cdr to-remove)))
            (progn
              (insert-circular marble (cdr current))
              (setf current (cddr current))))))
    (loop for s across scores maximizing s into m finally (return m))))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is a very wasteful implementation: because we can only go backwards by going forwards, we need to go through almost the entire list &lt;em&gt;twice&lt;/em&gt; when removing marbles.
Still, for the input we were given, it doesn&apos;t perform &lt;em&gt;too&lt;/em&gt; badly:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;* (time (play-game 70848 425))
Evaluation took:
  1.323 seconds of real time
  1.322374 seconds of total run time (1.252015 user, 0.070359 system)
  [ Run times consist of 0.207 seconds GC time, and 1.116 seconds non-GC time. ]
  99.92% CPU
  3,842,548,402 processor cycles
  3,189,760,080 bytes consed
  
413188
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;Part 2&lt;/h3&gt;
&lt;p&gt;This, of course, was foreseen by the creator of the task: part 2 asks for the
same game, but for 100 times the number of marbles. Since the current
implementation is roughly quadratic, this means that we&apos;ll use 10,000x the
time: 10,000 seconds is roughly three hours, which means back to the drawing
board.&lt;/p&gt;
&lt;p&gt;The natural approach is to add support for traversing the list in both directions.
We can do this by not using built-in lists, but making our own list type:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;(defstruct node b e f) ; b = back, e = element, f = forward

(defun make-circular (e)
  (let ((n (make-node :f nil :e e :b nil)))
    (setf (node-f n) n)
    (setf (node-b n) n)
    n))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Inserts and removals are similar to before, except that we must
swing two pointers instead of one.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;(defun insert-circular (e node)
  (let ((n (make-node :b node :e e :f (node-f node))))
    (setf (node-b (node-f node)) n)
    (setf (node-f node) n)
    t))


(defun remove-circular (node)
  (setf (node-f node) (node-f (node-f node)))
  (setf (node-b (node-f node)) node)
  t)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With this it is very easy to go &lt;code&gt;n&lt;/code&gt; steps backwards:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;(defun n-back (n node)
  (if (zerop n) node
      (n-back (- n 1) (node-b node))))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The main function is almost unchanged, with the exception
of swapping &lt;code&gt;car&lt;/code&gt;s with &lt;code&gt;node-e&lt;/code&gt; and &lt;code&gt;cdr&lt;/code&gt;s with &lt;code&gt;node-f&lt;/code&gt;, in addition
to, of course, using &lt;code&gt;n-back&lt;/code&gt; instead of &lt;code&gt;nthcdr&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;(defun play-game (num-marbles players)
  (let* ((circle (make-circular 0))
         (current circle)
         (player 0)
         (scores (make-array players)))
    (insert-circular 1 circle)
    (setf current (node-f circle))
    (loop for marble from 2 to num-marbles do
      (progn
        (setf player (mod (1+ player) players))
        (if (zerop (mod marble 23))
            (let* ((to-remove (n-back 8 current)))
              (incf (aref scores player) (+ marble (node-e (node-f to-remove))))
              (remove-circular to-remove)
              (setf current (node-f to-remove)))
            (progn
              (insert-circular marble (node-f current))
              (setf current (node-f (node-f current)))))))
    (loop for s across scores maximizing s into m finally (return m))))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here are the running times for both inputs:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;* (time (day-9/1))
Evaluation took:
  0.008 seconds of real time
  0.007581 seconds of total run time (0.007487 user, 0.000094 system)
  100.00% CPU
  22,124,513 processor cycles
  2,162,688 bytes consed
  
413188
* (time (day-9/2))
Evaluation took:
  1.169 seconds of real time
  1.167412 seconds of total run time (1.080873 user, 0.086539 system)
  [ Run times consist of 0.748 seconds GC time, and 0.420 seconds non-GC time. ]
  99.83% CPU
  3,394,813,588 processor cycles
  216,923,648 bytes consed
  
3377272893
&lt;/code&gt;&lt;/pre&gt;
</content></entry><entry><title>Navigate Gates</title><id>https://mht.wtf/post/navigate/</id><updated>2025-07-27T19:52:00+02:00</updated><author><name>Martin Hafskjold Thoresen</name><email>m@mht.wtf</email></author><link href="https://mht.wtf/post/navigate/" rel=""/><link href="https://mht.wtf/post/navigate/index.html" rel="alternate"/><published>2025-07-27T19:52:00+02:00</published><content type="text/html">&lt;style&gt;
figure {
    display: flex;
    &gt; div {
        width: 100%;
        display: flex;
        flex-direction: column;
        align-items: center;
    }
    margin-bottom: 1rem;
}
&lt;/style&gt;
&lt;p&gt;Here&apos;s an interesting leetcode-style problem:
we are given two points $p_0$ and $p_1$ and a list of $n$ line segments $G=\{g_i\}$ which we call &lt;em&gt;gates&lt;/em&gt;.
We want to find the shortest path from $p_0$ to $p_1$ that crosses every gate $g_i$ in order.
Imagine a boat sailing from port to port through gates to avoid running aground in shallow waters.&lt;/p&gt;
&lt;h3&gt;A Simple Solution&lt;/h3&gt;
&lt;p&gt;A simple solution is to create a graph where the vertices are the two points and the endpoints of the lines (which we&apos;ll call $l_i$ and $r_i$).
Then we connect up vertices according to the following rules:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;$\{v_0, l_i\}\in E $ if the line $(v_0, l_i)$ intersects all gates $g_k$ for $k=1\dots i-1$ (and same with $r$).&lt;/li&gt;
&lt;li&gt;$\{l_i ,r_j\} \in E, i&amp;lt;j$ if the line $(l_i, r_j)$ intersects all gates $g_k$ for $k=i+1\dots j-1$ (for all four pairs of left/right).&lt;/li&gt;
&lt;li&gt;$\{l_i, v_1\}\in E$ — same rules as with $v_0$ but the other way around.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Now we have the full graph of valid moves, and can run Dijkstra&apos;s (or A*, or whatever you want) to find the shortest path.
You can also consider the two endpoints to be gates of zero width so that all three cases collapse to the middle case.&lt;/p&gt;
&lt;figure&gt;
  &lt;div&gt;
    &lt;img src=&quot;./ex.svg&quot;&gt;
    &lt;figcaption&gt;Sample input&lt;/figcaption&gt;
  &lt;/div&gt;
  &lt;div&gt;
    &lt;img src=&quot;./graph.svg&quot;&gt;
    &lt;figcaption&gt;Graph created from the rules above&lt;/figcaption&gt;
  &lt;/div&gt;
&lt;/figure&gt;
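&lt;p&gt;Here&apos;s a rough sketch of this simple solution in Python (not from the original; the strict crossing test ignores degenerate touch-at-an-endpoint cases, and the function names are made up for illustration):&lt;/p&gt;

```python
import heapq

def cross(o, a, b):
    # z-component of (a - o) x (b - o); its sign gives the orientation.
    return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])

def crosses(p, q, a, b):
    # True if segment p-q strictly crosses segment a-b.
    d1, d2 = cross(p, q, a), cross(p, q, b)
    d3, d4 = cross(a, b, p), cross(a, b, q)
    return d1*d2 < 0 and d3*d4 < 0

def shortest_path_length(p0, p1, gates):
    # Treat p0 and p1 as zero-width gates, so every edge rule becomes
    # the "middle" case of connecting two gate endpoints.
    levels = [[p0]] + [[l, r] for (l, r) in gates] + [[p1]]
    dist = {(0, 0): 0.0}
    heap = [(0.0, 0, 0)]  # (distance, gate index, endpoint index)
    while heap:
        d, i, j = heapq.heappop(heap)
        if d > dist[(i, j)]:
            continue  # stale heap entry
        if i == len(levels) - 1:
            return d
        u = levels[i][j]
        for k in range(i + 1, len(levels)):
            for m, v in enumerate(levels[k]):
                # The edge u-v is valid if it crosses every gate
                # strictly in between the two levels.
                if all(crosses(u, v, *gates[g-1]) for g in range(i+1, k)):
                    nd = d + ((u[0]-v[0])**2 + (u[1]-v[1])**2) ** 0.5
                    if nd < dist.get((k, m), float("inf")):
                        dist[(k, m)] = nd
                        heapq.heappush(heap, (nd, k, m))
    return float("inf")
```

&lt;p&gt;Note how it does all the pairwise intersection testing up front, which is exactly the cost the region-based method avoids.&lt;/p&gt;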
&lt;p&gt;This works, but it&apos;s a lot of work to compute the graph, and without some smart
shortcuts (e.g. checking that the trivial line $(v_0, v_1)$ is valid) you risk
doing a lot of intersection testing where none was required, for instance if
the gates are a bunch of horizontal short segments stacked upwards.
It&apos;s probably possible to prune the set of gates, or accelerate with a spatial
index, or other methods, but this introduces more complexity.&lt;/p&gt;
&lt;p&gt;Can we do better?&lt;/p&gt;
&lt;h3&gt;A Nice Solution&lt;/h3&gt;
&lt;p&gt;Here&apos;s an observation:
If we look at a shortest path through a gate, it is always one of two cases:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The path goes in a straight line through the gate, or&lt;/li&gt;
&lt;li&gt;The path goes to either endpoint and then turns away from the gate.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;We can use this to extend shortest paths going from $p_0$ to the gate $g_i$ to also go to gate $g_{i+1}$:
straight lines continue, and the rest of $g_{i+1}$ is covered by going from an endpoint of $g_i$ (whichever is the closer one).
If we segment each gate this way we can compute the segmentation of the next gate, and once all gates are handled we can backtrack to find the path.&lt;/p&gt;
&lt;p&gt;Let&apos;s see how this works.
The first step is easy because the shortest path from any point on the first gate back to the start is the straight line:
$g_1$ is covered by one solid region.&lt;/p&gt;
&lt;figure&gt;
  &lt;div&gt;
    &lt;img src=&quot;./fig1.svg&quot;&gt;
    &lt;figcaption&gt;Segmenting $g_1$ is trivial&lt;/figcaption&gt;
  &lt;/div&gt;
  &lt;div&gt;
    &lt;img src=&quot;./fig2.svg&quot;&gt;
    &lt;figcaption&gt;A new region is needed to cover $g_2$&lt;/figcaption&gt;
  &lt;/div&gt;
&lt;/figure&gt;
&lt;p&gt;When segmenting $g_2$ we get two regions.
The region that covered $g_1$ is extended where it overlaps with $g_2$:
all points on this part of $g_2$ have a trivial path back to the start (the straight line).
This doesn&apos;t cover the entire gate since there&apos;s more on the right side, so
we create a new region here that is rooted in the rightmost point of $g_1$, namely $r_1$.
The shortest path from any point on this part of $g_2$ back to the start
is first to $r_1$ (the root of the region), and then back to start.&lt;/p&gt;
&lt;p&gt;This was the observation:
going from $p_0$ to $g_2$ we can either go straight ahead (going through $g_1$ in the process) to $g_2$,
or we can first go to $r_1$, turn to the right, and then go straight ahead to the remainder of $g_2$.
This holds for all points in the region: the shortest path from $p_0$ to &lt;em&gt;any&lt;/em&gt; point $q$ in the region is
the shortest path to the root of that region, plus the straight line to $q$.&lt;/p&gt;
&lt;p&gt;We continue with the remaining gates, and it looks like this:&lt;/p&gt;
&lt;figure&gt;
  &lt;div&gt;
    &lt;img src=&quot;./fig3.svg&quot;&gt;
    &lt;figcaption&gt;$g_3$ gets a new region&lt;/figcaption&gt;
  &lt;/div&gt;
  &lt;div&gt;
    &lt;img src=&quot;./fig4.svg&quot;&gt;
    &lt;figcaption&gt;$g_4$ also gets a new region&lt;/figcaption&gt;
  &lt;/div&gt;
  &lt;div&gt;
    &lt;img src=&quot;./fig5.svg&quot;&gt;
    &lt;figcaption&gt;$g_5$ was already covered&lt;/figcaption&gt;
  &lt;/div&gt;
  &lt;div&gt;
    &lt;img src=&quot;./fig6.svg&quot;&gt;
    &lt;figcaption&gt;A new region for $p_1$&lt;/figcaption&gt;
  &lt;/div&gt;
&lt;/figure&gt;
&lt;p&gt;In the very last step we need to check which region $p_1$ is in.
If it&apos;s inside a region we&apos;re done, and otherwise we insert a new region on the appropriate endpoint of the gate.
This is also the same as pretending it is a gate of width zero, so no special logic is needed.&lt;/p&gt;
&lt;p&gt;Now that we have found the end we need to backtrack to find the path.
To do this we follow the roots of the regions backwards:
$p_1$ is contained in the tiny blue region rooted in $r_5$, so this is the first point.
$r_5$ is covered by the orange region rooted in $l_2$.
And $l_2$ is covered by the blue region rooted in $p_0$.
The final path is $[p_0, l_2, r_5, p_1]$.&lt;/p&gt;
&lt;p&gt;That&apos;s it!
This method is really cool, because we&apos;ve just computed the shortest path between two points &lt;strong&gt;without
computing any lengths&lt;/strong&gt;!
Think about this for a second: we have found a path that minimizes the distance between two points
without ever computing a single distance.&lt;/p&gt;
&lt;p&gt;What we &lt;em&gt;have&lt;/em&gt; done is use the &lt;a href=&quot;https://en.wikipedia.org/wiki/Triangle_inequality&quot;&gt;triangle inequality&lt;/a&gt; of metric spaces,
which says it&apos;s never farther to go in a straight line than to go through an intermediate point.
We used this every time we segmented a new gate, since we argued that if we can go in a straight line, we&apos;ll do so.&lt;/p&gt;
&lt;h2&gt;Details&lt;/h2&gt;
&lt;p&gt;This is a cool method, but what makes it even cooler is how little data you need to store to run it.
Consider this: if you look at $g_i$ and you know the ordered roots of each region, that describes
the segmentation of $g_i$:&lt;/p&gt;
&lt;figure&gt;
  &lt;div&gt;
    &lt;img src=&quot;./repr1.svg&quot; style=&quot;margin: 1rem&quot;&gt;
    &lt;figcaption&gt;$g_i$ with roots in order&lt;/figcaption&gt;
  &lt;/div&gt;
  &lt;div&gt;
    &lt;img src=&quot;./repr2.svg&quot; style=&quot;margin: 1rem&quot;&gt;
    &lt;figcaption&gt;Straight lines connect the dots&lt;/figcaption&gt;
  &lt;/div&gt;
  &lt;div&gt;
    &lt;img src=&quot;./repr3.svg&quot; style=&quot;margin: 1rem&quot;&gt;
    &lt;figcaption&gt;Regions are colored&lt;/figcaption&gt;
  &lt;/div&gt;
&lt;/figure&gt;
&lt;p&gt;We have three roots $1$, $2$, and $3$, and we draw a line from $1$ to $l_i$, lines in between
adjacent roots, and $3$ to $r_i$.
When we extend the regions from $g_{i-1}$ to $g_i$ we only need to find in which region (if any)
the two endpoints $l_i$ and $r_i$ are.
To do this we can find the first (last) line that has $l_i$ ($r_i$) on its left (right).
This is the only operation we need: is a point on the left or right side of a line?&lt;/p&gt;
&lt;figure&gt;
  &lt;div&gt;
    &lt;img src=&quot;./repr4.svg&quot; style=&quot;margin: 1rem&quot;&gt;
    &lt;figcaption&gt;We check which regions $g_i$ is in&lt;/figcaption&gt;
  &lt;/div&gt;
  &lt;div&gt;
    &lt;img src=&quot;./repr5.svg&quot; style=&quot;margin: 1rem&quot;&gt;
    &lt;figcaption&gt;Roots are updated&lt;/figcaption&gt;
  &lt;/div&gt;
  &lt;div&gt;
    &lt;img src=&quot;./repr6.svg&quot; style=&quot;margin: 1rem&quot;&gt;
    &lt;figcaption&gt;New regions visualized&lt;/figcaption&gt;
  &lt;/div&gt;
&lt;/figure&gt;
&lt;p&gt;Extending the segmentation to a new gate goes like this:
we look at the lines in order from left to right,
and find that $l_i$ is to the left of our first line.
This means that it is before our first region, so we create a new region for $l_i$ (green).
Then we look at $r_i$ to find the first line for which $r_i$ is on the right,
which turns out to be the third line.
This means the point is in the region in between the second and third line (blue),
and so we shrink this region to match $r_i$.
All regions in between the inserted green region and the clipped blue region (only the orange) are kept as-is.&lt;/p&gt;
&lt;p&gt;There&apos;s just one catch: since we&apos;re doing orientation queries we need to be mindful of the
orientation of the lines. In this example we have lines $(1, l_i), (2,1), (2,3), (3,r_i)$;
how can we know that we should use $(2,1)$ and not $(1,2)$?
We can of course store the pairs explicitly, even if this is twice the number of points that we really need.
But we don&apos;t need to.&lt;/p&gt;
&lt;p&gt;It turns out that the segmentation always has the same V-like shape:&lt;/p&gt;
&lt;figure&gt;
  &lt;div&gt;
    &lt;img src=&quot;./fan1.svg&quot; style=&quot;height: 200px&quot;&gt;
  &lt;/div&gt;
&lt;/figure&gt;
&lt;p&gt;It&apos;s not so hard to imagine why this is: new regions are always added on the two ends and are set to cover
the remaining part of the gate.
Other regions might be shrunk or discarded completely, so their &amp;quot;orientation&amp;quot; doesn&apos;t ever change.
We can store which root is the &amp;quot;bottom&amp;quot; root, and this gives us the ordering of all of the lines,
since they all point away from the root and towards the edges of the gate.&lt;/p&gt;
&lt;p&gt;To simplify backtracking when computing the final shortest path
we can store back links on the endpoints of the gates.
When we compute which region covers the endpoints of a new gate we can record the root of the region that covered it.
Then the final path is computed by following the back links and reversing the path.&lt;/p&gt;
&lt;p&gt;Lastly, checking whether a point is on the left or right side of a line is easy:
you can take the dot product of the rotated line direction and the line-to-point vector
and check its sign:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;fn is_on_the_left(line: Line, p: Point) -&amp;gt; bool {
    let to_p = p - line.root;
    let rot = [-line.dir.y, line.dir.x].into(); // 90deg CCW
    0 &amp;lt; rot.dot(to_p)
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Thanks for reading.&lt;/p&gt;
</content></entry><entry><title>Comments are gray and it&apos;s weird!</title><id>https://mht.wtf/post/comments/</id><updated>2024-09-30T22:25:00+02:00</updated><author><name>Martin Hafskjold Thoresen</name><email>m@mht.wtf</email></author><link href="https://mht.wtf/post/comments/" rel=""/><link href="https://mht.wtf/post/comments/index.html" rel="alternate"/><published>2024-09-30T22:25:00+02:00</published><content type="text/html">&lt;p&gt;Most code editors ship with their own color scheme. Basically all editors also allow you to change out the color scheme, and many people do. If we look at the most popular schemes, one commonality between almost all of them is that the color of code comments has low contrast with the background.&lt;/p&gt;
&lt;p&gt;According to the &lt;a href=&quot;https://webaim.org/resources/contrastchecker/&quot;&gt;WebAIM contrast checker&lt;/a&gt;,
for &amp;quot;normal text&amp;quot; you need a contrast of &lt;strong&gt;4.5 for AA&lt;/strong&gt; and &lt;strong&gt;7.0 for AAA&lt;/strong&gt;. Let&apos;s
see how common editors and their commonly used color schemes fare.&lt;/p&gt;
&lt;aside class=&quot;span-2&quot;&gt;
    Extracting exact color codes from all kinds of editors is a pain, and subpixel rendering makes it hard to pull out from screenshots,
    so some colors might be slightly off.
&lt;/aside&gt;
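&lt;p&gt;For reference, these ratios come from the WCAG definition of contrast: compute the relative luminance of each color and divide, lighter over darker. Here&apos;s a small sketch following the WCAG 2.x formulas:&lt;/p&gt;

```python
def linear(c8):
    # sRGB channel value (0-255) to linear light, per WCAG 2.x.
    c = c8 / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

def luminance(color):
    # Relative luminance of a "#rrggbb" color.
    h = color.lstrip("#")
    r, g, b = (int(h[i:i+2], 16) for i in (0, 2, 4))
    return 0.2126 * linear(r) + 0.7152 * linear(g) + 0.0722 * linear(b)

def contrast(a, b):
    # Contrast ratio (lighter + 0.05) / (darker + 0.05), between 1 and 21.
    hi, lo = sorted((luminance(a), luminance(b)), reverse=True)
    return (hi + 0.05) / (lo + 0.05)
```

&lt;p&gt;White on black gives the maximum ratio of 21; the ratios in the rest of this post are computed this way.&lt;/p&gt;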
&lt;h3&gt;Neovim&lt;/h3&gt;
&lt;p&gt;&lt;a href=&quot;https://neovim.io/&quot;&gt;Neovim&lt;/a&gt; 0.10 shipped a new &lt;a href=&quot;https://github.com/neovim/neovim/pull/26334&quot;&gt;default color scheme&lt;/a&gt;.
I&apos;ve pulled the colors from my own &lt;code&gt;neovim&lt;/code&gt; (run without a config) in &lt;code&gt;iterm2&lt;/code&gt;;  it looks like this:&lt;/p&gt;
&lt;pre style=&quot;background: #14161b; color: #e0e2ea&quot;&gt;&lt;code&gt;Foreground (13.99) &lt;span style=&quot;color: #9b9ea4&quot;&gt;// Background (6.74)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;a href=&quot;https://lazy.folke.io/configuration&quot;&gt;&lt;code&gt;lazy.nvim&lt;/code&gt;&lt;/a&gt; is a popular &amp;quot;get started&amp;quot; collection of plugins, and it comes with a color scheme that looks like this:&lt;/p&gt;
&lt;pre style=&quot;background: #222436; color: rgb(200, 211, 245);&quot;&gt;&lt;code&gt;Foreground (10.26) &lt;span style=&quot;color: rgb(99, 109, 166)&quot;&gt;// Background (3.11)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;One more: one of the top &lt;em&gt;trending colorschemes&lt;/em&gt; on &lt;a href=&quot;https://dotfyle.com/neovim/colorscheme/trending&quot;&gt;dotfyle&lt;/a&gt; is
&lt;a href=&quot;https://github.com/marko-cerovac/material.nvim&quot;&gt;material.nvim&lt;/a&gt;.
It comes in five variants:&lt;/p&gt;
&lt;p&gt;Oceanic:&lt;/p&gt;
&lt;pre style=&quot;background: #25363B; color: #B0BEC5;&quot;&gt;&lt;code&gt;Foreground (6.60) &lt;span style=&quot;color: #546E7A&quot;&gt;// Background (2.33)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Deep Ocean:&lt;/p&gt;
&lt;pre style=&quot;background: #0F111A; color: #A6ACCD;&quot;&gt;&lt;code&gt;Foreground (8.43) &lt;span style=&quot;color: #464B5D&quot;&gt;// Background (2.17)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Palenight:&lt;/p&gt;
&lt;pre style=&quot;background: #292D3E; color: #A6ACCD;&quot;&gt;&lt;code&gt;Foreground (6.11) &lt;span style=&quot;color: #676E95&quot;&gt;// Background (2.76)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Lighter:&lt;/p&gt;
&lt;pre style=&quot;background: #FAFAFA; color: #546E7A;&quot;&gt;&lt;code&gt;Foreground (5.17) &lt;span style=&quot;color: #AABFC9&quot;&gt;// Background (1.83)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Darker:&lt;/p&gt;
&lt;pre style=&quot;background: #212121; color: #B0BEC5;&quot;&gt;&lt;code&gt;Foreground (8.45) &lt;span style=&quot;color: #515151&quot;&gt;// Background (2.03)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;Visual Studio Code&lt;/h3&gt;
&lt;p&gt;&lt;a href=&quot;https://code.visualstudio.com/&quot;&gt;VS Code&lt;/a&gt; is an extremely popular editor with a bunch of color schemes.
Here&apos;s some of the built-in ones that have many variations, and two well known ones:&lt;/p&gt;
&lt;p&gt;2017 Dark (default):&lt;/p&gt;
&lt;pre style=&quot;background: #1e1e1e; color: #c8c8c8;&quot;&gt;&lt;code&gt;Foreground (9.96) &lt;span style=&quot;color: #669353&quot;&gt;// Background (4.65)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;(Community) Material Theme&lt;/p&gt;
&lt;aside class=&quot;span-2 left&quot;&gt;
    There&apos;s both the Material and the Community Material theme. Both have the same foreground, background, and comment colors.
&lt;/aside&gt;
&lt;pre style=&quot;background: #253238; color: #eff;&quot;&gt;&lt;code&gt;Foreground (12.80) &lt;span style=&quot;color: #546e7a&quot;&gt;// Background (2.44)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Monokai&lt;/p&gt;
&lt;pre style=&quot;background: #272822; color: #f8f8f3;&quot;&gt;&lt;code&gt;Foreground (13.95) &lt;span style=&quot;color: #88846f&quot;&gt;// Background (3.95)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Solarized Dark:&lt;/p&gt;
&lt;aside class=&quot;span-2&quot;&gt;
    VS Code colored identifiers in this blue, and so I&apos;ve put it as the foreground color.
    It also has a gray color that &lt;span style=&quot;background: #002b36; color: #93a1a1; font-family: Iosevka&quot;&gt;looks like this (5.61)&lt;/span&gt;.
&lt;/aside&gt;
&lt;pre style=&quot;background: #002b36; color: #258bd2;&quot;&gt;&lt;code&gt;Foreground (4.08) &lt;span style=&quot;color: #586e75&quot;&gt;// Background (2.79)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I&apos;ll throw in another scheme that&apos;s not built-in, but that I&apos;ve used, namely &lt;a href=&quot;https://www.nordtheme.com/&quot;&gt;Nord&lt;/a&gt;:&lt;/p&gt;
&lt;pre style=&quot;background: #2e3440; color: #d8dee9;&quot;&gt;&lt;code&gt;Foreground (9.25) &lt;span style=&quot;color: #606e88&quot;&gt;// Background (2.43)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;Cursor&lt;/h3&gt;
&lt;p&gt;&lt;a href=&quot;https://www.cursor.com/&quot;&gt;Cursor&lt;/a&gt; is the new cool LLM based editor. Its
front page consists mostly of light color schemes, but on
&lt;a href=&quot;https://www.cursor.com/features&quot;&gt;/features&lt;/a&gt; there are images of a dark scheme as
well.  It looks like this:&lt;/p&gt;
&lt;pre style=&quot;background: #181818; color: #d5d4d7;&quot;&gt;&lt;code&gt;Foreground (12.03) &lt;span style=&quot;color: #6a6a6a&quot;&gt;// Background (3.28)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;Zed&lt;/h3&gt;
&lt;p&gt;&lt;a href=&quot;https://zed.dev/&quot;&gt;Zed&lt;/a&gt; is another new editor with focus on collaboration
features.  Whether comments count as collaboration remains to be seen:&lt;/p&gt;
&lt;pre style=&quot;background: #282c33; color: #acb2be;&quot;&gt;&lt;code&gt;Foreground (6.58) &lt;span style=&quot;color: #5e636f&quot;&gt;// Background (2.32)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;Myself&lt;/h3&gt;
&lt;p&gt;Here&apos;s the color scheme I use on this site, with contrasts in dark/light respectively:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;Foreground (21.00 / 21.00) // Background (17.35 / 4.59)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This isn&apos;t 100% exactly what I have in my terminal; my background is the same
as the (dark mode) background of this site, and my white is slightly weaker:&lt;/p&gt;
&lt;pre style=&quot;background: #080e13; color: #f2f9ff;&quot;&gt;&lt;code&gt;Foreground (18.26) &lt;span style=&quot;color: #ffe8a6&quot;&gt;// Background (16.02)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;Summary&lt;/h3&gt;
&lt;aside class=&quot;span-2 left&quot;&gt;
    The only scheme without AA was Solarized Dark, but as mentioned, this might be a quirk of the highlighter in VS Code.
&lt;/aside&gt;
&lt;p&gt;Comment contrast ranges from 6.74 (Neovim default) down to 1.83 (&lt;code&gt;Material.nvim&lt;/code&gt; Lighter).
Only two schemes, the Neovim default and VS Code&apos;s 2017 Dark, are AA.
For comparison, the colors used for &amp;quot;normal&amp;quot; code basically all have AA contrast.&lt;/p&gt;
&lt;h2&gt;About Comments&lt;/h2&gt;
&lt;p&gt;Now that we know comments have bad contrast in most color schemes, let&apos;s think
about why we care.&lt;/p&gt;
&lt;p&gt;The main advantage of higher contrast is &lt;strong&gt;readability&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;But readability is subjective, right? Yes; however, we have contrast ratings so
that designers and developers have a fixed bar to pass for human readability of
text.  If you are coloring &amp;quot;normal text&amp;quot; and the contrast is larger than 4.5,
you have passed the bar, and the text will be readable by very many.  For the
sake of brevity, let&apos;s just call this &lt;em&gt;readable&lt;/em&gt;. Only &lt;strong&gt;three&lt;/strong&gt; of the schemes
above have an AA contrast for comments (and one of them is mine!).  The bar is
not passed, and so we cannot claim that these comments are readable.&lt;/p&gt;
&lt;p&gt;Comments are often prose, and as such, the closest thing to &amp;quot;normal text&amp;quot;
you can find in a source file. Everything else in there is different, so
the contrast requirement for most of the other tokens does not need to be as high.
Even a token you can barely make out gives you information about what that
token is, because its color (and contrast), as well as its position relative
to surrounding tokens, is recognizable: you don&apos;t need to check if a line ends
with a &lt;code&gt;{&lt;/code&gt; or a &lt;code&gt;}&lt;/code&gt; when the next line is indented farther than the current line; a
gray blur suffices, because you know it&apos;s a &lt;code&gt;{&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Comments are not like this!  Comments are meant to be &lt;strong&gt;read&lt;/strong&gt;, and your color
scheme should &lt;em&gt;help&lt;/em&gt; you do that.&lt;/p&gt;
&lt;p&gt;Having weak contrast for comments makes them stand out less, and will make them
harder to notice.  In turn, this will increase the chance of comments not being read,
becoming outdated, or never being written in the first place.  If you don&apos;t read
existing comments, how likely are you to write new ones?&lt;/p&gt;
&lt;p&gt;A comment is a great place for things in your program that you cannot express
in the code.  Omitting this information is neglecting to share it with your
collaborators, both present and future.&lt;/p&gt;
&lt;p&gt;Thanks for reading.&lt;/p&gt;
</content></entry><entry><title>Playback speed on Substack</title><id>https://mht.wtf/post/substack-video/</id><updated>2023-08-25T16:33:19+02:00</updated><author><name>Martin Hafskjold Thoresen</name><email>m@mht.wtf</email></author><link href="https://mht.wtf/post/substack-video/" rel=""/><link href="https://mht.wtf/post/substack-video/index.html" rel="alternate"/><published>2023-08-25T16:33:19+02:00</published><content type="text/html">&lt;p&gt;I&apos;m subscribed to &lt;a href=&quot;https://www.computerenhance.com/&quot;&gt;a Substack&lt;/a&gt; that I enjoy, but the Substack video player doesn&apos;t have
any options for adjusting the playback speed of the video. Often, I prefer 1.33 speed, or
even 1.5 depending on the content. This has been a little annoying, and the
alternative I could think of, downloading the video (somehow) and playing it in some player that does
support changing the speed, was a little too much work.&lt;/p&gt;
&lt;p&gt;However, HTML5 video elements do support changing the playback speed, and while
it&apos;s not easy to do as a user, it&apos;s very straightforward for a developer.
Here&apos;s how (on Firefox; I assume other browsers are similar):&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Select the &lt;code&gt;video&lt;/code&gt; element with the DOM inspector&lt;/li&gt;
&lt;li&gt;Right click on the element and select &lt;code&gt;&amp;quot;Use in Console&amp;quot;&lt;/code&gt;. This opens the
console with &lt;code&gt;temp0&lt;/code&gt; bound to the video player element.&lt;/li&gt;
&lt;li&gt;Execute &lt;code&gt;temp0.playbackRate = 1.33&lt;/code&gt; (or whatever speed you want)&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;That&apos;s it!&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/API/HTMLMediaElement/playbackRate&quot;&gt;Here&apos;s&lt;/a&gt;
the MDN docs on the &lt;code&gt;playbackRate&lt;/code&gt; property. It says that if you set
&lt;code&gt;playbackRate&lt;/code&gt; to a negative value, the video will play backwards (!). This,
however, doesn&apos;t seem to always be supported.&lt;/p&gt;
&lt;p&gt;Any remarks can be sent to &lt;a href=&quot;https://lists.sr.ht/~mht/public-inbox&quot;&gt;my public inbox&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Thanks for reading.&lt;/p&gt;
</content></entry><entry><title>Careless Limit</title><id>https://mht.wtf/post/careless-limit/</id><updated>2025-04-16T16:50:00+02:00</updated><author><name>Martin Hafskjold Thoresen</name><email>m@mht.wtf</email></author><link href="https://mht.wtf/post/careless-limit/" rel=""/><link href="https://mht.wtf/post/careless-limit/index.html" rel="alternate"/><published>2025-04-16T16:50:00+02:00</published><content type="text/html">&lt;p&gt;I took Easter off of work and found time to read two books.
These aren&apos;t exactly reviews, but summaries of some thoughts I had after having read them both.&lt;/p&gt;
&lt;h2&gt;Character Limit&lt;/h2&gt;
&lt;p&gt;&lt;a href=&quot;https://www.penguinrandomhouse.com/books/737290/character-limit-by-kate-conger-and-ryan-mac/&quot;&gt;&lt;em&gt;Character Limit&lt;/em&gt;&lt;/a&gt; by Kate Conger and Ryan Mac
tells the story of Twitter and Elon Musk&apos;s purchase and transformation of the service into X.
It reads like a documentary, which I really enjoyed.
By page 105, Elon has bought 9.2% of Twitter, and by page 260 the $46.5 billion is transferred.
Who owns the service in between those two points is blurry, and Musk&apos;s &lt;em&gt;&amp;quot;goons&amp;quot;&lt;/em&gt; are
running around calling shots that are not theirs to call.
Rebranding, budget cuts, and layoffs are on the agenda, and in the end everybody seems to have lost.
Maybe apart from Delaware Court of Chancery chancellor, and certified badass, &lt;a href=&quot;https://en.wikipedia.org/wiki/Kathaleen_McCormick&quot;&gt;Kathaleen McCormick&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Twitter employees, as well as the authors, would often refer to Twitter as the &amp;quot;town square&amp;quot;,
and how crucial it is to maintain Twitter as the center of public conversation.
I am unsure if this is a US-ism, a not-my-country-ism, or a not-my-social-circle-ism, but
it&apos;s certainly &lt;em&gt;something&lt;/em&gt;; Twitter/X has, in my life anyways, never been a &amp;quot;real&amp;quot; place where
&amp;quot;real&amp;quot; things happen.  Only memes, shitposting, trolling, and the likes.&lt;/p&gt;
&lt;p&gt;Nevertheless, the book is entertaining. Also, the authors are on the &lt;a href=&quot;https://oxide-and-friends.transistor.fm/&quot;&gt;latest episode of Oxide and Friends&lt;/a&gt;, which has not yet been released as I&apos;m writing this.&lt;/p&gt;
&lt;h2&gt;Careless People&lt;/h2&gt;
&lt;p&gt;&lt;a href=&quot;https://read.macmillan.com/fib/careless-people/&quot;&gt;&lt;em&gt;Careless People&lt;/em&gt;&lt;/a&gt; by Sarah Wynn-Williams is a memoir by the former Facebook global public policy director.
I am not sure what to think about the book.
Being a memoir, it reads very differently than &lt;em&gt;Character Limit&lt;/em&gt; and mixes humorous self-deprecating stories
with detailed accounts of acts by reckless Facebook execs.
I guess I was mainly interested in the latter.&lt;/p&gt;
&lt;p&gt;My main gripe is that Wynn-Williams seems oblivious to her own complicity.
In an early chapter she tells a story about playing Catan with Zucc and company on his private jet.
The others are letting Zucc win, and Wynn-Williams calls them out on it, quoting herself:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;You&apos;re letting him win, Dex and Derick. You&apos;re enabling it.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I assumed this to be foreshadowing of herself realizing that as a global public policy director,
whose job includes setting up meetings with policy makers and heads of state,
travelling with Zucc and other key people around the world,
and ensuring Facebook&apos;s influence over the &amp;quot;real&amp;quot; world,
&lt;em&gt;she&lt;/em&gt; is also enabling &amp;quot;it&amp;quot;.
This reflection was nowhere to be found in the book.&lt;/p&gt;
&lt;p&gt;While Wynn-Williams distances herself from her colleagues, it seems she was in familiar company.
This is best illustrated by two &amp;quot;quirky&amp;quot; stories:
(1) when giving birth to her first child she insisted on sending work emails in between her contractions
(to her boyfriend&apos;s protests), and jokes the situation away with how her
doctor told her to &amp;quot;press, don&apos;t press send&amp;quot;; and
(2) that she was tasked with going to South Korea to check if they would jail
Facebook execs (who had arrest orders at the time), and having her boyfriend
remind her that she had a 9-month old baby at home and so being jailed in a
foreign country was a bad idea, to which she agreed.&lt;/p&gt;
&lt;p&gt;Careless People, indeed.&lt;/p&gt;
</content></entry><entry><title>Swapping memory blocks in C</title><id>https://mht.wtf/post/block-swap/</id><updated>2016-02-10T14:01:17+01:00</updated><author><name>Martin Hafskjold Thoresen</name><email>m@mht.wtf</email></author><link href="https://mht.wtf/post/block-swap/" rel=""/><link href="https://mht.wtf/post/block-swap/index.html" rel="alternate"/><published>2016-02-10T14:01:17+01:00</published><content type="text/html">&lt;p&gt;Sometimes we have a memory block where we want to put the first &lt;code&gt;n&lt;/code&gt; bytes at the end of the block, rather than the beginning, without changing the blocks themselves, as in the figure below. &lt;code&gt;a&lt;/code&gt;, &lt;code&gt;b&lt;/code&gt;, and &lt;code&gt;e&lt;/code&gt; are pointers to the beginning of &lt;code&gt;A&lt;/code&gt;, the beginning of &lt;code&gt;B&lt;/code&gt;, and the end of &lt;code&gt;B&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;        a |=========|        |=========|
          | block A |  want  | block B |
        b |---------| =====&amp;gt; |         |
          |         |        |         |
          |         |        |         |
          | block B |        |---------|
          |         |        | block A |
        e |=========|        |=========|
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;By using extra space, more specifically $O(n)$ space, this is trivial.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;// external buffer to hold A
char *tmp = malloc(sizeof(A));
// copy A to the buffer
memmove(tmp, a, sizeof(A));
// move B up to the top
memmove(a, b, sizeof(B));
// insert A at the bottom
memmove(a + sizeof(B), tmp, sizeof(A));
free(tmp);
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Of course, &lt;code&gt;sizeof(A)&lt;/code&gt; will not work when &lt;code&gt;A&lt;/code&gt; is a pointer, but the meaning is still clear.
Additionally, we could use &lt;code&gt;memcpy&lt;/code&gt; instead of &lt;code&gt;memmove&lt;/code&gt; on the first and last call,
if we really cared about not copying the data too much around&lt;sup&gt;&lt;a href=&quot;#user-content-fn-memmove-impl&quot; id=&quot;user-content-fnref-memmove-impl&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;This is a fine solution; it works. But can we do better? Well, can we do it without the &lt;code&gt;malloc&lt;/code&gt; call?&lt;/p&gt;
&lt;h2&gt;A constant memory &lt;code&gt;block_swap&lt;/code&gt;&lt;/h2&gt;
&lt;p&gt;This algorithm is based on the simple idea that if we swap block &lt;code&gt;A&lt;/code&gt; to its final position,
we have swapped a block &lt;code&gt;B2&lt;/code&gt;, which is of the same size as &lt;code&gt;A&lt;/code&gt;, to the top.
We are then left with a similar but smaller problem, which is to swap &lt;code&gt;B2&lt;/code&gt; and &lt;code&gt;B1&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;  a |=========|            |=========|               |=========|
    | block A |  one pass  | block B2|  new problem  | block B2|
  b |---------| =========&amp;gt; |---------| ============&amp;gt; |---------|
    |         |            |         |               |         |
    | block B1|            | block B1|               | block B1|
  c |- - - - -|            |---------|               |=========|
    | block B2|            | block A |               :(block A):
  e |=========|            |=========|               ...........
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With this we can sketch out the general idea of our algorithm:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;void one_by_one_swap(char *a, char *b, size_t n) {
    for (size_t i = 0; i &amp;lt; n; i++) {
        char tmp = a[i];
        a[i] = b[i];
        b[i] = tmp;
    }
}

void block_swap(char *a, char *b, char *e) {
    size_t a_size = b - a;
    char *c = e - a_size;
    one_by_one_swap(a, c, a_size);
    block_swap(a, b, c);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;Specifics&lt;/h3&gt;
&lt;p&gt;One assumption we have made so far is that the &lt;code&gt;B&lt;/code&gt; block is larger than the &lt;code&gt;A&lt;/code&gt; block.
In order to fix this, we can check which of the blocks is the larger, and swap the logic around,
such that we always swap the smaller block &apos;into&apos; the larger.
This also allows for a little optimization: if the blocks are of equal size, we can simply make one call to &lt;code&gt;one_by_one_swap&lt;/code&gt;. In the second and final listing, we have even added some argument checking.
This code is runnable.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;void block_swap(char *a, char *b, char *e) {
    assert(a &amp;lt; b);
    assert(b &amp;lt; e);
    size_t a_size = b - a;
    size_t b_size = e - b;
    if (a_size &amp;lt; b_size) {
        // The case we assumed above
        char *c = e - a_size;
        one_by_one_swap(a, c, a_size);
        block_swap(a, b, c);
    } else if (b_size &amp;lt; a_size) {
        // The opposite case
        // Now `c` is between `a` and `b`
        char *c = a + b_size;
        one_by_one_swap(a, b, b_size);
        block_swap(c, b, e);
    } else {
        // The trivial case
        one_by_one_swap(a, b, a_size);
    }
}
&lt;/code&gt;&lt;/pre&gt;
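&lt;p&gt;For concreteness, here is a hypothetical driver for the listing above (the buffer contents and block sizes are made up for illustration):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;#include &amp;lt;assert.h&amp;gt;
#include &amp;lt;stdio.h&amp;gt;

/* one_by_one_swap and block_swap as defined above */

int main(void) {
    char buf[] = &amp;quot;AABBB&amp;quot;; // block A = &amp;quot;AA&amp;quot;, block B = &amp;quot;BBB&amp;quot;
    block_swap(buf, buf + 2, buf + 5);
    printf(&amp;quot;%s\n&amp;quot;, buf);  // prints &amp;quot;BBBAA&amp;quot;
    return 0;
}
&lt;/code&gt;&lt;/pre&gt;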
&lt;p&gt;We can still do one more thing, which is replacing the recursion with a loop.
However, the function is tail-recursive, so this is a trivial transformation for the compiler.
Additionally, the transformation isn&apos;t very interesting, so we will keep this recursive definition.&lt;/p&gt;
&lt;h3&gt;Efficiency&lt;/h3&gt;
&lt;p&gt;What is the running time of this? We can figure this out pretty intuitively, by the observation that
each pass swaps the elements of the smaller block into their &lt;em&gt;correct&lt;/em&gt; position, and those elements
are not touched again. Since every swap puts at least one element into its final place, the total number of swaps is linear in the number of bytes.
Hence, this is a linear algorithm, as one would expect from a memory moving algorithm.&lt;/p&gt;
&lt;p&gt;What about the space complexity?
Even though the function is recursive, it is tail recursive, as the last thing that happens in the code paths where recursion is used is the recursive call itself.
The compiler transforms this into a simple loop, such that we get constant space complexity (which, after all, was our main motivation for doing this).&lt;/p&gt;
&lt;p&gt;Lastly, a final optimization that could be used is in the &lt;code&gt;one_by_one_swap&lt;/code&gt;. Instead of swapping one byte at a time, we could swap, say, eight bytes at a time while there are more than eight bytes left to swap, and swap the remaining bytes one by one.&lt;/p&gt;
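&lt;p&gt;A sketch of that last optimization (assuming &lt;code&gt;string.h&lt;/code&gt; is included; a production version would also care about alignment, and a compiler may well vectorize the plain byte loop anyway):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;void word_swap(char *a, char *b, size_t n) {
    char tmp[8];
    // Swap eight bytes at a time while we can...
    while (n &amp;gt;= 8) {
        memcpy(tmp, a, 8);
        memcpy(a, b, 8);
        memcpy(b, tmp, 8);
        a += 8; b += 8; n -= 8;
    }
    // ...and the remaining bytes one by one.
    while (n-- &amp;gt; 0) {
        char t = *a; *a++ = *b; *b++ = t;
    }
}
&lt;/code&gt;&lt;/pre&gt;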
&lt;p&gt;We have shown a possible solution to the problem of swapping two adjacent memory blocks, without using an auxiliary buffer.&lt;/p&gt;
&lt;section data-footnotes=&quot;&quot; class=&quot;footnotes&quot;&gt;&lt;h2 id=&quot;footnote-label&quot; class=&quot;sr-only&quot;&gt;Footnotes&lt;/h2&gt;
&lt;ol&gt;
&lt;li id=&quot;user-content-fn-memmove-impl&quot;&gt;
&lt;p&gt;The GNU C library&lt;sup&gt;&lt;a href=&quot;#user-content-fn-glibc&quot; id=&quot;user-content-fnref-glibc&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; implementation of memmove calls &lt;code&gt;memcpy&lt;/code&gt; if the memory blocks are not overlapping, so this would be a &lt;em&gt;minor&lt;/em&gt; optimization. &lt;a href=&quot;#user-content-fnref-memmove-impl&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-glibc&quot;&gt;
&lt;p&gt;https://www.gnu.org/software/libc/download.html &lt;a href=&quot;#user-content-fnref-glibc&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content></entry><entry><title>A Sunday Morning Boot Problem</title><id>https://mht.wtf/post/efistub/</id><updated>2020-06-14T15:55:32+02:00</updated><author><name>Martin Hafskjold Thoresen</name><email>m@mht.wtf</email></author><link href="https://mht.wtf/post/efistub/" rel=""/><link href="https://mht.wtf/post/efistub/index.html" rel="alternate"/><published>2020-06-14T15:55:32+02:00</published><content type="text/html">&lt;p&gt;I woke up today to my computer not booting into my arch installation, but into &lt;code&gt;memtest86+&lt;/code&gt;.
A few months ago I also had boot problems, after I flashed a new BIOS&lt;sup&gt;&lt;a href=&quot;#user-content-fn-bios&quot; id=&quot;user-content-fnref-bios&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; that turned out to be a non-working beta version (thanks MSI!).
At the time I removed GRUB and decided to use EFISTUB instead, since I don&apos;t need anything fancy for my booting;
I only have one disk from which I boot.&lt;/p&gt;
&lt;p&gt;After having changed to EFISTUB I had some problems the first few times I upgraded my Linux version;
when a new version is installed you build two important files, &lt;code&gt;vmlinuz-linux&lt;/code&gt; and &lt;code&gt;initramfs-linux.img&lt;/code&gt;,
which, as far as I can tell, are the kernel itself and the initial data you want to be in RAM.
So, when you update linux you&apos;ll get a new &lt;code&gt;vmlinuz-linux&lt;/code&gt; with that new version:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;/h/mht$ file /boot/EFI/arch/vmlinuz-linux
/boot/EFI/arch/vmlinuz-linux: Linux kernel x86 boot executable bzImage, version 5.7.2-arch1-1 (linux@archlinux) #1 SMP PREEMPT Wed, 10 Jun 2020 20:36:24 +0000, RO-rootFS, swap_dev 0x7, Normal VGA
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The problem I got was that the newly generated files were put in &lt;code&gt;/boot&lt;/code&gt;, but my EFI partition,
which should either contain (or know the location of) the files above, was mounted at &lt;code&gt;/boot/efi&lt;/code&gt;.
So when I tried to boot, there was a mismatch between the Linux image that was loaded, which was the old version in
&lt;code&gt;/boot/efi&lt;/code&gt;, and the new version, which was installed to my system at &lt;code&gt;/&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The solution was to make a systemd service thingy that would run whenever &lt;code&gt;/boot/initramfs-linux-fallback.img&lt;/code&gt; changed
and copy the three files into &lt;code&gt;/boot/efi/EFI/arch&lt;/code&gt;. This worked, and all was well.&lt;/p&gt;
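&lt;p&gt;For the record, such a &amp;quot;service thingy&amp;quot; can be sketched as a path unit plus a oneshot service; the unit names and the file list below are my guesses for illustration, not the exact ones I used:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# /etc/systemd/system/efistub-copy.path
[Path]
PathChanged=/boot/initramfs-linux-fallback.img

[Install]
WantedBy=multi-user.target

# /etc/systemd/system/efistub-copy.service
[Service]
Type=oneshot
ExecStart=/usr/bin/cp /boot/vmlinuz-linux /boot/initramfs-linux.img /boot/initramfs-linux-fallback.img /boot/efi/EFI/arch/
&lt;/code&gt;&lt;/pre&gt;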
&lt;p&gt;That is, until this morning.&lt;/p&gt;
&lt;p&gt;I don&apos;t know why it stopped working, but it suddenly did, and my system refused to boot.
In the boot options menu from my motherboard I still had the correct boot entry,
but upon selection it would flash black and a message along the lines of&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;The file &apos;\EFI\arch\vmlinuz-linux&apos; could not be found.
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;would flash for about a frame.&lt;/p&gt;
&lt;p&gt;I flashed a USB drive with the June arch &lt;code&gt;.iso&lt;/code&gt; on it, and looked around in the UEFI shell,
which I got kind of familiar with from the last time I messed around with these things.
I was able to find the linux image on the EFI partition, and boot it with the right kernel parameters saying
which block device is to be mounted as root (which involves typing in a long &lt;code&gt;PARTUUID&lt;/code&gt;) and where the &lt;code&gt;initramfs&lt;/code&gt; file is.
Luckily it all worked, which kind of rules out hardware failure.&lt;/p&gt;
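&lt;p&gt;For reference, booting an EFISTUB kernel from the UEFI shell looks roughly like this (the &lt;code&gt;PARTUUID&lt;/code&gt; is left as a placeholder, and the paths assume my layout):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Shell&amp;gt; fs0:
FS0:\&amp;gt; EFI\arch\vmlinuz-linux root=PARTUUID=... rw initrd=\EFI\arch\initramfs-linux.img
&lt;/code&gt;&lt;/pre&gt;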
&lt;p&gt;I went back and forth a bit, trying to edit the boot entries with &lt;code&gt;bcfg&lt;/code&gt; in the UEFI shell or with
&lt;code&gt;efibootmgr&lt;/code&gt; after having &lt;code&gt;arch-chroot&lt;/code&gt;ed into my disk from the live flash drive.
Nothing seemed to work; in fact, nothing even seemed wrong about the boot entry that I had from before.&lt;/p&gt;
&lt;p&gt;I didn&apos;t really know what to try out next, so I tried to trim some of the paths to the files on the EFI partition from
being full to just the filename. This did not work.
Then, looking through the &lt;a href=&quot;https://wiki.archlinux.org/index.php/EFI_system_partition&quot;&gt;EFI system partition&lt;/a&gt; page on the Arch wiki I noticed the following:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;/efi&lt;/code&gt; is a replacement[6] for the previously popular (and possibly still used by other Linux distributions) ESP mountpoint /boot/efi.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Alright, maybe this is an issue for some reason.
I changed it and updated the systemd scripts, but before I got the chance to test it I read a little bit more on the wiki:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;mount ESP to &lt;code&gt;/boot&lt;/code&gt;. This is the preferred method when directly booting a EFISTUB kernel from UEFI.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Oh okay, I guess I&apos;ll just mount it to &lt;code&gt;/boot&lt;/code&gt; then. This even means I don&apos;t need the systemd scripts anymore
since this is the default place in which to put the files.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt; mkdir boot
&amp;gt; cp /boot/* boot
&amp;gt; du -h boot
60M
&amp;gt; rm -r /boot/*
&amp;gt; rm -r /efi
&amp;gt; mount /dev/nvme0n1p1 /boot
&amp;gt; mv boot/* /boot/
&amp;gt; kak /etc/fstab # update mount point
&amp;gt; rm -r boot
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Restart, and now it all works.
This time I also tried the two boot entries with relative and absolute paths, and they both worked.
Note that I didn&apos;t have to change the boot entries since they contain the file paths within the ESP partition, and I didn&apos;t change anything on the partition itself, only where in the root partition the ESP partition would be mounted.
This is what&apos;s strange about it all to me.&lt;/p&gt;
&lt;p&gt;The whole ordeal took about 3 hours, from starting the download of the arch ISO to the final working boot.&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;The day after writing this I stumbled upon a &lt;a href=&quot;https://www.reddit.com/r/archlinux/comments/h8dk9m/updated_linux_to_572arch11_and_now_my_booting_is/&quot;&gt;post&lt;/a&gt; on &lt;code&gt;/r/archlinux&lt;/code&gt;.
It turns out that the mount point of my EFI partition wasn&apos;t the problem after all; the problem was the path containing forward slashes instead of backslashes.
Apparently, the parsing code was rewritten, and it just so happens that it accidentally worked before.&lt;/p&gt;
&lt;p&gt;Looking back, I can&apos;t really make sense of this, since I thought I&apos;d tried to use the proper slashes after having read that the backslashes are the
way EFI wants it. This is especially strange since I tried once to not have the full path but only the filenames (perhaps I started with a &lt;code&gt;/&lt;/code&gt;?).
Oh well.&lt;/p&gt;
&lt;section data-footnotes=&quot;&quot; class=&quot;footnotes&quot;&gt;&lt;h2 id=&quot;footnote-label&quot; class=&quot;sr-only&quot;&gt;Footnotes&lt;/h2&gt;
&lt;ol&gt;
&lt;li id=&quot;user-content-fn-bios&quot;&gt;
&lt;p&gt;I&apos;m not sure if calling it BIOS is technically correct, as I think UEFI, which I now use, is a replacement for BIOS, and that BIOS really is only the booting part and not the terrible GUI in which you change motherboard settings. &lt;a href=&quot;#user-content-fnref-bios&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content></entry><entry><title>Confusing Words for a Beginner</title><id>https://mht.wtf/post/words/</id><updated>2020-06-18T17:10:19+02:00</updated><author><name>Martin Hafskjold Thoresen</name><email>m@mht.wtf</email></author><link href="https://mht.wtf/post/words/" rel=""/><link href="https://mht.wtf/post/words/index.html" rel="alternate"/><published>2020-06-18T17:10:19+02:00</published><content type="text/html">&lt;p&gt;As people are arguing over what to call the default &lt;code&gt;git&lt;/code&gt; branch,
I started thinking about other words in programming that I now take for granted
but that I also remember being very confusing and/or strange when starting out.
Instead of reading more internet arguments, I figured I&apos;d try to write about some of these words instead.&lt;/p&gt;
&lt;h2&gt;Int&lt;/h2&gt;
&lt;p&gt;Short for integer. Makes sense now, but didn&apos;t make much sense when I first encountered it;
in my native language they&apos;re just called whole numbers.
Luckily, the word isn&apos;t similar to any other word that I knew and could confuse it with.&lt;/p&gt;
&lt;h2&gt;Float and Double&lt;/h2&gt;
&lt;p&gt;This was very confusing when I first encountered it in CheatEngine back in the day.
Float means a decimal? Why does it float? At this point I was picturing a life buoy floating in wavy water.
And Double is just the same? But more exact? Oh it&apos;s bigger? Then shouldn&apos;t Float be called Single?&lt;/p&gt;
&lt;p&gt;(This was even more confusing when I tried to read up on it as in my native language a &amp;quot;decimal number&amp;quot;
refers to a real number that is not an integer, like &lt;code&gt;1.2&lt;/code&gt;, and not a number written in the decimal system,
nor the parts after the decimal separator &lt;code&gt;.2&lt;/code&gt;, which, according to Wikipedia are two interpretations of &amp;quot;decimal number&amp;quot;.)&lt;/p&gt;
&lt;h2&gt;String&lt;/h2&gt;
&lt;p&gt;This was a big one, because in my mind a string was what&apos;s on a guitar.
I might have heard &amp;quot;pearls on a string&amp;quot;, but probably hadn&apos;t heard about &amp;quot;a string of pearls&amp;quot;&lt;sup&gt;&lt;a href=&quot;#user-content-fn-sop&quot; id=&quot;user-content-fnref-sop&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;,
since only the latter really points to anywhere near what we call Strings in programming today.
Why do we even call it a String? Is it because we have characters neatly in a row? Aren&apos;t things in a row what Arrays are?
This is still rather confusing, and I&apos;ve simply come to accept that String means Text.&lt;/p&gt;
&lt;p&gt;Can new programming languages please stop calling Text &amp;quot;String&amp;quot;?&lt;/p&gt;
&lt;h2&gt;Print&lt;/h2&gt;
&lt;p&gt;As in printing something to the screen.
I think this was mainly confusing because in my native language we have borrowed the verb &amp;quot;to print&amp;quot;,
but it is exclusively used for a physical printer.
Hopefully you can appreciate my relief&lt;sup&gt;&lt;a href=&quot;#user-content-fn-p&quot; id=&quot;user-content-fnref-p&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; when after running my first Hello World,
which I think was in Java, Windows didn&apos;t complain that I didn&apos;t have a printer connected.&lt;/p&gt;
&lt;h2&gt;Argument&lt;/h2&gt;
&lt;p&gt;You know, as in a parameter&lt;sup&gt;&lt;a href=&quot;#user-content-fn-parameter&quot; id=&quot;user-content-fnref-parameter&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;!
Not as in reasoning, convincing, or fighting! Why would you think that?&lt;/p&gt;
&lt;h2&gt;Function&lt;/h2&gt;
&lt;p&gt;I&apos;m not really sure that I was ever confused about this, but I still think it&apos;s a bad name
for when you really mean a &lt;em&gt;procedure&lt;/em&gt;.
However, the word function is way easier to write since you don&apos;t have the tricky &lt;code&gt;c e d&lt;/code&gt; sequence.
It&apos;s even easier to pronounce.&lt;/p&gt;
&lt;h2&gt;Void&lt;/h2&gt;
&lt;p&gt;This was actually a good word, since it meant nothing to me.&lt;/p&gt;
&lt;p&gt;Thanks for reading.&lt;/p&gt;
&lt;section data-footnotes=&quot;&quot; class=&quot;footnotes&quot;&gt;&lt;h2 id=&quot;footnote-label&quot; class=&quot;sr-only&quot;&gt;Footnotes&lt;/h2&gt;
&lt;ol&gt;
&lt;li id=&quot;user-content-fn-sop&quot;&gt;
&lt;p&gt;Even now this sounds like it should mean a string made out of the material that pearls are made of. &lt;a href=&quot;#user-content-fnref-sop&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-p&quot;&gt;
&lt;p&gt;relief/disappointment/surprise &lt;a href=&quot;#user-content-fnref-p&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-parameter&quot;&gt;
&lt;p&gt;I don&apos;t think I&apos;ve ever encountered a situation in which having to differentiate between parameters and arguments is useful. I think a simpler way of thinking about the distinction is with bindings and data. &lt;a href=&quot;#user-content-fnref-parameter&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content></entry><entry><title>Actix and FOSS Responsibility</title><id>https://mht.wtf/post/actix/</id><updated>2020-01-18T16:32:36+01:00</updated><author><name>Martin Hafskjold Thoresen</name><email>m@mht.wtf</email></author><link href="https://mht.wtf/post/actix/" rel=""/><link href="https://mht.wtf/post/actix/index.html" rel="alternate"/><published>2020-01-18T16:32:36+01:00</published><content type="text/html">&lt;p&gt;The developer of the Rust web framework &lt;code&gt;actix&lt;/code&gt; is &amp;quot;done with open source&amp;quot;&lt;sup&gt;&lt;a href=&quot;#user-content-fn-tweet&quot; id=&quot;user-content-fnref-tweet&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;,
as you might have already seen on Reddit&lt;sup&gt;&lt;a href=&quot;#user-content-fn-reddit&quot; id=&quot;user-content-fnref-reddit&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;, HN&lt;sup&gt;&lt;a href=&quot;#user-content-fn-hn&quot; id=&quot;user-content-fnref-hn&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;, Lobsters&lt;sup&gt;&lt;a href=&quot;#user-content-fn-lobsters&quot; id=&quot;user-content-fnref-lobsters&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;, or somewhere else.
Steve Klabnik&lt;sup&gt;&lt;a href=&quot;#user-content-fn-klabnik&quot; id=&quot;user-content-fnref-klabnik&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;5&lt;/a&gt;&lt;/sup&gt; has said a few words, and Raph Levien&lt;sup&gt;&lt;a href=&quot;#user-content-fn-raphlinus&quot; id=&quot;user-content-fnref-raphlinus&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;6&lt;/a&gt;&lt;/sup&gt; has some suggestions for how to avoid something like this in the future,
but I don&apos;t feel like either of them (or any other write-up of this I&apos;ve seen) does a proper job of addressing what
I think is the real &amp;quot;issue&amp;quot; at play here, hence this post.&lt;/p&gt;
&lt;p&gt;Edit, 2021.06.22: some months later Drew DeVault wrote &lt;a href=&quot;https://drewdevault.com/2021/06/14/Provided-as-is-without-warranty.html&quot;&gt;a very relevant post&lt;/a&gt;
about exactly this; also see other posts on &lt;a href=&quot;https://drewdevault.com/&quot;&gt;his blog&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Here&apos;s a quick rundown of what happened.
Person makes web framework.
It gets some traction in the &amp;quot;community&amp;quot; and is often in benchmarks, where it does quite well.
However, it uses &lt;code&gt;unsafe&lt;/code&gt; liberally, and is not 100% secure from invoking UB.
This is a common complaint about the framework, and people have tried to get patches that address this merged in.
The maintainer doesn&apos;t like the patches.
People get angry, and insults are thrown left and right.
Maintainer takes their ball and goes home.&lt;/p&gt;
&lt;p&gt;Let&apos;s do a Q&amp;amp;A-esque thing, in no apparent order.&lt;/p&gt;
&lt;h4&gt;&amp;lt;Maintainer&amp;gt; was an asshole&lt;/h4&gt;
&lt;p&gt;This pops up every now and then, which is strange since I&apos;d assume that it was common knowledge that
others being assholes doesn&apos;t give you permission to be an asshole yourself.&lt;/p&gt;
&lt;h4&gt;But The Community&lt;sup&gt;&lt;a href=&quot;#user-content-fn-community&quot; id=&quot;user-content-fnref-community&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;7&lt;/a&gt;&lt;/sup&gt; didn&apos;t just complain, &amp;quot;we&amp;quot; even submitted a patch!&lt;/h4&gt;
&lt;p&gt;Yes, but a maintainer has no obligation to accept any patches coming their way.
They also have no obligation to explain why.&lt;/p&gt;
&lt;h4&gt;&lt;code&gt;Actix&lt;/code&gt; was high up on Some Benchmark, so the maintainers have a bigger responsibility&lt;/h4&gt;
&lt;p&gt;A programmer doesn&apos;t have to defend their code when it&apos;s compared against other code.
If you think, maybe rightfully so, that &lt;code&gt;Actix&lt;/code&gt;s usage of &lt;code&gt;unsafe&lt;/code&gt; made an unfair comparison in a benchmark, then
that&apos;s the fault of the authors of &lt;em&gt;the benchmark&lt;/em&gt;. &lt;em&gt;They&lt;/em&gt; are comparing apples to oranges, in that case.
Of course, if the benchmark is simply &amp;quot;Which web framework written in Rust is fastest?&amp;quot; then this shouldn&apos;t be contested at all, as
&lt;code&gt;Actix&lt;/code&gt; definitely is written in Rust.&lt;/p&gt;
&lt;h4&gt;&lt;code&gt;Actix&lt;/code&gt; was well known in the ecosystem, so the maintainers have a bigger responsibility&lt;/h4&gt;
&lt;p&gt;If The Community has chosen &lt;code&gt;Actix&lt;/code&gt; as a valid representation of how Rust can be used, then so be it.
It does not seem to have been a secret that the author was relaxed about safe-proofing the code;
when it has become such a high profile project, that seems to speak for itself when it comes to whether Rust
programmers really care about &amp;quot;safety over all&amp;quot;.&lt;/p&gt;
&lt;h4&gt;&lt;code&gt;Actix&lt;/code&gt; was professional-looking, so the maintainers have a bigger responsibility&lt;/h4&gt;
&lt;p&gt;This does not make much sense to me.
Having a polished landing page or &lt;code&gt;README.md&lt;/code&gt; does not somehow magically imply that the project itself is polished and ready for indefinite use by anyone.
The author has not implicitly made a promise about anything related to their code.
Almost any licence clearly says this.&lt;/p&gt;
&lt;h4&gt;But &amp;lt;Maintainer&amp;gt; deleted GitHub issues!&lt;/h4&gt;
&lt;p&gt;As is their right. None of us have any &lt;em&gt;right&lt;/em&gt; to post any kind of content to another person&apos;s bug tracker/wiki/&amp;quot;digital property&amp;quot;.
&lt;em&gt;They&lt;/em&gt; are in charge.
If you don&apos;t like it, don&apos;t use it.&lt;/p&gt;
&lt;h4&gt;&lt;code&gt;Actix&lt;/code&gt; was not &amp;quot;sound&amp;quot; and therefore bad&lt;/h4&gt;
&lt;p&gt;I&apos;m not sure I like this craze for soundness.
One of the big selling points of Rust for many people is the safety &lt;em&gt;combined&lt;/em&gt; with the low level control.
However, sometimes these things are simply at odds, and it&apos;s not clear that going with the safety &lt;em&gt;all&lt;/em&gt; the time is really the best way.&lt;/p&gt;
&lt;p&gt;Taken to the extreme, if some library was, for any practical purpose, useless, &lt;em&gt;unless&lt;/em&gt; one did some &lt;code&gt;unsafe&lt;/code&gt; trickery which
also caused some adversarial code to invoke UB, would this library remain useless and safe, or would we just say
&amp;quot;don&apos;t use it that way, then&amp;quot;?&lt;/p&gt;
&lt;p&gt;It seems to me that the consensus among Rust programmers is that any safe code should be safe under &lt;em&gt;any&lt;/em&gt; usage.
I fear this greatly limits the potential of the language and its ecosystem&lt;sup&gt;&lt;a href=&quot;#user-content-fn-st&quot; id=&quot;user-content-fnref-st&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;8&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;h4&gt;&lt;code&gt;Actix&lt;/code&gt; was not &amp;quot;sound&amp;quot; and it hurts the Rust ecosystem&lt;/h4&gt;
&lt;p&gt;If the ecosystem of a programming language is so fragile that a single project whose values aren&apos;t completely aligned with
a large part of The Community is enough to bring it all down, then we should just pack up our stuff and go home.
There would be no reason to continue doing any of this &lt;em&gt;if&lt;/em&gt; Rust&apos;s reputation depended on all users of the language
being equally evangelistic about it.&lt;/p&gt;
&lt;h4&gt;&amp;lt;Maintainer&amp;gt; should step down and let someone else take over the project&lt;/h4&gt;
&lt;p&gt;This seems to confuse &lt;code&gt;Actix&lt;/code&gt;, which as far as I understand is pretty much one person&apos;s library, with a community project.
&lt;code&gt;Actix&lt;/code&gt; does not belong to The Community, and the author has no obligation to The Community to give it to them
when &lt;em&gt;The Community&lt;/em&gt; feels like it.
That &amp;quot;Contributions are welcome&amp;quot; doesn&apos;t change anything at all; it is still not a community project.&lt;/p&gt;
&lt;p&gt;Taking this concept to any other situation reveals how crazy this idea is:
if I invite people over to my house for dinner and tell them that contributions are welcome, they have no right
to take over my kitchen if they disagree with the way I cook.&lt;/p&gt;
&lt;h4&gt;This behaviour should not be acceptable for any open source maintainer&lt;/h4&gt;
&lt;p&gt;This is simply ridiculous, because it implies that the moment someone publishes FOSS code they are automatically subject
to rules governing how they should deal with the technical decisions in their project.&lt;/p&gt;
&lt;p&gt;It also seems to imply that being a maintainer sets a higher bar for social behaviour than merely being a contributor or user.
There should be no differentiation here. Be civil, always.&lt;/p&gt;
&lt;h4&gt;If they don&apos;t care about safety, why are they even using Rust?&lt;/h4&gt;
&lt;p&gt;There are plenty of good reasons to use Rust that do not concern safety at all, as I&apos;m sure any Rust programmer would be able to tell you.
The author of &lt;code&gt;Actix&lt;/code&gt; chose to write Rust. That&apos;s okay, and they do not have to explain why.
This certainly does &lt;em&gt;not&lt;/em&gt; give anyone the right to tell them that they should not be writing Rust.&lt;/p&gt;
&lt;h4&gt;&lt;code&gt;Actix&lt;/code&gt; was labelled production ready when it had soundness bugs!&lt;/h4&gt;
&lt;p&gt;See the other comment about soundness.
I would guess that almost all of the system code running on the device you&apos;re reading this on contains
what you in Rust land would call soundness bugs. Yet, it is very much in production.&lt;/p&gt;
&lt;h4&gt;&amp;lt;Maintainer&amp;gt; screws over people depending on their code&lt;/h4&gt;
&lt;p&gt;If you have committed to using third party code without any backup plan, you have already screwed over yourself.
For very popular libraries this usually is not a problem since, in the case of a maintainer rage-quit, there are
a lot of other people with the same needs as you - a kind of safety in the herd.
Ideally, the only way most users would be affected by something like this is that they have
to wait a little longer for the next release, and update the repo url. That&apos;s it.&lt;/p&gt;
&lt;h4&gt;Why are you defending &amp;lt;Maintainer&amp;gt; when they were an asshole?&lt;/h4&gt;
&lt;p&gt;I don&apos;t care who was an asshole or not, and I&apos;m not here to put labels on people;
I just think that &lt;em&gt;most&lt;/em&gt; of the comments I&apos;ve read on this event have had a lot of things in the FOSS world backwards.&lt;/p&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;I don&apos;t think a FOSS author has any responsibility toward their users.
If you don&apos;t like how they&apos;re prioritizing patches, handling criticism, or tuning to win benchmarks, you don&apos;t have to use it.
Nobody is forcing you to use this code, and if someone is, you have (potentially) valid concerns you can raise.&lt;/p&gt;
&lt;p&gt;If you want to bring third-party code into your codebase, then that is &lt;em&gt;your&lt;/em&gt; responsibility as a developer.
If you don&apos;t trust the authors you should seriously reconsider using their code for anything more than a weekend project&lt;sup&gt;&lt;a href=&quot;#user-content-fn-web&quot; id=&quot;user-content-fnref-web&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;9&lt;/a&gt;&lt;/sup&gt;.
You don&apos;t get to complain if the other developers take down their code&lt;sup&gt;&lt;a href=&quot;#user-content-fn-leftpad&quot; id=&quot;user-content-fnref-leftpad&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;10&lt;/a&gt;&lt;/sup&gt;, leave bugs unfixed, refuse patches,
or delete tickets from their bug tracker.
In fact, you don&apos;t get to say anything at all except &amp;quot;Thank you!&amp;quot;.&lt;/p&gt;
&lt;p&gt;Don&apos;t be an asshole.&lt;/p&gt;
&lt;p&gt;Thanks for reading.
Thoughts and comments are welcome in my &lt;a href=&quot;https://lists.sr.ht/~mht/public-inbox&quot;&gt;public inbox&lt;/a&gt;.&lt;/p&gt;
&lt;section data-footnotes=&quot;&quot; class=&quot;footnotes&quot;&gt;&lt;h2 id=&quot;footnote-label&quot; class=&quot;sr-only&quot;&gt;Footnotes&lt;/h2&gt;
&lt;ol&gt;
&lt;li id=&quot;user-content-fn-tweet&quot;&gt;
&lt;p&gt;https://twitter.com/fafhrd91/status/1218135374339301378 &lt;a href=&quot;#user-content-fnref-tweet&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-reddit&quot;&gt;
&lt;p&gt;https://www.reddit.com/r/rust/comments/epzukc/actix_web_repository_cleared_by_author_who_says/ &lt;a href=&quot;#user-content-fnref-reddit&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-hn&quot;&gt;
&lt;p&gt;https://news.ycombinator.com/item?id=22073908 &lt;a href=&quot;#user-content-fnref-hn&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-lobsters&quot;&gt;
&lt;p&gt;https://lobste.rs/s/brcn0w/actix_web_author_i_am_done_with_open_source &lt;a href=&quot;#user-content-fnref-lobsters&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-klabnik&quot;&gt;
&lt;p&gt;https://words.steveklabnik.com/a-sad-day-for-rust &lt;a href=&quot;#user-content-fnref-klabnik&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-raphlinus&quot;&gt;
&lt;p&gt;https://raphlinus.github.io/rust/2020/01/18/soundness-pledge.html &lt;a href=&quot;#user-content-fnref-raphlinus&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-community&quot;&gt;
&lt;p&gt;I still find it strange to have people flock around a tool and call it a community. Each to their own, I guess. &lt;a href=&quot;#user-content-fnref-community&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-st&quot;&gt;
&lt;p&gt;I can&apos;t dig it up now, but I&apos;ve read a blog post about how some C++ concurrency primitive tries to detect whether any other threads have been spawned and takes a fast-path without any synchronization if not.
The author provided an example of this behaviour being faulty, but under adversarial conditions.
In my opinion, this is a perfectly reasonable approach for the library in question to take, since under any normal working condition it would behave as intended. &lt;a href=&quot;#user-content-fnref-st&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-web&quot;&gt;
&lt;p&gt;This is in my opinion one of the last big blockers for proper package management;
it&apos;s sad that the web-of-trust approach somehow didn&apos;t really work for PGP, since that would be the first approach to consider for package management as well. &lt;a href=&quot;#user-content-fnref-web&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-leftpad&quot;&gt;
&lt;p&gt;You know which Javascript library I&apos;m thinking about &lt;a href=&quot;#user-content-fnref-leftpad&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content></entry><entry><title>Switching Jobs</title><id>https://mht.wtf/post/job25/</id><updated>2025-11-02T20:30:35+02:00</updated><author><name>Martin Hafskjold Thoresen</name><email>m@mht.wtf</email></author><link href="https://mht.wtf/post/job25/" rel=""/><link href="https://mht.wtf/post/job25/index.html" rel="alternate"/><published>2025-11-02T20:30:35+02:00</published><content type="text/html">&lt;p&gt;Last Friday was my last day at &lt;a href=&quot;https://vind.ai&quot;&gt;Vind AI&lt;/a&gt;.
When I joined the company in 2022 it was five months old and we were 7 people,
everyone included.
It was my first &amp;quot;normal&amp;quot; full-time job.
Today Vind reports having over 1500 registered users and has an impressive logo carousel on their website.
It&apos;s been a lot of fun to help build a company from the very early days,
when it could feel like we were &amp;quot;just some people doing something&amp;quot;,
to it feeling like a &lt;em&gt;real company&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;Tomorrow I&apos;m starting my new job as a senior software engineer at &lt;a href=&quot;https://www.cognite.com&quot;&gt;Cognite&lt;/a&gt;!
I&apos;m excited for this next step in my career, and hope that the next three years will be
as fun as the last.&lt;/p&gt;
</content></entry><entry><title>Copilot</title><id>https://mht.wtf/post/copilot/</id><updated>2023-04-05T11:29:43+02:00</updated><author><name>Martin Hafskjold Thoresen</name><email>m@mht.wtf</email></author><link href="https://mht.wtf/post/copilot/" rel=""/><link href="https://mht.wtf/post/copilot/index.html" rel="alternate"/><published>2023-04-05T11:29:43+02:00</published><content type="text/html">&lt;p&gt;I have been playing around with LLMs for the past couple of days and decided to try out &lt;a href=&quot;https://github.com/features/copilot&quot;&gt;GitHub Copilot&lt;/a&gt;.
Overall, the experience so far has been similar to that of ChatGPT, in that sometimes it&apos;s helpful, and sometimes it&apos;s not.
It pretty much always requires some tweaking, so you have to know what you&apos;re after, and you have to be able to check that its suggestions are actually any good, because it&apos;ll often be wrong, and sometimes in subtle ways.&lt;/p&gt;
&lt;p&gt;Anyways, something funny happened.
I was trying to see if Copilot would also generate natural text unrelated to surrounding code when prompted to do so in a comment. I tried&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# Write a funny prompt mimicing youtube personalities asking for more subscribers:
# 
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;and Copilot suggested&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# Write a funny prompt mimicing youtube personalities asking for more subscribers:
# https://www.youtube.com/watch?v=9bZkp7q19f0
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Thanks for reading.&lt;/p&gt;
</content></entry><entry><title>A Checkmate Poster</title><id>https://mht.wtf/post/checkmateposter/</id><updated>2020-09-20T16:50:39+02:00</updated><author><name>Martin Hafskjold Thoresen</name><email>m@mht.wtf</email></author><link href="https://mht.wtf/post/checkmateposter/" rel=""/><link href="https://mht.wtf/post/checkmateposter/index.html" rel="alternate"/><published>2020-09-20T16:50:39+02:00</published><content type="text/html">&lt;p&gt;No affiliation with checkmateposters.com blabla&lt;/p&gt;
&lt;p&gt;Some weeks ago I saw checkmateposters.com on HN(?).
The concept is very simple: you generate a poster showing chess positions
throughout a game of your choosing. Colors are also, to some degree, customizable.
After considering getting one for a friend&apos;s birthday, I got one for myself instead:
their birthday was a little further in the future, I wasn&apos;t sure they&apos;d like it, and hey, what if the poster is not any good?&lt;/p&gt;
&lt;p&gt;And besides, I wasn&apos;t sure which game to get.&lt;/p&gt;
&lt;p&gt;At the same time I read Brian Kernighan&apos;s &amp;quot;Unix: A History and a Memoir&amp;quot;,
in which he mentions a chess game between the programs &lt;code&gt;Blitz 6.5&lt;/code&gt; and &lt;code&gt;Belle&lt;/code&gt;,
the latter co-authored by Ken Thompson.
In the book the game was only covered in annotated FEN,
so I had to play it out on a board to see how it looks (my mental FEN skills are
obviously not up to par).&lt;/p&gt;
&lt;p&gt;Some days ago it arrived and I figured I&apos;d post some pictures of it since
(a) it&apos;s a pretty new service so new potential customers will have a hard time evaluating the product they&apos;re buying, and
(b) I was very happy with the result, and don&apos;t mind posting about services that I enjoy.&lt;/p&gt;
&lt;h2&gt;The Shipping&lt;/h2&gt;
&lt;p&gt;I ordered the poster on the 3rd of September, and it arrived here on the 15th.
I got three emails in between:
one order confirmation which contained the shipping information (i.e. my address) and a low-res thumbnail of the poster;
one update email from Avery on the 8th, saying they&apos;ll forward the tracking information as soon as they get it;
one containing the link to the tracking info.&lt;/p&gt;
&lt;p&gt;I think it&apos;s unfortunate that I only got a low-res picture of the poster in the confirmation, since I couldn&apos;t send pictures
to friends saying &amp;quot;look what I just ordered&amp;quot;.
Furthermore, it would also have been nice if the colors (and even the FEN) were included in the mail, since I now
have no way of knowing exactly which colors I chose.
This isn&apos;t a problem for me right now, but you could imagine wanting to order either a copy or a new game in the same style
as before. Unless you chose one of the presets, it seems that you don&apos;t really have a way of buying the same again.&lt;/p&gt;
&lt;p&gt;Here&apos;s the box I got:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./box.jpg&quot; alt=&quot;The box&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Inside the box the poster was wrapped in soft-ish wrapping paper:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./wrapped.jpg&quot; alt=&quot;Poster wrapped in wrapping paper&quot; /&gt;&lt;/p&gt;
&lt;h2&gt;The Poster&lt;/h2&gt;
&lt;p&gt;Here&apos;s the poster itself, with a 0.5 liter bottle for scale.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./bottle.jpg&quot; alt=&quot;The paper next to a bottle for scale&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The paper is semi thick, and feels pretty good.
Here&apos;s a closeup picture of the poster; it looks a little blurry because (a) my camera isn&apos;t great, and (b) it &lt;em&gt;is&lt;/em&gt; a little blurry.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./closeup.jpg&quot; alt=&quot;A closeup picture of the poster&quot; /&gt;&lt;/p&gt;
&lt;p&gt;It&apos;s difficult to get an idea of the print quality by looking at a picture of the poster itself, since it&apos;s
hard to differentiate blur in the print and blur in the photo.
I&apos;ve tried to make this a little easier with a side-by-side comparison with a paper I had at hand.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./paper-comparison.jpg&quot; alt=&quot;A paper and the poster for sharpness comparison&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Looking at the picture, the difference isn&apos;t too big, although I think it&apos;s bigger in real life;
I attribute this to the phone camera.
Still, this is only really noticeable if you are pretty close to the poster.
At arm&apos;s length it is not noticeably blurry.&lt;/p&gt;
&lt;h2&gt;Framing The Poster&lt;/h2&gt;
&lt;p&gt;I wanted to frame the poster to get some contrast between my wall and the poster itself, since I chose a white background.
The dimensions of the poster are found on the webpage: 18&apos; by 24&apos; &lt;sup&gt;&lt;a href=&quot;#user-content-fn-inch&quot; id=&quot;user-content-fnref-inch&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
Living in a SI country, this is slightly unfortunate since the dimensions aren&apos;t as nice in meters.&lt;/p&gt;
&lt;p&gt;Furthermore, when framing you will probably end up covering parts of the poster with the &lt;em&gt;mat&lt;/em&gt;.
Unfortunately, it&apos;s not clear exactly how much space you have in between the border of the poster and where the chess boards
start.
In my case I had pretty little space, and the parts of the poster I didn&apos;t want to hide were slightly wider than
the mat from the frame I got. I therefore had to cut into it, which is somewhat visible in the pictures.&lt;/p&gt;
&lt;p&gt;It would be nice if this was somehow taken into account when the poster is generated, e.g. that you could get exact dimensions
for the different parts of the poster so that you would know beforehand exactly how big the frame and mat would have to be.
Still, this was fairly easy to get around.&lt;/p&gt;
&lt;p&gt;Here&apos;s the poster, framed and standing on a chair:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./frame-chair.jpg&quot; alt=&quot;Framed poster on a chair&quot; /&gt;&lt;/p&gt;
&lt;p&gt;You can tell that I didn&apos;t want to cut too much into the mat, as the borders of the boards are just slightly within the mat.
You can also see my terrible paper cutting skills, especially on the right side of the mat.&lt;/p&gt;
&lt;p&gt;Here&apos;s the poster, framed up on my wall.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./frame-wall.jpg&quot; alt=&quot;The framed poster hanging on a wall&quot; /&gt;
&lt;img src=&quot;./frame-wall2.jpg&quot; alt=&quot;The framed poster seen from further away&quot; /&gt;&lt;/p&gt;
&lt;h2&gt;End&lt;/h2&gt;
&lt;p&gt;In total, I&apos;m very happy with the poster I got, and I think the minor hiccups I had (or thought of) are
easily fixable if needed, and not really dealbreakers if left as is.&lt;/p&gt;
&lt;p&gt;I hope this either inspired you into getting some new wall decoration, or helped you in deciding whether to buy a poster or not.&lt;/p&gt;
&lt;p&gt;Thanks for reading.&lt;/p&gt;
&lt;section data-footnotes=&quot;&quot; class=&quot;footnotes&quot;&gt;&lt;h2 id=&quot;footnote-label&quot; class=&quot;sr-only&quot;&gt;Footnotes&lt;/h2&gt;
&lt;ol&gt;
&lt;li id=&quot;user-content-fn-inch&quot;&gt;
&lt;p&gt;well, the website says 18&apos; x 24&apos;, which as far as I can tell, not being American, would mean 18 by 24 &lt;em&gt;feet&lt;/em&gt;, and not inches.  I don&apos;t think it&apos;s really possible to misunderstand this though, considering there&apos;s a big picture of the poster right on the front page, as well as the fact that 24 feet is a lot. &lt;a href=&quot;#user-content-fnref-inch&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
&lt;/content&gt;&lt;/entry&gt;&lt;entry&gt;&lt;title&gt;WTFs in Floating Point Math&lt;/title&gt;&lt;id&gt;https://mht.wtf/post/floating-precision/&lt;/id&gt;&lt;updated&gt;2016-07-17T19:33:54+02:00&lt;/updated&gt;&lt;author&gt;&lt;name&gt;Martin Hafskjold Thoresen&lt;/name&gt;&lt;email&gt;m@mht.wtf&lt;/email&gt;&lt;/author&gt;&lt;link href=&quot;https://mht.wtf/post/floating-precision/&quot; rel=&quot;&quot;/&gt;&lt;link href=&quot;https://mht.wtf/post/floating-precision/index.html&quot; rel=&quot;alternate&quot;/&gt;&lt;published&gt;2016-07-17T19:33:54+02:00&lt;/published&gt;&lt;content type=&quot;text/html&quot;&gt;&amp;lt;p&amp;gt;We all know that floating point numbers have a limited precision.
With small numbers, say between zero and one, we can describe the fractional part of the number in great detail.
However, if the number is large, there may not be room for any fractional part.
So how large does a number need to be before &lt;code&gt;a == a + 1&lt;/code&gt;?&lt;/p&gt;
&lt;p&gt;We&apos;ll find our answer by simply checking:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;#include &amp;lt;stdio.h&amp;gt;

void simple(void) {
  float num = 1.0f;
  float inc = num + 1;

  while (num != inc) {
    num *= 10;
    inc = num + 1;
  }
  printf(&amp;quot;%f\n&amp;quot;, num);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;which outputs &lt;code&gt;100000000.000000&lt;/code&gt;.
If we multiply by &lt;code&gt;2&lt;/code&gt; instead of &lt;code&gt;10&lt;/code&gt;, we get a more intuitive number: &lt;code&gt;16777216 = 2^24&lt;/code&gt;.
If we look at the &lt;a href=&quot;https://en.wikipedia.org/wiki/Single-precision_floating-point_format#IEEE_754_single-precision_binary_floating-point_format:_binary32&quot;&gt;IEEE 754 single-precision floating-point format&lt;/a&gt;,
we see that the significand holds only &lt;code&gt;24&lt;/code&gt; bits (23 stored, plus an implicit leading bit);
the number &lt;code&gt;2^24 + 1&lt;/code&gt; needs &lt;code&gt;25&lt;/code&gt; bits, so adding one has no effect, because the last bit gets truncated.&lt;/p&gt;
&lt;p&gt;How high can we go if we are using &lt;code&gt;double&lt;/code&gt; instead of &lt;code&gt;float&lt;/code&gt;?
As IEEE 754 double-precision has a &lt;code&gt;53&lt;/code&gt;-bit significand, we expect to get to &lt;code&gt;2^53 = 9007199254740992&lt;/code&gt;, or
&lt;code&gt;10000000000000000&lt;/code&gt; if we multiply by &lt;code&gt;10&lt;/code&gt;.
And that is what we get.&lt;/p&gt;
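&lt;p&gt;We can confirm the &lt;code&gt;double&lt;/code&gt; limit directly, too (the function name here is just a sketch):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;void confirm_double(void) {
  double num = 9007199254740992.0; /* 2^53 */
  /* 2^53 + 1 is not representable; ties-to-even rounds it back down */
  printf(&amp;quot;%d\n&amp;quot;, num == num + 1); /* prints 1 */
}
&lt;/code&gt;&lt;/pre&gt;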
&lt;h1&gt;Adding more than one&lt;/h1&gt;
&lt;p&gt;How much can we add before getting a new number?
One would think the answer is simply: until the addition carries over into the first &lt;code&gt;24&lt;/code&gt; bits.
However, the answer is not that straightforward.
Instead, IEEE has defined &lt;a href=&quot;https://en.wikipedia.org/wiki/IEEE_floating_point#Rounding_rules&quot;&gt;Rounding rules&lt;/a&gt; for deciding what to do when the number we would like to
represent doesn&apos;t quite fit in &lt;code&gt;24&lt;/code&gt; bits.
The default rule is &lt;em&gt;«Round to nearest, ties to even»&lt;/em&gt;, meaning that if we are exactly between two representable numbers, we round to the one whose last kept bit is even (zero).&lt;/p&gt;
&lt;p&gt;Let&apos;s try this out with the totally random number &lt;code&gt;314159265&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;314159265 = 10010101110011011000010100001
            |------- 24 bits ------||---|
// The last 5 bits will be truncated
// .. so the 16 numbers from
314159264 = 10010101110011011000010100000
// to
314159279 = 10010101110011011000010101111
// are all the same.
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is fairly easy to confirm:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;void confirm_suspicions(void) {
  double first = 314159264;
  double number = first;
  while (((float) first) == ((float) number)) {
    number++;
  }
  printf(&amp;quot;%f != %f\n&amp;quot;, number, first);
  // 314159280 != 314159264
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If we however were to set the fifth-to-last bit, the rounding rule would round all the way up to &lt;code&gt;314159296&lt;/code&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;314159280 = 10010101110011011000010110000
            |------- 24 bits ------||---|
// Round up, so the least significant bit of the 24 bits becomes 0
314159296 = 10010101110011011000011000000
// This will be the same number all the way up to
314159312 = 10010101110011011000011010000
// .. where the rounding rule allows us to include ..10000, as the lsb is 0.
// This makes 33 different numbers, all represented as the same floating point number.
&lt;/code&gt;&lt;/pre&gt;
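&lt;p&gt;This rounding is just as easy to confirm (again, the function name is made up):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;void confirm_round_up(void) {
  float halfway = 314159280; /* exactly between 314159264 and 314159296 */
  printf(&amp;quot;%.0f\n&amp;quot;, halfway); /* 314159296: ties-to-even rounds up here */
}
&lt;/code&gt;&lt;/pre&gt;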
&lt;h1&gt;Conclusion&lt;/h1&gt;
&lt;p&gt;What can we take home from this?
First of all, floating point numbers don&apos;t behave like real numbers, and it&apos;s easy to forget this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;float a,b;
// ...
a = b;
a++;
a == b // true ???
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;or what about this?&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;float a;
// ...
for (; a &amp;lt; large_float; a++) {
  // ...
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;which is an infinite loop, if &lt;code&gt;a == (a + 1)&lt;/code&gt;.
Lastly, consider this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;void oops(void) {
  float a = 314159264;
  float c = a + 10 + 10;
  printf(&amp;quot;%d\n&amp;quot;, c == 314159284.f); // 0
  float d = a + (10 + 10);
  printf(&amp;quot;%d\n&amp;quot;, d == 314159284.f); // 1
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And you thought addition was associative? Not in the world of floating point numbers!&lt;/p&gt;
</content></entry><entry><title>Auto-optimizing an algorithm</title><id>https://mht.wtf/post/edit-distance/</id><updated>2026-03-09T22:41:51+01:00</updated><author><name>Martin Hafskjold Thoresen</name><email>m@mht.wtf</email></author><link href="https://mht.wtf/post/edit-distance/" rel=""/><link href="https://mht.wtf/post/edit-distance/index.html" rel="alternate"/><published>2026-03-09T22:41:51+01:00</published><content type="text/html">&lt;p&gt;After I found that &lt;a href=&quot;/ai26&quot;&gt;llms are useful&lt;/a&gt;, I wanted to get a better feel for what they are useful for.
Clearly, they are good at generating boilerplate and writing bash one-liners, but what else?&lt;/p&gt;
&lt;p&gt;I wanted to see how difficult it was to set up an agent loop
that has some objective that it can evaluate autonomously, let it run and see what happens.
Also, I wanted &lt;em&gt;some&lt;/em&gt; idea of what was happening in the process, but without human intervention.
Code optimization is an example of something with a lot of interesting properties:
it&apos;s easy to test for correctness, and has a (fairly) clear objective function&lt;sup&gt;&lt;a href=&quot;#user-content-fn-obj&quot; id=&quot;user-content-fnref-obj&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;So which code should we optimize?
I figured I&apos;d start from a blank slate and choose a problem that
could have several different good solutions, that is easy to manually test or verify (if needed),
and that I sort-of know from before.&lt;/p&gt;
&lt;p&gt;After some brainstorming with the machine I chose the &lt;a href=&quot;https://en.wikipedia.org/wiki/Levenshtein_distance&quot;&gt;Levenshtein edit distance&lt;/a&gt;.
Given two strings &lt;code&gt;a&lt;/code&gt; and &lt;code&gt;b&lt;/code&gt;, how many edits do you have to make to turn &lt;code&gt;a&lt;/code&gt; into &lt;code&gt;b&lt;/code&gt;?
One edit is a single insertion, a single removal, or replacing a single character,
and we want to find the minimal number of edits.
We don&apos;t care what the edits actually are.&lt;/p&gt;
&lt;h2&gt;The Reference Solution&lt;/h2&gt;
&lt;p&gt;I guess you could brute force this and go through a search tree trying all combinations,
but that would be very inefficient.
The standard &amp;quot;proper&amp;quot;&lt;sup&gt;&lt;a href=&quot;#user-content-fn-dp&quot; id=&quot;user-content-fnref-dp&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; solution uses dynamic programming:
we create a table &lt;code&gt;dist[i][j]&lt;/code&gt; that holds the edit distance between &lt;code&gt;a[..i]&lt;/code&gt; and &lt;code&gt;b[..j]&lt;/code&gt;,
and build this table bottom-up.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;dist[0][k] == dist[k][0] == k&lt;/code&gt; because to go from the empty string to any other string
you have to insert each character.
This is the base-case.&lt;/p&gt;
&lt;p&gt;For the other entries there are three cases:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;remove a character: cost &lt;code&gt;dist[i-1][j] + 1&lt;/code&gt;,&lt;/li&gt;
&lt;li&gt;add a character: cost &lt;code&gt;dist[i][j-1] + 1&lt;/code&gt;,&lt;/li&gt;
&lt;li&gt;replace: cost &lt;code&gt;dist[i-1][j-1]&lt;/code&gt;, plus &lt;code&gt;1&lt;/code&gt; if they are different and &lt;code&gt;0&lt;/code&gt; otherwise.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Confused yet? Here&apos;s a half-filled out table:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th align=&quot;center&quot;&gt;&lt;/th&gt;
&lt;th align=&quot;center&quot;&gt;.&lt;/th&gt;
&lt;th align=&quot;center&quot;&gt;a&lt;/th&gt;
&lt;th align=&quot;center&quot;&gt;g&lt;/th&gt;
&lt;th align=&quot;center&quot;&gt;e&lt;/th&gt;
&lt;th align=&quot;center&quot;&gt;n&lt;/th&gt;
&lt;th align=&quot;center&quot;&gt;t&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td align=&quot;center&quot;&gt;.&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;0&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;1&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;2&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;3&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;4&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;center&quot;&gt;p&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;1&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;1&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;2&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;3&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;4&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;center&quot;&gt;a&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;2&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;1&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;?&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;center&quot;&gt;g&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;3&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;center&quot;&gt;a&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;4&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;center&quot;&gt;n&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;5&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Finding &lt;code&gt;?&lt;/code&gt; in the table is computing &lt;code&gt;edit(&amp;quot;pa&amp;quot;, &amp;quot;ag&amp;quot;)&lt;/code&gt;.
We have some options:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Pay &lt;code&gt;1&lt;/code&gt; to remove the &lt;code&gt;&apos;a&apos;&lt;/code&gt; in &lt;code&gt;&amp;quot;pa&amp;quot;&lt;/code&gt; and then do the rest of the edits: &lt;code&gt;edit(&amp;quot;p&amp;quot;, &amp;quot;ag&amp;quot;) + 1&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Do the edit from &lt;code&gt;&amp;quot;pa&amp;quot;&lt;/code&gt; to just &lt;code&gt;&amp;quot;a&amp;quot;&lt;/code&gt; and then insert the &lt;code&gt;&apos;g&apos;&lt;/code&gt;: &lt;code&gt;edit(&amp;quot;pa&amp;quot;, &amp;quot;a&amp;quot;) + 1&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Replace the last character and edit the rest: &lt;code&gt;edit(&amp;quot;p&amp;quot;, &amp;quot;a&amp;quot;) + 1&lt;/code&gt;. We get the &lt;code&gt;+1&lt;/code&gt; since the characters don&apos;t match.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;We only need to consider all three and pick the cheapest:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;code&gt;edit(&amp;quot;p&amp;quot;, &amp;quot;ag&amp;quot;) + 1 = dist[1][2] + 1 == 3&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;edit(&amp;quot;pa&amp;quot;, &amp;quot;a&amp;quot;) + 1 = dist[2][1] + 1 == 2&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;edit(&amp;quot;p&amp;quot;, &amp;quot;a&amp;quot;)  + 1 = dist[1][1] + 1 == 2&lt;/code&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;In this case it&apos;s a tie:
for the full edits,
we can either go &lt;code&gt;&amp;quot;ag&amp;quot; -&amp;gt; &amp;quot;aa&amp;quot; -&amp;gt; &amp;quot;pa&amp;quot;&lt;/code&gt; where we replace both times,
or we can go &lt;code&gt;&amp;quot;ag&amp;quot; -&amp;gt; &amp;quot;a&amp;quot; -&amp;gt; &amp;quot;pa&amp;quot;&lt;/code&gt; where we first delete and then insert.&lt;/p&gt;
&lt;p&gt;I don&apos;t know, it&apos;s kinda confusing, and very tempting to frame as
peeling off characters on both strings until nothing is left, &lt;code&gt;dist[0][0]&lt;/code&gt;,
but then we have to show that this solves the same problem.
Computing the table, though, is now super easy. Look at the three neighbors
to the top left and add one to the smallest of those. If the letters match
at your position, you get to choose the diagonal one for free.&lt;/p&gt;
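&lt;p&gt;That fill rule can be sketched directly in C (hypothetical names, not code from this post):&lt;/p&gt;

```c
#include <string.h>

// Hypothetical sketch of the full-table fill: dist[i][j] is the edit
// distance between the first i characters of s and the first j of t.
int edit_distance_table(const char *s, const char *t) {
    int n = strlen(s), m = strlen(t);
    static int dist[64][64]; // assumes short inputs, for illustration only

    for (int i = 0; i <= n; i++) dist[i][0] = i; // delete all of s
    for (int j = 0; j <= m; j++) dist[0][j] = j; // insert all of t

    for (int i = 1; i <= n; i++) {
        for (int j = 1; j <= m; j++) {
            // the diagonal neighbor is free exactly when the letters match
            int cost = (s[i - 1] == t[j - 1]) ? 0 : 1;
            int d = dist[i - 1][j] + 1;                                       // delete
            if (dist[i][j - 1] + 1 < d) d = dist[i][j - 1] + 1;               // insert
            if (dist[i - 1][j - 1] + cost < d) d = dist[i - 1][j - 1] + cost; // replace
            dist[i][j] = d;
        }
    }
    return dist[n][m];
}
```

&lt;p&gt;For &lt;code&gt;&amp;quot;pagan&amp;quot;&lt;/code&gt; and &lt;code&gt;&amp;quot;agent&amp;quot;&lt;/code&gt; this returns the &lt;code&gt;3&lt;/code&gt; in the bottom-right corner.&lt;/p&gt;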
&lt;p&gt;Here&apos;s the filled out table with arrows showing a path we can take for the optimal cost of &lt;code&gt;3&lt;/code&gt;:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th align=&quot;center&quot;&gt;&lt;/th&gt;
&lt;th align=&quot;center&quot;&gt;.&lt;/th&gt;
&lt;th align=&quot;center&quot;&gt;a&lt;/th&gt;
&lt;th align=&quot;center&quot;&gt;g&lt;/th&gt;
&lt;th align=&quot;center&quot;&gt;e&lt;/th&gt;
&lt;th align=&quot;center&quot;&gt;n&lt;/th&gt;
&lt;th align=&quot;center&quot;&gt;t&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td align=&quot;center&quot;&gt;.&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;0&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;←1&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;←2&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;←3&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;←4&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;←5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;center&quot;&gt;p&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;↑1&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;↖1&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;←2&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;←3&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;←4&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;←5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;center&quot;&gt;a&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;↑2&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;↖1&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;←2&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;←3&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;←4&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;←5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;center&quot;&gt;g&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;↑3&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;↑2&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;↖1&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;←2&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;←3&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;←4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;center&quot;&gt;a&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;↑4&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;↑3&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;↑2&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;↖2&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;←3&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;←4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;center&quot;&gt;n&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;↑5&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;↑4&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;↑3&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;↑3&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;↖2&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;←3&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;code&gt;agent -&amp;gt; agen -&amp;gt; agan -&amp;gt; pagan&lt;/code&gt;&lt;/p&gt;
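&lt;p&gt;The arrows can be recovered by walking back from the bottom-right corner: take the free diagonal on a match, otherwise step to a neighbor that is exactly one cheaper. Every paid step accounts for one edit, so the walk performs exactly &lt;code&gt;dist[n][m]&lt;/code&gt; edits. A sketch with hypothetical names:&lt;/p&gt;

```c
#include <string.h>

// Hypothetical sketch: fill the table, then backtrack from (n, m)
// to (0, 0), counting the steps that cost an edit.
static int dist[64][64]; // assumes short inputs, for illustration only

int trace_cost(const char *s, const char *t) {
    int n = strlen(s), m = strlen(t);
    for (int i = 0; i <= n; i++) dist[i][0] = i;
    for (int j = 0; j <= m; j++) dist[0][j] = j;
    for (int i = 1; i <= n; i++)
        for (int j = 1; j <= m; j++) {
            int cost = (s[i - 1] == t[j - 1]) ? 0 : 1;
            int d = dist[i - 1][j] + 1;
            if (dist[i][j - 1] + 1 < d) d = dist[i][j - 1] + 1;
            if (dist[i - 1][j - 1] + cost < d) d = dist[i - 1][j - 1] + cost;
            dist[i][j] = d;
        }

    int i = n, j = m, paid = 0;
    while (i > 0 || j > 0) {
        if (i > 0 && j > 0 && s[i - 1] == t[j - 1] && dist[i][j] == dist[i - 1][j - 1]) {
            i--; j--;               // match: free diagonal
        } else if (i > 0 && j > 0 && dist[i][j] == dist[i - 1][j - 1] + 1) {
            paid++; i--; j--;       // replace
        } else if (i > 0 && dist[i][j] == dist[i - 1][j] + 1) {
            paid++; i--;            // delete
        } else {
            paid++; j--;            // insert
        }
    }
    return paid; // equals dist[n][m]
}
```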
&lt;h2&gt;Setup&lt;/h2&gt;
&lt;p&gt;Look, this isn&apos;t actual science, and you&apos;re not my boss, so I can do what I want.
Or rather, I don&apos;t have to do things I don&apos;t want to do.
And I didn&apos;t want to write tests or benchmarks, or even think too hard about
edge-cases or input distributions.
So instead of designing a test suite and a benchmark system myself, I first had agents generate those too.&lt;/p&gt;
&lt;p&gt;Also, the agent generated the reference implementation:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;// THIS IS PURE LLM GENERATED CODE
#include &amp;lt;stdlib.h&amp;gt; // for malloc/free

int ref_edit_distance(
    const unsigned char *s,
    int s_len, 
    const unsigned char *t,
    int t_len
) {
  if (s_len == 0) return t_len;
  if (t_len == 0) return s_len;

  int *prev = (int *)malloc((t_len + 1) * sizeof(int));
  int *curr = (int *)malloc((t_len + 1) * sizeof(int));

  for (int j = 0; j &amp;lt;= t_len; j++)
    prev[j] = j;

  for (int i = 1; i &amp;lt;= s_len; i++) {
    curr[0] = i;

    for (int j = 1; j &amp;lt;= t_len; j++) {
      int cost = (s[i - 1] == t[j - 1]) ? 0 : 1;

      int del = prev[j] + 1;
      int ins = curr[j - 1] + 1;
      int sub = prev[j - 1] + cost;

      int d = del;
      if (ins &amp;lt; d) d = ins;
      if (sub &amp;lt; d) d = sub;
      curr[j] = d;
    }

    int *tmp = prev;
    prev = curr;
    curr = tmp;
  }

  int r = prev[t_len];
  free(prev);
  free(curr);
  return r;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It has one funny trick, namely that instead of computing the whole table it
computes one row at a time and switches which of the two rows it looks at
and which it writes to.
In iteration &lt;code&gt;i&lt;/code&gt;, &lt;code&gt;prev&lt;/code&gt; is row &lt;code&gt;i-1&lt;/code&gt; and &lt;code&gt;curr&lt;/code&gt; is row &lt;code&gt;i&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Setting up the harness was a little tricky because I tried to over-engineer it at first,
with sandboxing and custom tools, trying to limit the exposure the agent had
to the benchmarks and previous versions.
In the end I used &lt;a href=&quot;https://pi.dev/&quot;&gt;&lt;code&gt;pi&lt;/code&gt;&lt;/a&gt; and wrote separate agents for separate stages
of the process: one planner, one implementor, one reviewer, and the &amp;quot;top&amp;quot; model to orchestrate.
A pretty standard setup, except that the implementor wasn&apos;t allowed to run the benchmarks.&lt;/p&gt;
&lt;p&gt;The setup was such that the planner wrote a hypothesis, the implementor created a submission that passed
the test suite,
the orchestrator ran the benchmarks that spat out a json file with wall-clock time and &lt;code&gt;perf&lt;/code&gt; numbers,
and the reviewer looked at those and wrote a conclusion.
Then it updated a global &lt;a href=&quot;FINDINGS.md&quot;&gt;&lt;code&gt;FINDINGS.md&lt;/code&gt;&lt;/a&gt; so that the planner in the next iteration
wouldn&apos;t have to read too much to make a new plan.&lt;/p&gt;
&lt;p&gt;I also made sure that the agents communicated through written plans so that I could store these, in case they
turned out to be interesting.
&lt;code&gt;pi&lt;/code&gt; can probably enforce this with an extension or something, but I didn&apos;t bother trying to figure out how.&lt;/p&gt;
&lt;p&gt;I also found some third party code that looked like it would be a good reference and
ran the benchmark on those.&lt;/p&gt;
&lt;h2&gt;Results&lt;/h2&gt;
&lt;p&gt;Short version: llms are capable of optimizing code when given a harness like this.
Here are some timings.
Benchmark names encode the input size, alphabet size, and mutation rate.
IDs are incremental and &lt;code&gt;#99x&lt;/code&gt; are external implementations.
It&apos;s sorted on the middle column.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th align=&quot;center&quot;&gt;ID&lt;/th&gt;
&lt;th align=&quot;right&quot;&gt;256α4m30&lt;/th&gt;
&lt;th align=&quot;right&quot;&gt;1Kα26m15&lt;/th&gt;
&lt;th align=&quot;right&quot;&gt;4Kα4m10&lt;/th&gt;
&lt;th align=&quot;right&quot;&gt;16K-asym&lt;/th&gt;
&lt;th align=&quot;right&quot;&gt;32Kα4m80&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td align=&quot;center&quot;&gt;#991&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;12.6 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;88.6 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;343.0 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;7.34 ms&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;72.18 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;center&quot;&gt;#992&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;25.7 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;123.3 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;607.7 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;224.47 ms&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;1.26 s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;center&quot;&gt;#020&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;3.4 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;54.9 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;687.1 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;5.78 ms&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;40.56 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;center&quot;&gt;#016&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;3.4 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;55.9 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;698.1 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;5.91 ms&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;43.25 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;center&quot;&gt;#019&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;3.4 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;54.3 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;701.4 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;5.77 ms&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;40.14 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;center&quot;&gt;#018&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;3.4 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;60.4 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;702.7 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;6.20 ms&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;44.69 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;center&quot;&gt;#017&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;3.4 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;55.3 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;727.0 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;5.90 ms&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;40.49 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;center&quot;&gt;#015&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;5.4 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;60.5 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;793.9 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;6.51 ms&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;46.40 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;center&quot;&gt;#012&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;4.7 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;57.8 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;852.3 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;6.77 ms&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;52.32 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;center&quot;&gt;#013&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;3.8 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;60.0 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;855.1 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;6.78 ms&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;52.53 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;center&quot;&gt;#014&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;3.8 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;59.7 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;883.0 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;6.90 ms&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;58.18 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;center&quot;&gt;#004&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;4.8 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;64.4 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;947.0 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;8.19 ms&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;58.72 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;center&quot;&gt;#010&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;4.4 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;65.1 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;973.2 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;7.62 ms&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;59.09 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;center&quot;&gt;#003&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;4.7 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;64.2 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;1.02 ms&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;7.96 ms&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;60.26 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;center&quot;&gt;#011&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;3.8 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;72.1 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;1.05 ms&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;8.29 ms&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;64.83 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;center&quot;&gt;#009&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;3.8 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;70.6 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;1.05 ms&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;8.57 ms&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;64.31 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;center&quot;&gt;#007&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;4.8 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;67.8 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;1.08 ms&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;8.19 ms&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;67.41 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;center&quot;&gt;#008&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;5.8 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;74.1 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;1.11 ms&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;8.47 ms&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;66.10 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;center&quot;&gt;#006&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;5.3 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;70.0 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;1.12 ms&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;8.56 ms&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;67.15 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;center&quot;&gt;#002&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;7.9 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;85.0 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;1.12 ms&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;11.04 ms&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;104.66 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;center&quot;&gt;#001&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;6.8 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;86.0 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;1.27 ms&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;10.10 ms&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;87.12 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;center&quot;&gt;#005&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;7.2 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;91.2 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;1.35 ms&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;10.79 ms&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;85.71 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;center&quot;&gt;#993&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;97.0 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;1.50 ms&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;23.65 ms&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;204.17 ms&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;1.57 s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Here&apos;s a table of three good versions, one of which is external.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th align=&quot;left&quot;&gt;Case&lt;/th&gt;
&lt;th align=&quot;right&quot;&gt;#016&lt;/th&gt;
&lt;th align=&quot;right&quot;&gt;#020&lt;/th&gt;
&lt;th align=&quot;right&quot;&gt;#991&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;256-sym-alpha4-mut10&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;strong&gt;3.4 µs&lt;/strong&gt;&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;3.5 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;7.4 µs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;256-sym-alpha4-mut30&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;3.4 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;strong&gt;3.4 µs&lt;/strong&gt;&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;12.6 µs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;256-sym-alpha26-mut15&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;3.4 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;strong&gt;3.3 µs&lt;/strong&gt;&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;15.0 µs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;256-sym-alpha256-mut15&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;strong&gt;3.3 µs&lt;/strong&gt;&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;3.4 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;59.3 µs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;256-asym-128x256&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;1.9 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;strong&gt;1.9 µs&lt;/strong&gt;&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;9.0 µs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;256-sym-alpha4-mut80&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;3.4 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;strong&gt;3.4 µs&lt;/strong&gt;&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;16.6 µs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;1k-sym-alpha4-mut10&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;strong&gt;48.3 µs&lt;/strong&gt;&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;48.9 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;50.3 µs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;1k-sym-alpha4-mut30&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;50.2 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;strong&gt;48.7 µs&lt;/strong&gt;&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;94.2 µs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;1k-sym-alpha26-mut15&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;55.9 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;strong&gt;54.9 µs&lt;/strong&gt;&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;88.6 µs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;1k-sym-alpha256-mut15&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;39.8 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;strong&gt;37.8 µs&lt;/strong&gt;&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;351.1 µs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;1k-asym-512x1024&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;30.1 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;strong&gt;30.1 µs&lt;/strong&gt;&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;55.8 µs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;1k-sym-alpha4-mut80&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;49.7 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;strong&gt;49.7 µs&lt;/strong&gt;&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;127.6 µs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;4k-sym-alpha4-mut10&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;698.1 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;687.1 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;strong&gt;343.0 µs&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;4k-sym-alpha4-mut30&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;696.4 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;strong&gt;688.2 µs&lt;/strong&gt;&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;833.9 µs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;4k-sym-alpha26-mut15&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;783.6 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;763.0 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;strong&gt;549.1 µs&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;4k-sym-alpha256-mut15&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;462.5 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;strong&gt;432.8 µs&lt;/strong&gt;&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;1.56 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;4k-asym-2048x4096&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;409.2 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;strong&gt;381.4 µs&lt;/strong&gt;&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;546.3 µs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;4k-sym-alpha4-mut80&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;759.4 µs&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;strong&gt;694.8 µs&lt;/strong&gt;&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;1.39 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;8k-sym-alpha4-mut10&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;2.99 ms&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;2.71 ms&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;strong&gt;918.1 µs&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;8k-sym-alpha4-mut30&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;2.96 ms&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;strong&gt;2.73 ms&lt;/strong&gt;&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;2.77 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;8k-sym-alpha26-mut15&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;3.10 ms&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;3.02 ms&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;strong&gt;1.62 ms&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;8k-sym-alpha256-mut15&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;1.72 ms&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;strong&gt;1.66 ms&lt;/strong&gt;&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;3.47 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;8k-asym-4096x8192&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;strong&gt;1.46 ms&lt;/strong&gt;&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;1.49 ms&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;2.03 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;8k-sym-alpha4-mut80&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;strong&gt;2.71 ms&lt;/strong&gt;&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;2.72 ms&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;5.15 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;16k-sym-alpha4-mut10&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;10.54 ms&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;10.83 ms&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;strong&gt;2.67 ms&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;16k-sym-alpha4-mut30&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;10.61 ms&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;10.70 ms&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;strong&gt;10.01 ms&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;16k-sym-alpha26-mut15&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;11.99 ms&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;11.97 ms&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;strong&gt;5.28 ms&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;16k-sym-alpha256-mut15&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;strong&gt;6.45 ms&lt;/strong&gt;&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;6.52 ms&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;9.13 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;16k-asym-8192x16384&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;5.91 ms&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;strong&gt;5.78 ms&lt;/strong&gt;&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;7.34 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;16k-sym-alpha4-mut80&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;10.51 ms&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;strong&gt;10.48 ms&lt;/strong&gt;&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;18.68 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;32k-sym-alpha4-mut10&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;41.42 ms&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;42.74 ms&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;strong&gt;9.61 ms&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;32k-sym-alpha4-mut30&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;42.15 ms&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;43.96 ms&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;strong&gt;37.91 ms&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;32k-sym-alpha26-mut15&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;49.18 ms&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;46.96 ms&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;strong&gt;17.65 ms&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;32k-sym-alpha256-mut15&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;strong&gt;24.20 ms&lt;/strong&gt;&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;24.47 ms&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;26.09 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;32k-asym-16384x32768&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;strong&gt;22.82 ms&lt;/strong&gt;&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;23.11 ms&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;27.75 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;32k-sym-alpha4-mut80&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;43.25 ms&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;strong&gt;40.56 ms&lt;/strong&gt;&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;72.18 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;left&quot;&gt;&lt;strong&gt;Total Wins&lt;/strong&gt;&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;strong&gt;8&lt;/strong&gt;&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;strong&gt;18&lt;/strong&gt;&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;strong&gt;10&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Some of these differences are close to the benchmark noise, but the generated code
is undeniably faster on certain patterns of inputs compared to the external &lt;code&gt;#991&lt;/code&gt;.
The flipside is also true.&lt;/p&gt;
&lt;p&gt;The code is pretty bad.
It&apos;s very typical &amp;quot;highly-optimized-and-butt-ugly&amp;quot; code.
There are no abstractions pulled out, apart from some functions that are
optimized for smaller string lengths and marked to go in certain sections of the &lt;code&gt;elf&lt;/code&gt;.
In other words, no care was taken in constructing a program that makes sense.
But the llm has no problem explaining what&apos;s going on, and with some external help
it&apos;s very doable to sit down and work through it, just as you would have to do
if you wanted to understand any other code that you didn&apos;t write.&lt;/p&gt;
&lt;p&gt;As it was running, I was looking at the output every now and then and
saw the agent run &lt;code&gt;objdump&lt;/code&gt; on the target executable to confirm that certain regions
got auto-vectorized properly.
With the right tools, I think it could have done way better;
what if it could easily see how instructions are scheduled on the execution ports to detect backend-pressure?
These tools probably do exist, but neither I nor the agent seemed to know them.&lt;/p&gt;
&lt;p&gt;What&apos;s useful for humans is also often useful for agents.
I hope this wave of llms will both lower the bar and raise the leverage to create good tools that
humans and agents can use in order to understand the systems we create.&lt;/p&gt;
&lt;p&gt;Thanks for reading.&lt;/p&gt;
&lt;hr /&gt;
&lt;section data-footnotes=&quot;&quot; class=&quot;footnotes&quot;&gt;&lt;h2 id=&quot;footnote-label&quot; class=&quot;sr-only&quot;&gt;Footnotes&lt;/h2&gt;
&lt;ol&gt;
&lt;li id=&quot;user-content-fn-obj&quot;&gt;
&lt;p&gt;Creating a benchmark isn&apos;t all that easy, because the shape of the inputs can have a huge impact on the performance of the system. As is the case in this benchmark! &lt;a href=&quot;#user-content-fnref-obj&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-dp&quot;&gt;
&lt;p&gt;The properness of this solution depends on how comfortable you are with DP, I guess. &lt;a href=&quot;#user-content-fnref-dp&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content></entry><entry><title>Negative Comments</title><id>https://mht.wtf/post/negative-comments/</id><updated>2020-06-12T18:47:04+02:00</updated><author><name>Martin Hafskjold Thoresen</name><email>m@mht.wtf</email></author><link href="https://mht.wtf/post/negative-comments/" rel=""/><link href="https://mht.wtf/post/negative-comments/index.html" rel="alternate"/><published>2020-06-12T18:47:04+02:00</published><content type="text/html">&lt;p&gt;This post is a stream of consciousness from reading comments at &lt;a href=&quot;https://news.ycombinator.com/item?id=23497236&quot;&gt;this&lt;/a&gt; post.
More specifically, this comment got me thinking&lt;sup&gt;&lt;a href=&quot;#user-content-fn-a&quot; id=&quot;user-content-fnref-a&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;You know no-one is forcing you to play CS in your browser, right? Why is it so offensive to you that this exists and someone else is finding joy in playing it? Why does HN love to rag on web technologies so much?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The assumption here is that in some places, like HN, it&apos;s not unusual for projects to be, for the lack of a better word, shat on.
I too have experienced this when I started, and unfortunately didn&apos;t finish, a series on writing a JPEG encoder/decoder in Rust.
People were complaining among other things that rewriting something in A New And Different Language didn&apos;t solve a real problem.
They were completely right, of course, but they also missed the point completely, as I didn&apos;t really need a new encoder/decoder of JPEG files.
It was strictly educational, as I wanted to learn more about JPEG, as well as experience what it was like to implement a spec.&lt;/p&gt;
&lt;p&gt;Back then I felt the criticism was stupid.
Hadn&apos;t they read the post at all?
How could they really think I did this for some technical gain?
My own feelings were at the time backed up by the general consensus on various sites.&lt;/p&gt;
&lt;p&gt;Still, looking back, I think their frustration was warranted, and I think the same frustration is surfacing on the CS 1.6 thread from HN linked above.&lt;/p&gt;
&lt;p&gt;The problem isn&apos;t that some people took the time to do a thing that they&apos;re proud of, and posted it to some news aggregation site.&lt;/p&gt;
&lt;p&gt;The problem is that these posts are celebrated by the community when their only contribution is &amp;quot;oh, that&apos;s neat&amp;quot;.&lt;/p&gt;
&lt;p&gt;Our attention is limited, and with mankind&apos;s full knowledge literally at our fingertips, it is one of the most important resources we have.
Therefore, we &lt;em&gt;must&lt;/em&gt; be frugal with it.
When people are getting upset that some silly hobby project is invading their news feed, it&apos;s not because they were forced to look at it, or because they thought the project was stupid and not worth doing.
I think it&apos;s mainly because it shows that their peers are more interested in neat hacks as opposed to more meaningful content.&lt;/p&gt;
&lt;p&gt;You could argue that &lt;em&gt;hacker&lt;/em&gt; news is the natural place for neat hacks, and I do agree, and this problem is of course not isolated to HN.
However, I think the frustration comes from a deeper level where we are concerned that we as a community, are getting lost in cheap flashy tricks instead of sound solid concepts and ideas.&lt;/p&gt;
&lt;p&gt;You only have about 16 hours of attention every day&lt;sup&gt;&lt;a href=&quot;#user-content-fn-b&quot; id=&quot;user-content-fnref-b&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;, and where you spend this time is paramount.&lt;/p&gt;
&lt;p&gt;Thanks for reading.&lt;/p&gt;
&lt;section data-footnotes=&quot;&quot; class=&quot;footnotes&quot;&gt;&lt;h2 id=&quot;footnote-label&quot; class=&quot;sr-only&quot;&gt;Footnotes&lt;/h2&gt;
&lt;ol&gt;
&lt;li id=&quot;user-content-fn-a&quot;&gt;
&lt;p&gt;In a sense, this is really the perfect post for these thoughts I&apos;ve had. For people who don&apos;t know, Counter Strike is a highly competitive first person shooter game focused on fast paced combat and shooting accuracy. The professional players, because there are professional players, play with monitors whose refresh rates are either &lt;a href=&quot;https://csgopedia.com/csgo-pro-setups/&quot;&gt;144Hz or 240Hz&lt;/a&gt;, which amounts to 6.94ms and 4.17ms per frame, respectively. Putting this game in a web browser, a program known for being bloated and sloppy (although they have their reasons!), is not done because it is viable in any sense, but strictly because &amp;quot;here&apos;s a game people know, and look! we can play it in the browser!&amp;quot; &lt;a href=&quot;#user-content-fnref-a&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-b&quot;&gt;
&lt;p&gt;Assuming you are focusing on &lt;em&gt;something&lt;/em&gt; every single hour of your day, of course ;) &lt;a href=&quot;#user-content-fnref-b&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content></entry><entry><title>Other quotes from Structured Programming with go to Statements</title><id>https://mht.wtf/post/structured-programming-quotes/</id><updated>2019-02-25T11:31:54+01:00</updated><author><name>Martin Hafskjold Thoresen</name><email>m@mht.wtf</email></author><link href="https://mht.wtf/post/structured-programming-quotes/" rel=""/><link href="https://mht.wtf/post/structured-programming-quotes/index.html" rel="alternate"/><published>2019-02-25T11:31:54+01:00</published><content type="text/html">&lt;p&gt;Most of us have heard the (unfortunately) famous quote from Donald E. Knuth&apos;s
1974 paper &amp;quot;Structured Programming with &lt;code&gt;go to&lt;/code&gt; Statements&amp;quot;. Yes, it&apos;s probably
the one you&apos;re thinking about&lt;sup&gt;&lt;a href=&quot;#user-content-fn-quote&quot; id=&quot;user-content-fnref-quote&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;. It&apos;s a really good paper with plenty of
interesting ideas that hold up very well, especially considering that the paper is
over 40 years old and in great part about optimization. I highly recommend
that you stop reading my blog and go read the paper itself instead.&lt;/p&gt;
&lt;p&gt;What follows is a list of other great quotes from the same paper. They are
mostly copied straight from the paper, but I&apos;ve occasionally omitted parts in
order not to make them too long or filled with irrelevant context. My own edits
are written &lt;em&gt;[like this]&lt;/em&gt;. Enjoy!&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;[...]&lt;/em&gt; people are now beginning to renounce every feature of programming that
can be considered guilty by virtue of its association with difficulties.  Not
only &lt;code&gt;go to&lt;/code&gt; statements are being questioned; we also hear complaints about
floating-point calculations, global variables, semaphores, pointer variables,
and even assignment statements.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;The improvement in speed from Example 2 to Example 2a is only about 12%, and
many people would pronounce that insignificant. The conventional wisdom shared
by many of today&apos;s software engineers calls for ignoring efficiency in the
small; but I believe this is simply an overreaction to the abuses they see
being practiced by penny-wise-and-pound-foolish programmers, who can&apos;t debug or
maintain their &amp;quot;optimized&amp;quot; programs. In established engineering disciplines a
12% improvement, easily obtained, is never considered marginal; and I believe
the same viewpoint should prevail in software engineering. Of course I
wouldn&apos;t bother making such optimizations on a one-shot job, but when it&apos;s a
question of preparing quality programs, I don&apos;t want to restrict myself to
tools that deny me such efficiencies.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;I&apos;ve become convinced that all compilers written from now on should be designed
to provide all programmers with feedback indicating what parts of their
programs are costing the most; indeed, this feedback should be supplied
automatically unless it has been specifically turned off.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;He &lt;em&gt;[Tony Hoare]&lt;/em&gt; points out quite correctly that the current practice of
compiling subscript range checks into the machine code while a program is being
tested, then suppressing the check during production runs, is like a sailor who
wears his life preserver while training on land but leaves it behind when he
sails!&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;[...]&lt;/em&gt; I also know of places where I have myself used a complicated structure
with excessively unrestrained &lt;code&gt;go to&lt;/code&gt; statements, especially the notorious
Algorithm 2.3.3A for multivariate polynomial addition. The original program had
&lt;em&gt;at least&lt;/em&gt; three bugs; exercise 2.3.3-14 &amp;quot;Give a formal proof (or disproof) of
the validity of Algorithm A&amp;quot;, was therefore unexpectedly easy.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;It is important to keep efficiency in its place, as mentioned above,
but when efficiency counts we should also know how to achieve it.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;A programmer should create a program P which is readily understood and
well-documented, and then he should optimize it into a program Q which is very
efficient. Program Q may contain &lt;code&gt;go to&lt;/code&gt; statements and other low-level
features, but the transformation from P to Q should be accomplished by
completely reliable and well-documented &amp;quot;mechanical&amp;quot; operations. At this
point many readers will say, &amp;quot;But he should only write P, and an optimizing
compiler will produce Q.&amp;quot; To this I say, &amp;quot;No, the optimizing compiler would
have to be so complicated that it will in fact be unreliable&amp;quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;We found ourselves always running up against the same problem: the compiler
needs to be in a dialog with the programmer; it needs to know properties of the
data, and whether certain cases can arise, etc. And we couldn&apos;t think of a good
language in which to have such a dialog.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;The programmer using such a system will write his beautifully-structured, but
possibly inefficient, program P; then he will interactively specify
transformations that make it efficient. Such a system will be much more
powerful and reliable than a completely automatic one. &lt;em&gt;[...]&lt;/em&gt; The original
program P should be retained along with the transformation specifications, so
that it can be properly understood and maintained as time passes.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;He &lt;em&gt;[Edsger Dijkstra]&lt;/em&gt; went on to say that he looks forward to the day when
machines are so fast that we won&apos;t be under pressure to optimize our programs.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;Though &lt;em&gt;[a previous code snippet]&lt;/em&gt; is slightly cleaner looking than the method
in my book, it is noticeably slower, and we have nothing to fear by using a
slightly more complicated method once it has been proved correct. Beautiful
algorithms are, unfortunately, not always the most useful.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;One thing we haven&apos;t spelled out clearly, however, is what makes some &lt;code&gt;go to&lt;/code&gt;&apos;s
bad and others acceptable. The reason is that we&apos;ve really been directing our
attention to the wrong issue, to the objective question of &lt;code&gt;go to&lt;/code&gt; elimination
instead of the important subjective question of program structure.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;We should ordinarily keep efficiency considerations in the background when we
formulate our programs. We need to be subconsciously aware of the data
processing tools available to us, but we should strive most of all for a program
that is easy to understand and almost sure to work.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;section data-footnotes=&quot;&quot; class=&quot;footnotes&quot;&gt;&lt;h2 id=&quot;footnote-label&quot; class=&quot;sr-only&quot;&gt;Footnotes&lt;/h2&gt;
&lt;ol&gt;
&lt;li id=&quot;user-content-fn-quote&quot;&gt;
&lt;p&gt;On the off-chance that you have no idea what I&apos;m talking about: don&apos;t bother looking it up! Read the paper itself instead; this will provide you with a much needed context that is usually omitted from the quote. &lt;a href=&quot;#user-content-fnref-quote&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content></entry><entry><title>Efficient Simulation Through Linear Algebra</title><id>https://mht.wtf/post/sparse-solves/</id><updated>2022-08-12T17:18:54+02:00</updated><author><name>Martin Hafskjold Thoresen</name><email>m@mht.wtf</email></author><link href="https://mht.wtf/post/sparse-solves/" rel=""/><link href="https://mht.wtf/post/sparse-solves/index.html" rel="alternate"/><published>2022-08-12T17:18:54+02:00</published><content type="text/html">&lt;p&gt;I spent a lot of time working on a project in which physically plausible simulation of soft materials with pressure chambers was a key part,
and in doing so, we managed to improve a part of our simulation by a significant amount.
I was very happy with how this small part of the whole system turned out, and I&apos;ve been wanting to share it for a while.&lt;/p&gt;
&lt;p&gt;A fair warning though, we need to spend a little time setting up the context in order to see &lt;em&gt;why&lt;/em&gt; this is a thing that can happen very naturally,
as opposed to a magic algebraic trick that we can pull out of a hat.&lt;/p&gt;
&lt;p&gt;If you haven&apos;t seen physically based simulations before, don&apos;t worry too much about the details.
It helps if we can get on the same page regarding &lt;em&gt;why&lt;/em&gt; we are even here in the first place, but the details of the context really don&apos;t matter for the point I&apos;m trying to get across.&lt;/p&gt;
&lt;p&gt;If you &lt;em&gt;have&lt;/em&gt; seen physically based simulations before, also don&apos;t worry too much about the details.
There isn&apos;t anything fancy going on here; no second order elements, no fancy time stepping, no dynamics, basically nothing that hasn&apos;t been around for 20 years&lt;sup&gt;&lt;a href=&quot;#user-content-fn-years&quot; id=&quot;user-content-fnref-years&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;.
The trick is somewhat fancy though, to me anyways.&lt;/p&gt;
&lt;h2&gt;Finite-Elements&lt;/h2&gt;
&lt;p&gt;We wanted to simulate the behavior of a soft material with a certain geometry when we inject pressurized air into it.
A simple way of doing so is by representing the geometry of the material with a &lt;a href=&quot;https://wias-berlin.de/software/index.jsp?id=TetGen&amp;amp;lang=1&quot;&gt;tetrahedral mesh&lt;/a&gt;,
and defining an energy that is a function of the deformation of those &lt;a href=&quot;https://en.wikipedia.org/wiki/Tetrahedron&quot;&gt;tetrahedra&lt;/a&gt;, or &amp;quot;tets&amp;quot;.
The nodal positions of the mesh are our &amp;quot;degrees of freedom&amp;quot;: they are what we can move around, and the energy of the system is a function of those positions.
You can imagine an energy function for each tet similar to&lt;sup&gt;&lt;a href=&quot;#user-content-fn-rs&quot; id=&quot;user-content-fnref-rs&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;fn energy(nodes: [Node; 4]) -&amp;gt; f64 { ... }
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If you are given &lt;em&gt;any&lt;/em&gt; nodal positions, you can compute an energy from them. For instance, if a tet was supposed to have a volume of 1, but is stretched out to a volume of 2, it
would make sense that it has a lot of energy, which it can &amp;quot;use&amp;quot;&lt;sup&gt;&lt;a href=&quot;#user-content-fn-anthropomorphize&quot; id=&quot;user-content-fnref-anthropomorphize&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;3&lt;/a&gt;&lt;/sup&gt; to return to its preferred (&amp;quot;rest&amp;quot;) position of having a volume of 1 again.
The energy function&lt;sup&gt;&lt;a href=&quot;#user-content-fn-defograd&quot; id=&quot;user-content-fnref-defograd&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;4&lt;/a&gt;&lt;/sup&gt; defines exactly how much the tet would want to return to some other configuration when deformed.
By summing the energies for all the tets in the system, we get the total energy of the whole system.&lt;/p&gt;
&lt;p&gt;You can also have other energies that add into the whole system.
Since we are dealing with a pneumatic system, we assign pressure forces to the faces of our mesh that are adjacent to the pressure chamber,
such that the forces are proportional to the face area and the pressure.
If we know how much gas is in the chamber (this is one of our degrees of freedom), and we know the volume of the chamber, we can compute the pressure using the &lt;a href=&quot;https://en.wikipedia.org/wiki/Ideal_gas_law&quot;&gt;ideal gas law&lt;/a&gt;&lt;sup&gt;&lt;a href=&quot;#user-content-fn-igl&quot; id=&quot;user-content-fnref-igl&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;5&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;h3&gt;Finding Equilibrium&lt;/h3&gt;
&lt;p&gt;Having only this energy function, we can compute the &lt;em&gt;forces&lt;/em&gt; that act on the nodes in our system as the direction in which they would have to move to &lt;em&gt;decrease&lt;/em&gt; that energy.
In other words, we let $$f = -\frac{\partial E}{\partial x}.$$
Note the minus sign: the gradient of a function is the direction in which it &lt;em&gt;increases&lt;/em&gt; the most, and we would like it to &lt;em&gt;decrease&lt;/em&gt;.
This is also where notation gets a little messy: the $x$ above represents the positions of all the nodes, so it is really a vector in $\mathbb R^{3n}$ for a 3 dimensional system of $n$ nodes.&lt;/p&gt;
&lt;p&gt;For a single tet, we have 12 numbers, namely the $x$, $y$, and $z$ coordinate of the four vertices.
We can pretend that the energy function above reads&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;fn energy(nodes: [f64; 12]) -&amp;gt; f64 { ... }
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With this, we see that $f_t$, the forces on a single tet, is also a vector of 12 numbers, which corresponds to the forces on the respective nodes in their respective coordinate, whichever way we flattened&lt;sup&gt;&lt;a href=&quot;#user-content-fn-flat&quot; id=&quot;user-content-fnref-flat&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;6&lt;/a&gt;&lt;/sup&gt; it in the first place&lt;sup&gt;&lt;a href=&quot;#user-content-fn-flatten&quot; id=&quot;user-content-fnref-flatten&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;7&lt;/a&gt;&lt;/sup&gt;.
$$f_t \in\mathbb R^{12}$$&lt;/p&gt;
&lt;p&gt;We can use this information to move the nodes in our system in order to decrease the global energy of the whole system:
loop over all tets, compute the forces from that tet to its four nodes, sum up the forces on all the nodes into one big vector $f\in \mathbb R^{3n}$, and move the vertices some amount $\eta &amp;gt; 0$ in this direction:
$$x^{(i+1)} = x^{(i)} + \eta f.$$
This is called &lt;a href=&quot;https://en.wikipedia.org/wiki/Gradient_descent&quot;&gt;gradient descent&lt;/a&gt;, and it&apos;s not so great, at least not for these kinds of systems, because it takes a long time before it finds equilibrium.
When $f=0$ we have reached equilibrium, and we&apos;re at rest.&lt;/p&gt;
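&lt;p&gt;As a toy sketch of the loop above (using a made-up 1D spring energy rather than an actual tet energy), gradient descent looks something like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# Gradient descent on a toy 1D energy E(x) = 0.5 * k * (x - rest)**2.
# The force is f = -dE/dx = -k * (x - rest); we move x by eta * f
# until f is numerically zero, i.e. we have found equilibrium.

def force(x, k=10.0, rest=1.0):
    return -k * (x - rest)

def gradient_descent(x, eta=0.01, tol=1e-9, max_iters=100000):
    for i in range(max_iters):
        f = force(x)
        if abs(f) &amp;lt; tol:
            return x, i
        x += eta * f
    return x, max_iters

x_eq, iters = gradient_descent(0.0)
print(x_eq, iters)  # converges to the rest position x = 1, but only slowly
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Even on this trivially convex energy the iteration count depends heavily on $\eta$ and the stiffness, which is exactly the slowness complained about above.&lt;/p&gt;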
&lt;h3&gt;Newton&apos;s Method&lt;/h3&gt;
&lt;p&gt;To improve &lt;a href=&quot;https://en.wikipedia.org/wiki/Convergent_series&quot;&gt;convergence&lt;/a&gt;&lt;sup&gt;&lt;a href=&quot;#user-content-fn-convergence&quot; id=&quot;user-content-fnref-convergence&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;8&lt;/a&gt;&lt;/sup&gt; we can compute yet another derivative, namely
$$
-\frac{\partial^2 E}{\partial x \partial x} =
\frac{\partial f}{\partial x},\qquad
\frac{\partial f_t}{\partial x_t}\in\mathbb R^{12\times 12}$$
Now we&apos;ve got $12 \times 12 = 144$ numbers, for each tet! Similarly to what we did above, we can combine all of these smaller matrices into one giant matrix&lt;sup&gt;&lt;a href=&quot;#user-content-fn-assembly&quot; id=&quot;user-content-fnref-assembly&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;9&lt;/a&gt;&lt;/sup&gt; that we&apos;ll call the &lt;a href=&quot;https://en.wikipedia.org/wiki/Hessian_matrix&quot;&gt;Hessian&lt;/a&gt; $H\in\mathbb R^{3n \times 3n}$,
and perform &lt;a href=&quot;https://en.wikipedia.org/wiki/Newton&apos;s_method&quot;&gt;Newton&apos;s method&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;What we want to do with $H$ is find a direction $d$ such that $Hd = -f$ and then set our new node positions to be&lt;sup&gt;&lt;a href=&quot;#user-content-fn-newtonstep&quot; id=&quot;user-content-fnref-newtonstep&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;10&lt;/a&gt;&lt;/sup&gt;
$$x^{(i+1)}=x^{(i)}+\eta d$$
Don&apos;t panic if this jumps out of nowhere, because it kind of does.
Roughly speaking, what this means is that we pretend that our energy function is &lt;a href=&quot;https://en.wikipedia.org/wiki/Quadratic_function&quot;&gt;quadratic&lt;/a&gt;, because
then this update will make us go straight to the minimum point, which in our case is force equilibrium.
If the function is &lt;em&gt;not&lt;/em&gt; quadratic (and it probably isn&apos;t), then we hope that we&apos;ll get closer, and indeed, as long as we start &amp;quot;sufficiently close&amp;quot; to the minima, we will.&lt;/p&gt;
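&lt;p&gt;Continuing the toy sketch with a made-up convex-but-not-quadratic 1D energy, $E(x) = \cosh(x - 1)$, a Newton iteration under the convention above ($H = \partial f/\partial x$, solve $Hd = -f$, step with $\eta = 1$) could look like:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import math

def newton_equilibrium(x, rest=1.0, tol=1e-12, max_iters=50):
    # Toy energy E(x) = cosh(x - rest): the force is f = -sinh(x - rest)
    # and H = df/dx = -cosh(x - rest), a 1x1 Hessian.
    for i in range(max_iters):
        f = -math.sinh(x - rest)
        if abs(f) &amp;lt; tol:
            return x, i
        H = -math.cosh(x - rest)
        d = -f / H  # solve H d = -f (trivial in one dimension)
        x += d      # eta = 1
    return x, max_iters

x_eq, iters = newton_equilibrium(0.0)
print(x_eq, iters)  # reaches x = 1 to machine precision in a handful of steps
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Compared to the hundreds of gradient descent steps needed for a comparable tolerance on a toy energy like this, Newton&apos;s method gets there in single digits.&lt;/p&gt;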
&lt;h2&gt;Linear Systems&lt;/h2&gt;
&lt;p&gt;How do we &amp;quot;solve&amp;quot; $Hd = -f$ when we know $H$ and $f$?
This is what we call a &amp;quot;linear system of equations&amp;quot;, and is a workhorse of scientific computation, geometry processing, computer graphics, and many related fields.
It is often written as the equation&lt;/p&gt;
&lt;p&gt;$$Ax = b$$&lt;/p&gt;
&lt;p&gt;or, if we choose dimensions of the variables (I chose 6 here) and write everything out explicitly:&lt;/p&gt;
&lt;p&gt;$$
\begin{pmatrix}
a_{1,1} &amp;amp; a_{1,2} &amp;amp;  a_{1,3} &amp;amp; a_{1,4} &amp;amp;  a_{1,5} &amp;amp; a_{1,6}\\
a_{2,1} &amp;amp; a_{2,2} &amp;amp;  a_{2,3} &amp;amp; a_{2,4} &amp;amp;  a_{2,5} &amp;amp; a_{2,6}\\
a_{3,1} &amp;amp; a_{3,2} &amp;amp;  a_{3,3} &amp;amp; a_{3,4} &amp;amp;  a_{3,5} &amp;amp; a_{3,6}\\
a_{4,1} &amp;amp; a_{4,2} &amp;amp;  a_{4,3} &amp;amp; a_{4,4} &amp;amp;  a_{4,5} &amp;amp; a_{4,6}\\
a_{5,1} &amp;amp; a_{5,2} &amp;amp;  a_{5,3} &amp;amp; a_{5,4} &amp;amp;  a_{5,5} &amp;amp; a_{5,6}\\
a_{6,1} &amp;amp; a_{6,2} &amp;amp;  a_{6,3} &amp;amp; a_{6,4} &amp;amp;  a_{6,5} &amp;amp; a_{6,6}
\end{pmatrix}
\begin{pmatrix} x_1\\ x_2\\ x_3\\ x_4\\ x_5\\ x_6 \end{pmatrix}
=\begin{pmatrix} b_1\\ b_2\\ b_3\\ b_4\\ b_5\\ b_6\end{pmatrix}
$$&lt;/p&gt;
&lt;p&gt;The operation we want to do is find the $x$ given $A$ and $b$.
That is, which $x$ (if any!) should I multiply $A$ with to get $b$?
Algebraically, we can simply write
$$x = A^{-1}b,$$
but this is very rarely done in practice because computing the inverse of a matrix is rather expensive&lt;sup&gt;&lt;a href=&quot;#user-content-fn-ainv&quot; id=&quot;user-content-fnref-ainv&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;11&lt;/a&gt;&lt;/sup&gt;.
People have figured out that there are ways of finding $x$ without computing $A^{-1}$ explicitly, and this is what we mean by a &lt;em&gt;linear solve&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;For instance, in Julia we can use the &lt;code&gt;\&lt;/code&gt; operator for linear solves. Observe:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-julia&quot;&gt;julia&amp;gt; A = rand(6,6) # Get a random 6x6 matrix (and hope it is full rank)
6×6 Matrix{Float64}:
 0.610793     0.0588659  0.90725   0.723158  0.480303   0.00631715
 0.10528      0.229984   0.536642  0.91345   0.650178   0.237762
 0.600606     0.24921    0.349393  0.626754  0.0971094  0.771216
 0.536192     0.0458314  0.541457  0.556307  0.132692   0.55307
 0.936709     0.215612   0.284619  0.304965  0.926599   0.719019
 0.000957923  0.852531   0.290136  0.151528  0.129307   0.0528658

julia&amp;gt; b = rand(6) # Get a random b
6-element Vector{Float64}:
 0.7716876359155332
 0.4285009788970344
 0.8110655185850537
 0.19638254649350662
 0.6621420580446692
 0.06633609289427767

julia&amp;gt; x = A \ b
6-element Vector{Float64}:
  2.77721146947569
  0.7894422416781481
 -2.7841498287174837
  2.5819747913641087
 -0.6223503841138821
 -2.1248845687489477

julia&amp;gt; A * x - b # If  Ax = b  then  Ax-b = 0
6-element Vector{Float64}:
 -1.1102230246251565e-16
  2.220446049250313e-16
  0.0
  5.551115123125783e-16
  1.1102230246251565e-16
  5.551115123125783e-17
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;There are many things to be said about solving linear systems, but there&apos;s only one more thing we&apos;ll need to know here: sparsity.&lt;/p&gt;
&lt;h3&gt;Solving Sparse Linear Systems&lt;/h3&gt;
&lt;p&gt;The picture below is the Hessian matrix $H$ of one of these finite element systems.
The pixel at position &lt;code&gt;i,j&lt;/code&gt; corresponds to $H_{ij}$, and it is color coded so that blue means negative, red means positive, and gray is zero.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;0804-hessian-reorder.png&quot; alt=&quot;A Hessian texture where each pixel is color coded with the numeric value for its coordinates.&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The noteworthy thing about this picture is the amount of gray: &lt;em&gt;most&lt;/em&gt; pixels are gray.
Since the Hessian quantifies how sensitive the &lt;em&gt;forces&lt;/em&gt; on our nodes are to the &lt;em&gt;position&lt;/em&gt; of the nodes themselves, this makes sense.
Moving around a node on one side of the mesh does not change anything about the forces on the other side.
That is, unless both of those nodes are on the pressure boundary: in that case the volume changes ever so slightly, which in turn changes the pressure,
which in turn changes the forces on &lt;em&gt;all&lt;/em&gt; of the nodes that are on the pressure boundary.
These nodes correspond to the &lt;em&gt;block&lt;/em&gt; we see in the upper left corner of the picture&lt;sup&gt;&lt;a href=&quot;#user-content-fn-sort&quot; id=&quot;user-content-fnref-sort&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;12&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;Recall from &lt;a href=&quot;#linear-systems&quot;&gt;above&lt;/a&gt; that there are a bunch of methods for solving these systems, but, perhaps obviously, any one of these methods will for sure need to look at each element in the matrix.
If there are many elements in the matrix, there will be a lot of work; you can think of this as $O(n^2)$&lt;sup&gt;&lt;a href=&quot;#user-content-fn-linsolve&quot; id=&quot;user-content-fnref-linsolve&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;13&lt;/a&gt;&lt;/sup&gt; where $n$ is the number of degrees of freedom we have (the number of rows and columns in $H$).
On the other hand, if most of the elements in $H$ are zero, we can store the matrix in a &lt;a href=&quot;https://en.wikipedia.org/wiki/Sparse_matrix&quot;&gt;sparse format&lt;/a&gt;, so that any algorithm working
on $H$ does not have to iterate over a whole lot of zeroes.
It will still need to look at each non-zero number, but if we only have a constant number $c$ of entries in each row (or column), we only have $O(cn)$ entries in total.&lt;/p&gt;
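&lt;p&gt;A minimal sketch of the idea (a hand-rolled format for illustration; real libraries use CSR or similar): store, per row, only the non-zero entries, so that a matrix-vector product costs $O(cn)$ instead of $O(n^2)$.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def sparse_matvec(rows, x):
    # rows[i] holds (column, value) pairs for the non-zeros of row i,
    # so the total work is proportional to the number of non-zeros.
    return [sum(v * x[j] for j, v in row) for row in rows]

# A 4x4 tridiagonal example: dense storage would be 16 numbers,
# while the sparse representation holds only the 10 non-zeros.
rows = [
    [(0, 2.0), (1, -1.0)],
    [(0, -1.0), (1, 2.0), (2, -1.0)],
    [(1, -1.0), (2, 2.0), (3, -1.0)],
    [(2, -1.0), (3, 2.0)],
]
print(sparse_matvec(rows, [1.0, 1.0, 1.0, 1.0]))  # [1.0, 0.0, 0.0, 1.0]
&lt;/code&gt;&lt;/pre&gt;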
&lt;p&gt;The problem, of course, is that the matrix in the picture above isn&apos;t really sparse, since it has this giant block of roughly $\frac{1}{4}n^2$ numbers in it.&lt;/p&gt;
&lt;p&gt;... or is it?&lt;/p&gt;
&lt;h2&gt;Property vs. Representation&lt;/h2&gt;
&lt;p&gt;This brings us to the key point of this post.
It certainly looks like the matrix is dense, and in general, there is no way of making a dense matrix sparse, since there is simply more information in a dense matrix.
But maybe there is a lot of duplicate information in our matrix?
To show what I mean, consider the matrix
$$A = uv^\top\qquad\text{or equivalently }\qquad A_{i,j} = u_iv_j$$
or for some concrete numbers, consider this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-julia&quot;&gt;julia&amp;gt; u, v = rand(6), rand(6);

julia&amp;gt; u
6-element Vector{Float64}:
 0.17648645508411875
 0.9501460722894218
 0.7570256767954698
 0.9097476055645976
 0.7514042466862265
 0.2594892833200104

julia&amp;gt; v
6-element Vector{Float64}:
 0.9880351017724492
 0.7271356154478763
 0.29724548913210114
 0.7470357266014565
 0.8131233770317735
 0.26312703421677464

julia&amp;gt; u * v&apos; # v&apos; is Julia&apos;s way of transposing
6×6 Matrix{Float64}:
 0.174375  0.12833   0.0524598  0.131842  0.143505  0.0464384
 0.938778  0.690885  0.282427   0.709793  0.772586  0.250009
 0.747968  0.55046   0.225022   0.565525  0.615555  0.199194
 0.898863  0.66151   0.270418   0.679614  0.739737  0.239379
 0.742414  0.546373  0.223352   0.561326  0.610984  0.197715
 0.256385  0.188684  0.077132   0.193848  0.210997  0.0682786
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The matrix is a &amp;quot;full&amp;quot; matrix of 36 numbers, but they all come from only 12 numbers&lt;sup&gt;&lt;a href=&quot;#user-content-fn-rank&quot; id=&quot;user-content-fnref-rank&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;14&lt;/a&gt;&lt;/sup&gt;.
In a sense, the matrix &lt;em&gt;should&lt;/em&gt; be sparse, because it&apos;s only 12 numbers, but its &lt;em&gt;representation&lt;/em&gt; is not sparse.
If we can rewrite our $H$ above into a form that looks like this, maybe there&apos;s hope for speeding up the solves.&lt;/p&gt;
&lt;p&gt;The way we compute pressure forces on the faces of the tets is to first compute the volume of the air chamber,
then compute the pressure using the ideal gas law, and finally apply the pressure on each face so that the force is proportional to both the pressure and the face area, and points in the direction of the inward normal of the face.
Roughly, following the notation I&apos;ve used already, it looks like this&lt;sup&gt;&lt;a href=&quot;#user-content-fn-notation&quot; id=&quot;user-content-fnref-notation&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;15&lt;/a&gt;&lt;/sup&gt;:
$$f_p = p(x) n(x)$$
where both the pressure $p$ and the area-scaled normal vector $n$ are functions of the node positions $x$.
When we compute the Hessian entries $H_p$ for only the pressure forces, we use the product rule to get
$$H_p=\frac{\partial f_p}{\partial x} = \frac{\partial p}{\partial x}(x)n(x) + p(x)\frac{\partial n}{\partial x}(x).$$
Writing it all out like this is useful since we can pinpoint exactly where in the formulas the density problem comes from.
The term $\partial p /\partial x$ is dense, since it depends on the volume of the air chamber, and all nodes along the boundary of this chamber influence the volume if they move&lt;sup&gt;&lt;a href=&quot;#user-content-fn-wall&quot; id=&quot;user-content-fnref-wall&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;16&lt;/a&gt;&lt;/sup&gt;.
But do note here that, similarly to the toy example above, we really only have $3n$ numbers in ${\partial p}/{\partial x}$, since
for given $x$, $p(x)$ is only a single number --- the pressure --- so $\frac{\partial p}{\partial x} \in \mathbb R^{3n}$ (because $x$ is $3n$ numbers).
Somehow this is expanded to $O(n^2)$ numbers in the process of assembly.&lt;/p&gt;
&lt;p&gt;In fact, if we write $u=\partial p/\partial x$ and $v=n(x)$ then the first summand is just $uv^\top$.
We use this to rewrite the computation of $H$ by first doing the pressure computation separately, and then the rest of $H$:
$$H = H_p + H_r$$
($r$ for rest) and then write the pressure terms as
$$H_p = uv^\top + p(x)\frac{\partial n}{\partial x}(x)$$
and at last, we write the whole Hessian in a slightly more readable form as
$$H = H_s + uv^\top,\qquad H_s = H_r + p(x)\frac{\partial n}{\partial x}(x)$$
This system is still as dense as before if we multiply out $uv^\top$ and add it all together, but we&apos;re not going to do that.&lt;/p&gt;
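&lt;p&gt;The payoff of keeping the factored form is that the dense block never needs to exist: multiplying by $H = H_s + uv^\top$ only costs the sparse part plus $O(n)$ extra work, since $(uv^\top)x = u(v^\top x)$. A toy sketch (with the sparse part stored densely just for brevity):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def matvec_plus_rank_one(Hs, u, v, x):
    # (Hs + u v^T) x = Hs x + u * (v . x): the rank-one part costs O(n),
    # so the dense matrix u v^T is never formed.
    vx = sum(vi * xi for vi, xi in zip(v, x))
    return [hi + ui * vx for hi, ui in zip(matvec(Hs, x), u)]

Hs = [[2.0, 0.0], [0.0, 3.0]]
u, v, x = [1.0, 2.0], [1.0, 1.0], [1.0, 1.0]
# Forming Hs + u v^T = [[3, 1], [2, 5]] explicitly gives the same product.
print(matvec_plus_rank_one(Hs, u, v, x))  # [4.0, 7.0]
&lt;/code&gt;&lt;/pre&gt;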
&lt;h2&gt;Solving The New System&lt;/h2&gt;
&lt;p&gt;Before we had the system $Hd = -f$ which we wanted to solve for $d$. Now our new system is the slightly less nice
$$(H_s + uv^\top) d = -f$$
and it doesn&apos;t seem like we&apos;ve made much progress.&lt;/p&gt;
&lt;p&gt;What helps us is the &lt;a href=&quot;https://en.wikipedia.org/wiki/Sherman%E2%80%93Morrison_formula&quot;&gt;Sherman-Morrison formula&lt;/a&gt;,
which tells us how to invert a matrix of type $A + uv^\top$;
see &lt;a href=&quot;https://kristianeschenburg.github.io/2018/05/rank-one-updates&quot;&gt;this&lt;/a&gt; and &lt;a href=&quot;https://timvieira.github.io/blog/post/2021/03/25/fast-rank-one-updates-to-matrix-inverse/&quot;&gt;this&lt;/a&gt; post on solving these systems.
The closed-form solution includes inverting $A$ itself ($H_s$ in our case), but we can avoid computing this explicitly because we are not looking for the inverse of the matrix itself; we just want to solve the linear system.&lt;/p&gt;
&lt;p&gt;For matrices that are easy to invert, the formula &lt;em&gt;is&lt;/em&gt; useful for us; in particular, we choose $A=I$, and write out the inverse explicitly:
$${\left(I + uv^\top\right)}^{-1} = I - \frac{uv^\top}{1 + u^\top v}.$$
Again, this does not help us directly yet, because in our case we have $H_s$ as the matrix inside the parentheses, and not $I$.
We will need to somehow massage it out.&lt;/p&gt;
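&lt;p&gt;To convince ourselves that the $A=I$ identity holds, we can multiply $(I + uv^\top)$ by the claimed inverse on a small example and check that we get the identity back. A numeric sanity check in Python, with made-up vectors:&lt;/p&gt;

```python
# Sanity check of the A = I Sherman-Morrison identity with made-up vectors:
# multiply (I + u v^T) by the claimed inverse and compare to the identity.
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

u = [1.0, 2.0, 0.5]
v = [0.3, -1.0, 2.0]
vtu = sum(a * b for a, b in zip(v, u))  # the scalar v^T u (= u^T v)

M = [[(1.0 if i == j else 0.0) + u[i] * v[j] for j in range(3)] for i in range(3)]
Minv = [[(1.0 if i == j else 0.0) - u[i] * v[j] / (1.0 + vtu) for j in range(3)]
        for i in range(3)]

P = matmul(M, Minv)  # should be (numerically) the identity matrix
assert all(abs(P[i][j] - (1.0 if i == j else 0.0)) < 1e-9
           for i in range(3) for j in range(3))
```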
&lt;p&gt;The first step is to take our system
$$(H_s + uv^\top)d = -f$$
and algebraically multiply in $H_s^{-1}$ from the left so that we get
$$(I + H_s^{-1}uv^\top)d = -H_s^{-1}f.$$
Let&apos;s call $H_s^{-1}u=w$, or in other words, $H_sw = u$. Since $H_s$ is sparse we can easily solve for $w$, and insert this back into the equation:
$$(I + wv^\top)d = -H_s^{-1}f.$$
Now we introduce a new variable, just to make this step easier: let $c = (I + wv^\top)d$. We haven&apos;t found $c$ yet, and we still don&apos;t know $d$; this too is just algebra.
We are left with
$$c = -H_s^{-1}f$$
or
$$H_s c = -f$$
in which only $c$ is unknown. $H_s$ is still sparse, so we can solve for $c$.
At last, we look at the definition of $c$ that we came up with. We have all quantities&lt;sup&gt;&lt;a href=&quot;#user-content-fn-uv&quot; id=&quot;user-content-fnref-uv&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;17&lt;/a&gt;&lt;/sup&gt; except for $d$:
$$(I + wv^\top) d = c$$
and we already have an analytical inverse for this matrix, thanks to Sherman-Morrison.
By inserting the inverse on the right and multiplying out (notice that we don&apos;t even have to construct the matrix that is the SM inverse!) we get:
$$\begin{align}
d &amp;amp;= {\left(I + wv^\top \right)}^{-1} c \\
&amp;amp;= (I - \frac{wv^\top }{1 + w^\top v}) c\\
&amp;amp;= c - \frac{w(v^\top c)}{1 + w^\top v}
\end{align}$$
which is just two dot products, a scalar-vector multiply, and a vector-vector subtraction.&lt;/p&gt;
&lt;p&gt;That&apos;s quite a mouthful, but in the end we have only solved two sparse linear systems with the &lt;em&gt;same&lt;/em&gt; matrix $H_s$, plus a few dot products.
We avoided the dense solve, and in fact, we avoided even &lt;em&gt;constructing&lt;/em&gt; a new matrix.&lt;/p&gt;
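&lt;p&gt;The whole recipe fits in a few lines. Here&apos;s a toy end-to-end sketch in Python with a tiny dense $3\times 3$ stand-in for $H_s$ and made-up numbers (not the simulation code, and the sparsity advantage doesn&apos;t show at this size, but the algebra is exactly the steps above): solve $H_s w = u$ and $H_s c = -f$, combine with the Sherman-Morrison correction, and compare against forming $H$ densely:&lt;/p&gt;

```python
# A toy end-to-end check of the rank-one update trick (plain Python, tiny
# dense 3x3 stand-in for H_s; made-up numbers, not from the simulation).
def gauss_solve(A_in, b_in):
    # naive Gaussian elimination with partial pivoting; a real implementation
    # would factorize H_s once and reuse the factorization for both solves
    A = [row[:] for row in A_in]
    b = b_in[:]
    n = len(A)
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            m = A[r][col] / A[col][col]
            for k in range(col, n):
                A[r][k] -= m * A[col][k]
            b[r] -= m * b[col]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - sum(A[i][k] * x[k] for k in range(i + 1, n))) / A[i][i]
    return x

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

Hs = [[4.0, 1.0, 0.0], [1.0, 3.0, 0.0], [0.0, 0.0, 2.0]]
u = [1.0, 2.0, 3.0]
v = [1.0, 0.0, 1.0]
f = [1.0, 1.0, 1.0]

# slow path: form H = H_s + u v^T densely and solve H d = -f
H = [[Hs[i][j] + u[i] * v[j] for j in range(3)] for i in range(3)]
d_slow = gauss_solve(H, [-x for x in f])

# fast path: two solves with H_s, then the Sherman-Morrison correction
w = gauss_solve(Hs, u)                # H_s w = u
c = gauss_solve(Hs, [-x for x in f])  # H_s c = -f
d_fast = [ci - wi * dot(v, c) / (1.0 + dot(w, v)) for ci, wi in zip(c, w)]

assert all(abs(a - b) < 1e-9 for a, b in zip(d_slow, d_fast))
```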
&lt;p&gt;The fact that we used the same matrix on both of the linear solves is also really important: linear solvers usually factorize the matrix in some way or another before they solve the system,
for instance into an &lt;a href=&quot;https://en.wikipedia.org/wiki/LU_decomposition&quot;&gt;LU&lt;/a&gt;, &lt;a href=&quot;https://en.wikipedia.org/wiki/Cholesky_decomposition#LDL_decomposition&quot;&gt;LDLT&lt;/a&gt;, or &lt;a href=&quot;https://en.wikipedia.org/wiki/QR_decomposition&quot;&gt;QR&lt;/a&gt; factorization.
When we have the factorization we can very easily solve the system, and so by solving multiple linear systems
with the same matrix (and different $b$s) we only need to factorize once, so the second solve is really fast.&lt;/p&gt;
&lt;h2&gt;Quick Microbenchmark&lt;/h2&gt;
&lt;p&gt;What does this really give us?
Instead of making a proper comparison from the simulation code base, I decided to hack together a small Julia program to illustrate.
Here is the measured data of solving what basically amounts to the linear system above.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th align=&quot;right&quot;&gt;$n$&lt;/th&gt;
&lt;th&gt;slow (s)&lt;/th&gt;
&lt;th&gt;fast (s)&lt;/th&gt;
&lt;th align=&quot;center&quot;&gt;speedup&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td align=&quot;right&quot;&gt;500&lt;/td&gt;
&lt;td&gt;0.01742&lt;/td&gt;
&lt;td&gt;0.01053&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;1.65&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;right&quot;&gt;1000&lt;/td&gt;
&lt;td&gt;0.03732&lt;/td&gt;
&lt;td&gt;0.01933&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;1.93&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;right&quot;&gt;2000&lt;/td&gt;
&lt;td&gt;0.12346&lt;/td&gt;
&lt;td&gt;0.06351&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;1.94&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;right&quot;&gt;5000&lt;/td&gt;
&lt;td&gt;0.96995&lt;/td&gt;
&lt;td&gt;0.45724&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;2.12&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;right&quot;&gt;10000&lt;/td&gt;
&lt;td&gt;6.12298&lt;/td&gt;
&lt;td&gt;2.22139&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;2.75&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;right&quot;&gt;30000&lt;/td&gt;
&lt;td&gt;131.273&lt;/td&gt;
&lt;td&gt;39.2934&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;3.34&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The data is generated from the following Julia code:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-julia&quot;&gt;using LinearAlgebra
using SparseArrays

mod1p(n, m) =  ((n - 1) % m) + 1

# Compute a random sparse matrix in which each column has at most `k` entries
function randomsparse(n, k)
    A = zeros(n, n)
    for i=1:n
        ixs = rand(UInt32, k) .|&amp;gt; a-&amp;gt;mod1p(a, n)
        nums = rand(k)
        A[ixs,i] = nums
    end
    sparse(A + I) # add I to ensure the matrix has full rank
end
    
function doit(n)
    A = randomsparse(n, 5)
    u = rand(n)
    v = rand(n)
    b = rand(n)

    @time(begin # slow path
        slow = A + u * v&apos;
        factor = factorize(slow)
        x = factor \ b
    end);

    @time(begin # fast path
        factor = factorize(A)
        w = factor \ u
        c = factor \ b
        x = c - w * dot(v, c) / (1 + dot(w, v))
    end);
end
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The code for the fast path &lt;em&gt;is&lt;/em&gt; a little more complicated than the straightforward slow path, but overall, not by a lot.
And the speedup we&apos;re getting is well worth it.&lt;/p&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;One of the reasons I really like this solution is that it&apos;s such a good example of good things happening because we looked closely at our problem.
We already knew that linear solves would be the majority of the time spent in our pipeline.
We also knew that sparse solves are quicker than dense solves.
We &lt;em&gt;also&lt;/em&gt; knew that our system felt dense due to the dependence of all the nodes along the air chamber boundary.
&lt;em&gt;Despite&lt;/em&gt; all of this, we managed to massage the problem we had from one dense solve into two sparse solves, and we got a significant speedup out of it.&lt;/p&gt;
&lt;p&gt;This wouldn&apos;t have happened if we were content with the fact that &amp;quot;Linear solves takes up the majority of time in Newton&apos;s algorithm&amp;quot; (which is true; the linear solve &lt;em&gt;is&lt;/em&gt; the bottleneck).&lt;/p&gt;
&lt;p&gt;This wouldn&apos;t have happened if we had looked at the Hessian and concluded that &amp;quot;The system is dense, therefore the solve will be slow&amp;quot; (which is true; dense systems &lt;em&gt;are&lt;/em&gt; slower to solve).&lt;/p&gt;
&lt;p&gt;Sometimes there &lt;em&gt;are&lt;/em&gt; better solutions, but they require that we look closely at the problem at hand. Without looking closely in the first place, we wouldn&apos;t even have known that better solutions could exist.&lt;/p&gt;
&lt;p&gt;Even though this example was full of math, I really think the general sentiment translates well to programming, or to completely different
aspects of life.
It is really hard to tell the difference between how something appears and how it really is&lt;sup&gt;&lt;a href=&quot;#user-content-fn-geom&quot; id=&quot;user-content-fnref-geom&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;18&lt;/a&gt;&lt;/sup&gt;.
I can&apos;t illustrate this with an example from &lt;em&gt;your&lt;/em&gt; life, but I hope that having made the distinction here, you might come up with one.&lt;/p&gt;
&lt;p&gt;Comments, questions, pointers, and prefactorized matrices can be sent to my &lt;a href=&quot;https://lists.sr.ht/~mht/public-inbox&quot;&gt;public inbox&lt;/a&gt; (plain text email only).&lt;/p&gt;
&lt;p&gt;Thanks for reading.&lt;/p&gt;
&lt;section data-footnotes=&quot;&quot; class=&quot;footnotes&quot;&gt;&lt;h2 id=&quot;footnote-label&quot; class=&quot;sr-only&quot;&gt;Footnotes&lt;/h2&gt;
&lt;ol&gt;
&lt;li id=&quot;user-content-fn-years&quot;&gt;
&lt;p&gt;I think, at least! &lt;a href=&quot;#user-content-fnref-years&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-rs&quot;&gt;
&lt;p&gt;If you&apos;re wondering why I&apos;m using Rust syntax here, when the only real code in this post is Julia code, then you&apos;ll have an unanswered question. &lt;a href=&quot;#user-content-fnref-rs&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-anthropomorphize&quot;&gt;
&lt;p&gt;If you&apos;re accusing me of anthropomorphizing here, I&apos;m guilty as charged. &lt;a href=&quot;#user-content-fnref-anthropomorphize&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-defograd&quot;&gt;
&lt;p&gt;Often, this energy is a function of the &lt;em&gt;deformation gradient&lt;/em&gt; $F$, and not the nodal positions directly. $F$ is the matrix that transforms the shape of the tetrahedron from its initial shape to its deformed shape&lt;sup&gt;&lt;a href=&quot;#user-content-fn-flinear&quot; id=&quot;user-content-fnref-flinear&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;19&lt;/a&gt;&lt;/sup&gt;. If nothing has happened, $F$ is the identity matrix, if the tet is only rotated, $F$ would be a rotation matrix, and so on. &lt;a href=&quot;#user-content-fnref-defograd&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-igl&quot;&gt;
&lt;p&gt;We&apos;re assuming here that the temperature is constant. &lt;a href=&quot;#user-content-fnref-igl&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-flat&quot;&gt;
&lt;p&gt;&amp;quot;Flattening&amp;quot; is a fairly common practice when we don&apos;t want to deal with tensors in our derivatives; if we have a matrix valued function differentiated with respect to a matrix, we get a 4th order tensor, which is &lt;em&gt;different&lt;/em&gt; to deal with algebraically than what we might be used to. A kind of hack to avoid this is to let the positions of all the nodes not be a matrix of size $\mathbb R^{3\times n}$ but a vector of size $\mathbb R^{3n}$ instead. As long as we&apos;re willing to put up with the change of indices from the flattened to un-flattened configurations, we&apos;re fine. &lt;a href=&quot;#user-content-fnref-flat&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-flatten&quot;&gt;
&lt;p&gt;We basically have two options: &lt;code&gt;xyzxyzxyz...&lt;/code&gt; or &lt;code&gt;xxx...yyy...zzz...&lt;/code&gt;. &lt;a href=&quot;#user-content-fnref-flatten&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-convergence&quot;&gt;
&lt;p&gt;Roughly, how fast we go from a configuration to our goal; in this case a rest configuration. If we keep getting closer and closer, but the amount by which we&apos;re getting closer and closer also shrinks proportionally we have &amp;quot;linear&amp;quot; convergence, which is not great. &lt;a href=&quot;#user-content-fnref-convergence&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-assembly&quot;&gt;
&lt;p&gt;This operation is often called &amp;quot;assembly&amp;quot;. &lt;a href=&quot;#user-content-fnref-assembly&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-newtonstep&quot;&gt;
&lt;p&gt;The step size $\eta$ in Newton&apos;s method is kinda optional, in the sense that it should converge to $1$, but intermediate steps might not be $1$, for instance if taking a full step will cause some elements to invert. Some energies are not defined for inverted elements, and for those cases one has to be careful about never taking too long steps. &lt;a href=&quot;#user-content-fnref-newtonstep&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-ainv&quot;&gt;
&lt;p&gt;That is, unless the matrix is small, like a 2x2 or 3x3 matrix. In these cases, computing its inverse is both completely feasible, and often also the preferred way. &lt;a href=&quot;#user-content-fnref-ainv&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-sort&quot;&gt;
&lt;p&gt;In this picture I have moved the nodes at the pressure boundary to have low indices, which is why they are all in the top left. Initially I had not ordered any of the nodes, which spread the block out over the whole matrix. &lt;a href=&quot;#user-content-fnref-sort&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-linsolve&quot;&gt;
&lt;p&gt;To be clear, I&apos;m not claiming that linear solves are quadratic in the number of columns/rows in the matrix. But it is clearly a lower bound for dense matrices. &lt;a href=&quot;#user-content-fnref-linsolve&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-rank&quot;&gt;
&lt;p&gt;The technical term here is that $vv^\top$ is a &amp;quot;rank-1 matrix&amp;quot;. &lt;a href=&quot;#user-content-fnref-rank&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-notation&quot;&gt;
&lt;p&gt;Again, I&apos;m abusing notation ever so slightly here; it is easier to follow exactly if we treat each coordinate of each node separately, but then we often need to reason about index sets of coordinates for the same nodes, or the nodes which share a triangle or a tet. When implementing this stuff, this is something that has to be done at some point, but for this post I hope it isn&apos;t too bad to follow while being a little sloppy. &lt;a href=&quot;#user-content-fnref-notation&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-wall&quot;&gt;
&lt;p&gt;Unless they move exactly along the walls. &lt;a href=&quot;#user-content-fnref-wall&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-uv&quot;&gt;
&lt;p&gt;Now it&apos;s important to be extra careful; in the initial SM formula we had $uv^\top$ in the parentheses, but here we have $wv^\top$. &lt;a href=&quot;#user-content-fnref-uv&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-geom&quot;&gt;
&lt;p&gt;Some fields, like differential geometry, use this taxonomy all the time. In differential geometry we can talk about &lt;em&gt;intrinsic&lt;/em&gt; properties vs. &lt;em&gt;extrinsic&lt;/em&gt; properties. If a property is intrinsic to a manifold, it doesn&apos;t matter how this manifold is &lt;em&gt;embedded&lt;/em&gt; in some space, because the property is the same. On the other hand, an extrinsic property &lt;em&gt;does&lt;/em&gt; depend on this. Examples of intrinsic and extrinsic properties include the &lt;a href=&quot;https://en.wikipedia.org/wiki/Gaussian_curvature&quot;&gt;Gaussian curvature&lt;/a&gt; (which is intrinsic) and the &lt;a href=&quot;https://en.wikipedia.org/wiki/Mean_curvature&quot;&gt;Mean curvature&lt;/a&gt; (which is extrinsic). &lt;a href=&quot;#user-content-fnref-geom&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-flinear&quot;&gt;
&lt;p&gt;Since it is a matrix this transformation is linear. Another way of looking at this is that we only really have three directions that we are transforming, namely the vectors out from one of the corners. Since we don&apos;t care about &lt;em&gt;translation&lt;/em&gt; in space, we can assume that this corner starts and ends at the origin. What&apos;s left is just to move the three vectors that come out from the fixed corner, and since we are operating in $\mathbb R^3$ there is exactly one linear transformation that moves the vectors from the initial to the deformed directions. &lt;a href=&quot;#user-content-fnref-flinear&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content></entry><entry><title>Quitting Socials</title><id>https://mht.wtf/post/quit/</id><updated>2025-09-23T13:39:00+02:00</updated><author><name>Martin Hafskjold Thoresen</name><email>m@mht.wtf</email></author><link href="https://mht.wtf/post/quit/" rel=""/><link href="https://mht.wtf/post/quit/index.html" rel="alternate"/><published>2025-09-23T13:39:00+02:00</published><content type="text/html">&lt;p&gt;I think I&apos;ve had it with the big social internet. At least for a while. No more
Mastodon, HN, Lobste.rs, podcasts, Shorts, or other places that inevitably
devolve into screaming matches about topics which I ultimately am not
interested in. I have decided that these places are making my life worse,
because the downsides of visiting them outweigh the upsides of doing so.&lt;/p&gt;
&lt;p&gt;I&apos;ll continue to happily use &lt;a href=&quot;/post/rss/&quot;&gt;rss&lt;/a&gt; to follow authors whose opinions
and writings interest me, and hope to use my newfound free time to find more
feeds to follow and to write more on my own turf.  Hooray!&lt;/p&gt;
</content></entry><entry><title>Advent of Code 2025, Day 10, Part 2</title><id>https://mht.wtf/post/aoc25-10/</id><updated>2025-12-23T00:37:00+02:00</updated><author><name>Martin Hafskjold Thoresen</name><email>m@mht.wtf</email></author><link href="https://mht.wtf/post/aoc25-10/" rel=""/><link href="https://mht.wtf/post/aoc25-10/index.html" rel="alternate"/><published>2025-12-23T00:37:00+02:00</published><content type="text/html">&lt;p&gt;My personal approach to Advent of Code is to do things from scratch.
Using a regex library might be okay, but a sledgehammer like &lt;a href=&quot;https://github.com/z3prover/z3&quot;&gt;z3&lt;/a&gt; is not.
Any manual step is certainly forbidden.
I&apos;d rather look up hints for a problem online than solve it in the &amp;quot;wrong&amp;quot; way.&lt;/p&gt;
&lt;p&gt;This year&apos;s day 10 part 2 was a problem, because I couldn&apos;t find a solution that wasn&apos;t horribly slow.
It seems I was not alone in this, because the Advent of Code subreddit was filled with people having used z3,
written an ILP solver, or implemented matrix reduction followed by a brute force solve.
This discouraged me from trying to solve it at all, but after some browsing
I found &lt;a href=&quot;https://www.reddit.com/r/adventofcode/comments/1pk87hl/comment/ntp4njq/&quot;&gt;an interesting solution&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Their description was confusing.
It&apos;s the type of argument that kind of sounds correct, but at the same time doesn&apos;t quite make sense.
I sat down trying to prove it, but couldn&apos;t do so.
Instead I found another solution which turned out to be exactly the same solution, but looking at the problem from a different view.&lt;/p&gt;
&lt;p&gt;Here it is.&lt;/p&gt;
&lt;h2&gt;Problem&lt;/h2&gt;
&lt;p&gt;I&apos;ll quickly recap the problem. We&apos;re given lines of the form:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;[.##.] (3) (1,3) (2) (2,3) (0,2) (0,1) {3,5,4,7}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We can ignore the &lt;code&gt;[.##.]&lt;/code&gt; part, those were for part 1.
The &lt;code&gt;(3) (1,3) ...&lt;/code&gt; part is a list of &lt;em&gt;buttons&lt;/em&gt;, where each button contains indices &lt;code&gt;0..n-1&lt;/code&gt; into
the last part &lt;code&gt;{3,5,4,7}&lt;/code&gt;, which is the &lt;em&gt;target&lt;/em&gt;.
We start at &lt;code&gt;{0,0,0,0}&lt;/code&gt;, and pressing a button increments the indices of that button by 1;
pressing the &lt;code&gt;(3)&lt;/code&gt; button would take us to &lt;code&gt;{0,0,0,1}&lt;/code&gt;, since the &lt;code&gt;3&lt;/code&gt;rd index is incremented.
The goal is to find the smallest number of button presses to reach the target &lt;code&gt;{3,5,4,7}&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Let&apos;s name and render the buttons in a few different forms, for convenience:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;name   input    vector       graphic
------------------------------------
  a    (3)      {0,0,0,1}    . . . o 
  b    (1,3)    {0,1,0,1}    . o . o 
  c    (2)      {0,0,1,0}    . . o . 
  d    (2,3)    {0,0,1,1}    . . o o 
  e    (0,2)    {1,0,1,0}    o . o . 
  f    (0,1)    {1,1,0,0}    o o . . 
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I&apos;ll write $abc$ for pressing $a$ and $b$ and $c$, and $a^2$ for pressing $a$ twice.
Note that ordering doesn&apos;t matter, so $ab$ is the same as $ba$.
This means that when looking for solutions we&apos;re really looking for the number of button presses for each button.&lt;/p&gt;
&lt;h2&gt;Brute Force&lt;/h2&gt;
&lt;p&gt;Straightforward brute force is too slow.
The larger instances in the real input have ~13 buttons and target coefficients over 250.
We cannot attempt to click each of the 13 buttons &lt;code&gt;[0,250]&lt;/code&gt; times.
We could try to do smarter things, like observing that in the example input only two buttons have &lt;code&gt;0&lt;/code&gt; in them ($e$ and $f$)
and so their sum (the number of times we press $e$ plus the number of times we press $f$) must be &lt;code&gt;3&lt;/code&gt;,
but this leads down the linear algebra path, which I already had decided was not for me.&lt;/p&gt;
&lt;h2&gt;A &lt;em&gt;Bit&lt;/em&gt; of Fun&lt;/h2&gt;
&lt;p&gt;Instead, let&apos;s write out the target in binary.
Our goal will be to efficiently enumerate all possible combinations of buttons that sum to the target,
and then select the shortest one.
That is, we don&apos;t really care about the length of the solutions quite yet; we only want to list out all button sequences.&lt;/p&gt;
&lt;p&gt;Here&apos;s the target in binary:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;3 = 0011
5 = 0101
4 = 0100
7 = 0111
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Let&apos;s focus on the least significant bit of the &lt;code&gt;3&lt;/code&gt;, the bit in the top right corner.
Since only buttons $e$ and $f$ increment this number,
all solutions must press $e$ an odd number of times or $f$ an odd number of times, and not both.
This is the same as saying that the total number of $e$s and $f$s must be odd because &lt;code&gt;3&lt;/code&gt; is odd,
and the sum of two numbers is odd iff exactly one of the terms is odd.&lt;/p&gt;
&lt;p&gt;This is great, because we&apos;ve only looked at one bit of the target, and we&apos;ve already
split the total search space in &lt;strong&gt;half&lt;/strong&gt;:
for any pair of $e$ and $f$ we can discard exactly half of them, namely the ones
that have the same parity (even/odd-ness).
Splitting the search space in half by a &amp;quot;local&amp;quot; observation such as this shows great promise,
because if we can do this repeatedly the space will shrink really fast.&lt;/p&gt;
&lt;h3&gt;The Column&lt;/h3&gt;
&lt;p&gt;Now, we were only looking at the first bit in the first column, but this observation holds for the entire column.
Instead of only looking at $e$ and $f$ we look at all buttons at once and try to figure out
which of them we need to press an odd number of times.
The rightmost column of the bit pattern of the target tells us this because only an odd number of presses of a button
will be able to affect those bits.&lt;/p&gt;
&lt;p&gt;The bits are &lt;code&gt;1101&lt;/code&gt;; if we can press a button at most once,
which subsets of buttons give us the pattern &lt;code&gt;1101&lt;/code&gt;?
Turns out, it&apos;s not too many!
Each row in this table lists a subset of the buttons, the increments when they&apos;re all pressed once, as well as the parity of that increment.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;                  buttons  vector      parity
3 = 001[1]        -------------------------
5 = 010[1]        af       {1,1,0,1}   1101
4 = 010[0]        bce      {1,1,2,1}   1101
7 = 011[1]        cdf      {1,1,2,1}   1101
                  abde     {1,1,2,3}   1101
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We are looking at many bits at once here, but we are doing exactly the same as before:
instead of saying that only buttons $e$ and $f$ will give us the upper-right bit &lt;code&gt;1&lt;/code&gt;,
we generalize and say that these four subsets of buttons are the only subsets that
give us the entire column &lt;code&gt;1101&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;This gives us four alternatives for buttons to press an odd number of times,
and so we also know that for each alternative, the other buttons are pressed an even number of times:
if $a$ and $f$ are pressed an odd number of times, then they are each pressed at least once,
and $b$, $c$, $d$, and $e$ are pressed an even number of times (considering 0 as even).&lt;/p&gt;
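&lt;p&gt;The table above can be found mechanically: enumerate all $2^6$ subsets of the buttons and keep those whose press-once vector has the right parity pattern. A small Python sketch (button definitions repeated from the example):&lt;/p&gt;

```python
# Mechanically reproduce the table: enumerate all subsets of the six buttons
# and keep those whose press-once vector has parity pattern 1101.
from itertools import combinations

buttons = {"a": (3,), "b": (1, 3), "c": (2,), "d": (2, 3), "e": (0, 2), "f": (0, 1)}

def parity(subset):
    s = [0, 0, 0, 0]
    for name in subset:
        for i in buttons[name]:
            s[i] += 1
    return tuple(x % 2 for x in s)

mask = (1, 1, 0, 1)  # lowest bits of the target {3,5,4,7}
matches = ["".join(sub)
           for r in range(len(buttons) + 1)
           for sub in combinations(sorted(buttons), r)
           if parity(sub) == mask]
assert sorted(matches) == ["abde", "af", "bce", "cdf"]
```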
&lt;h3&gt;The Table&lt;/h3&gt;
&lt;p&gt;Now comes the trick.
&amp;quot;Odd&amp;quot; implies &amp;quot;at least once&amp;quot; so we can subtract $af$ &lt;code&gt;{1,1,0,1}&lt;/code&gt; from the target &lt;code&gt;{3,5,4,7}&lt;/code&gt; and get a lower target &lt;code&gt;{2,4,4,6}&lt;/code&gt;.
Now we also know that &lt;em&gt;all&lt;/em&gt; buttons must be pressed an &lt;em&gt;even&lt;/em&gt; number of times.
The odd buttons $af$ were pressed once, so the remaining presses for $af$ must be even (still counting 0 as even).
We now have a smaller instance of the problem and we can recurse.&lt;/p&gt;
&lt;p&gt;This might not seem like much because we only reduced the target by &lt;code&gt;{1,1,0,1}&lt;/code&gt;, but the real
improvement is that we now know each button is pressed an even number of times.
This means instead of matching the lowest bits of the target
we can consider the second-lowest bits and ask what needs to be there.
Here are the old and new targets written out in binary:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;level 0     level 1
--------    --------
3 = 0011    2 = 00[1]0
5 = 0101    4 = 01[0]0
4 = 0100    4 = 01[0]0
7 = 0111    6 = 01[1]0
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Which sets of buttons will produce the pattern at the &lt;em&gt;second&lt;/em&gt; column &lt;code&gt;1001&lt;/code&gt;?&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;buttons  vector      lsb
-------------------------
bf       {1,2,0,1}   1001
de       {1,0,2,1}   1001
ace      {1,0,2,1}   1001
abcdf    {1,2,2,3}   1001
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We&apos;re not pressing each button once, but twice.
However, this only moves the bit pattern one position to the left:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;bf =  {1,2,0,1}    bbff =  {2,4,0,2}
                                    
      1 = 0001             2 = 0010
      2 = 0010             4 = 0100
      0 = 0000             0 = 0000
      1 = 0001             2 = 0010
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;so subtracting $b^2f^2$ from the target clears out this column too, and
&lt;code&gt;{2,4,4,6}&lt;/code&gt; becomes &lt;code&gt;{0,0,4,4}&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;level 0     level 1     level 2
--------    --------    --------
3 = 0011    2 = 0010    0 = 0[0]00
5 = 0101    4 = 0100    0 = 0[0]00
4 = 0100    4 = 0100    4 = 0[1]00
7 = 0111    6 = 0110    4 = 0[1]00
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The pattern in the third column is &lt;code&gt;0011&lt;/code&gt;, and the subsets that give us this pattern are&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;buttons  vector      lsb
-------------------------
d        {0,0,1,1}   0011
ac       {0,0,1,1}   0011
bef      {2,2,1,1}   0011
abcdef   {2,2,3,3}   0011
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Just like before, pressing each button in the set four times moves it to the third column,
but otherwise keeps it unchanged.
Looking at alternative $d$, we can press it four times.
This makes &lt;code&gt;{0,0,4,4}&lt;/code&gt; which reduces the target to &lt;code&gt;{0,0,0,0}&lt;/code&gt;. We&apos;re done!&lt;/p&gt;
&lt;p&gt;Coming back up from the recursion,
we pressed $d$ four times,
$bf$ two times,
and $af$ once, which makes this solution $d^4(bf)^2af = ab^2d^4f^3$.
We can double check that this is correct:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;(a)     {0,0,0,1} * 1    
(b)   + {0,1,0,1} * 2    
(d)   + {0,0,1,1} * 4    
(f)   + {1,1,0,0} * 3    
      = {3,5,4,7}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Okay!&lt;/p&gt;
&lt;h3&gt;Recap&lt;/h3&gt;
&lt;p&gt;We&apos;re building up a solution for our target by looking at how many times we&apos;re pressing each button.&lt;/p&gt;
&lt;p&gt;On level one we looked at the lowest bit of the target number, which
constrained which buttons we could press an odd number of times.
Pressing a button an even number of times wouldn&apos;t affect the parity of the number,
since all indices would be incremented by a multiple of 2.
For each alternative we recurse, knowing that from now on all buttons are pressed an even number of times.&lt;/p&gt;
&lt;p&gt;On level two we do exactly the same, but we&apos;re looking at the second-lowest column of bits.
This doesn&apos;t affect the &amp;quot;which subset gives us the pattern&amp;quot; logic,
because pressing buttons twice shifts the bit-pattern by one position to the left, but doesn&apos;t change it otherwise.
Again we got four alternatives, and for each alternative we subtract and recurse.
Now button presses must come in groups of &lt;strong&gt;four&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Now, this was only one path through the search tree, so we don&apos;t know if this is a shortest path,
but it is a &lt;em&gt;valid&lt;/em&gt; path.
Not all choices lead to valid paths:
at the last level we could have chosen $bef$ instead of $d$,
but this would have reduced our target from &lt;code&gt;{0,0,4,4}&lt;/code&gt; to &lt;code&gt;{-8,-8,0,0}&lt;/code&gt; which is clearly not good.
Such dead ends don&apos;t only occur at the last level either.
The recursion can take us into a state from which we cannot make progress:
if some bit-pattern cannot be made from any subset of the buttons,
we are stuck as soon as the remaining target requires that pattern.
In our example, all bit-patterns are possible.&lt;/p&gt;
&lt;p&gt;Here&apos;s some pseudo-rust that implements the search process:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;fn solve_binary(target: &amp;amp;Vector, level: u32) -&amp;gt; Option&amp;lt;Moves&amp;gt; {
    if target.has_negatives() {
        return None; // went too far
    }
    if target.is_zero() { // if we&apos;re at zero we&apos;re done.
        return Some(Moves::empty());
    }
    let mut solutions = Vec::new();
    let mask = mask_at_level(target, level); // required pattern
    for button_set in button_subsets_with_mask(mask) { // consider each subset
        let next = target.subtract_at_level(button_set, level);
        if let Some(mut moves) = solve_binary(&amp;amp;next, level + 1) {
            moves.add_at_level(button_set, level); // add the pressed buttons
            solutions.push(moves); // save candidate
        }
    }
    solutions.into_iter().min_by_key(|moves| moves.len()) // shortest subsolution is best (if any).
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;level&lt;/code&gt; is passed around since we need to know how many times to press each button,
namely &lt;code&gt;2.pow(level)&lt;/code&gt;.&lt;/p&gt;
&lt;h2&gt;Alternative: Building Bits&lt;/h2&gt;
&lt;p&gt;Another way of looking at this process is that we&apos;re looking at the bits in the target
while trying to figure out the bits in the number of button presses.
On the first level we&apos;re figuring out which buttons we press an odd number of times,
which corresponds to the &lt;code&gt;?&lt;/code&gt; bits in the &amp;quot;button table&amp;quot; on the left.
This is constrained by the corresponding column (the rightmost) in the &amp;quot;target table&amp;quot; in the middle.
The pattern there, &lt;code&gt;1101&lt;/code&gt;, gives us alternatives for the pattern we can put on the left.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;a = ...[?]                
b = ...[?]      3 = 001[1]              100001 (af)     {1,1,0,1}
c = ...[?]      5 = 010[1]     1101 =&amp;gt;  011010 (bce)    {1,1,2,1}
d = ...[?]      4 = 010[0]              001101 (cdf)    {1,1,2,1}
e = ...[?]      7 = 011[1]              110110 (abde)   {1,1,2,3}
f = ...[?]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Each of those patterns again gives us a vector to subtract from the target with the property
that it zeroes out the current column, may touch the upper bits, but leaves the bottom
bits zero. $af$ (&lt;code&gt;100001&lt;/code&gt;) is one such pattern, and its corresponding vector is &lt;code&gt;{1,1,0,1}&lt;/code&gt;,
so on this branch the next level looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;a = ..[?]1                 
b = ..[?]0       2 = 00[1]0             010001 (bf)     {1,2,0,1}
c = ..[?]0       4 = 01[0]0    1001 =&amp;gt;  000110 (de)     {1,0,2,1}
d = ..[?]0       4 = 01[0]0             101010 (ace)    {1,0,2,1}
e = ..[?]0       6 = 01[1]0             111101 (abcdf)  {1,2,2,3}
f = ..[?]1
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This time we choose $bf$ (&lt;code&gt;010001&lt;/code&gt;), bringing the target down to &lt;code&gt;{0,0,4,4}&lt;/code&gt;.
Note that this also cleared the bit in the third column of the first &lt;code&gt;4&lt;/code&gt;; this is fine.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;a = .[?]01                 
b = .[?]10       0 = 0[0]00             000100 (d)      {0,0,1,1}
c = .[?]00       0 = 0[0]00    0011 =&amp;gt;  101000 (ac)     {0,0,1,1}
d = .[?]00       4 = 0[1]00             010011 (bef)    {2,2,1,1}
e = .[?]00       4 = 0[1]00             111111 (abcdef) {2,2,3,3}
f = .[?]11
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Lastly we choose alternative $d$ which brings the target to zero:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;a = 0001 = 1
b = 0010 = 2     0 = 0000
c = 0000         0 = 0000
d = 0100 = 4     0 = 0000
e = 0000         0 = 0000
f = 0011 = 3
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now we simply read out the number in the &amp;quot;button table&amp;quot; that we built.
The final button counts correspond to our solution $ab^2d^4f^3$.&lt;/p&gt;
&lt;p&gt;To summarize:
For each column in the button press counts we choose one alternative from a set given by the
bit-pattern of the target in the matching column.
The chosen alternative gives us a vector to subtract from the target.
Repeat for each column, and in the end we&apos;ve built the counts for the buttons that make up the target number.
Magical!&lt;/p&gt;
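&lt;p&gt;To make the readout concrete, here is a small sketch that replays the column choices ($af$ at weight 1, $bf$ at weight 2, $d$ at weight 4) and checks that the resulting counts hit the target. The vectors for $c$ and $e$ are reconstructed from the subset sums in the tables above, so treat those two as assumptions.&lt;/p&gt;

```rust
// Replaying the worked example: sum count * button_vector over all buttons
// and compare against the target {3,5,4,7}.
// Button vectors for c and e are reconstructed from the tables (assumption).
const BUTTONS: [[i64; 4]; 6] = [
    [0, 0, 0, 1], // a
    [0, 1, 0, 1], // b
    [0, 0, 1, 0], // c (reconstructed)
    [0, 0, 1, 1], // d
    [1, 0, 1, 0], // e (reconstructed)
    [1, 1, 0, 0], // f
];

// Apply per-button press counts to produce the reached target vector.
fn apply(counts: [u64; 6]) -> [i64; 4] {
    let mut out = [0i64; 4];
    for (count, button) in counts.iter().zip(BUTTONS.iter()) {
        for i in 0..4 {
            out[i] += *count as i64 * button[i];
        }
    }
    out
}

fn main() {
    // Column choices af (bit 0), bf (bit 1), d (bit 2) give these counts:
    let counts = [1, 2, 0, 4, 0, 3]; // a b c d e f
    assert_eq!(apply(counts), [3, 5, 4, 7]);
    // 1 + 2 + 4 + 3 = 10 presses in total on this path.
    assert_eq!(counts.iter().sum::<u64>(), 10);
}
```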
&lt;h2&gt;Analysis&lt;/h2&gt;
&lt;p&gt;I think the table-building point-of-view is nice because it makes it easy to reason about
the complexity of our solution, and see why this works so much better than the brute-force method.
Each recursion level considers one column of the bit pattern of the target, so the recursion
depth is bounded by the bit-length of the largest number in the target, i.e. its &lt;code&gt;log2&lt;/code&gt; rounded up.
The largest inputs have numbers around 250, so the depth is at most 9.&lt;/p&gt;
&lt;p&gt;How many choices can we expect at each level? This depends on the button set and the target,
since it depends on which pattern we&apos;re looking for as well as how well the buttons cover
the possible patterns.
If $D$ is the dimension of the target (4 in the example) and $B$ is the number of buttons (6 in our example)
we have $2^B=64$ possible button subsets and $2^D=16$ possible bit-patterns that the subsets cover,
so on average a pattern will have $64/16=4$ subsets covering it (this is indeed what we had at each level, but it didn&apos;t have to be that way).
This makes for $4^9=262,144$ leaf nodes in the search tree, assuming no early exits or other pruning.
Still, this is a small number of states to visit.&lt;/p&gt;
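&lt;p&gt;The back-of-the-envelope numbers above are easy to check mechanically:&lt;/p&gt;

```rust
fn main() {
    let b = 6u32; // number of buttons in the example
    let d = 4u32; // dimension of the target
    // 2^B subsets spread over 2^D parity patterns:
    let avg_subsets_per_pattern = (1u64 << b) / (1u64 << d);
    assert_eq!(avg_subsets_per_pattern, 4);
    // With depth at most 9, the search tree has at most 4^9 leaves:
    assert_eq!(avg_subsets_per_pattern.pow(9), 262_144);
}
```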
&lt;h2&gt;The Full Tree&lt;/h2&gt;
&lt;p&gt;So far we&apos;ve only taken one path through the search tree.
In this figure the nodes are button press counts, and edges are the alternatives we choose.
Edges are labelled with their depth and buttons, and the format &lt;code&gt;2be&lt;/code&gt; means $b^2e^2$.
Rounded boxes are states that hit the target &lt;code&gt;{3,5,4,7}&lt;/code&gt;.
By construction our states here are unique, so this is a proper tree.&lt;/p&gt;
&lt;figure class=&quot;invert&quot;&gt;
  &lt;div&gt;
    &lt;img style=&quot;width: 90%&quot; src=&quot;./3547-presses.svg&quot;&gt;
  &lt;/div&gt;
&lt;/figure&gt;
&lt;p&gt;The sequence we followed in the example was
$af$, $b^2f^2$, $d^4$, ending at &lt;code&gt;[1 2 0 4 0 3]&lt;/code&gt;, and this
was only one out of 14 possible valid paths.&lt;/p&gt;
&lt;p&gt;We can also show things in a slightly different way by having the &lt;em&gt;targets&lt;/em&gt; make up the nodes.
There are multiple button combinations to reach the same target, so this graph is more tangled up.
It is also smaller, because many nodes have multiple paths to them.
You can imagine using memoization to compute the path from a node to the end once,
and then have the parents of that node reuse the memoized result instead of computing it every time the node is visited in the search.&lt;/p&gt;
&lt;figure class=&quot;invert&quot;&gt;
  &lt;div&gt;
    &lt;img style=&quot;width: 90%&quot; src=&quot;./3547-target.svg&quot;&gt;
  &lt;/div&gt;
&lt;/figure&gt;
&lt;p&gt;Pretty neat!&lt;/p&gt;
&lt;h2&gt;The Dividing Method&lt;/h2&gt;
&lt;p&gt;This is the method I read &lt;a href=&quot;https://www.reddit.com/r/adventofcode/comments/1pk87hl/comment/ntp4njq/&quot;&gt;on reddit&lt;/a&gt;, whose explanation kind-of made sense, but also not really.
They have updated the explanation, but I still don&apos;t think it is that convincing.
The method &lt;em&gt;is&lt;/em&gt; exactly the same as mine, but somehow I find it much harder
to understand why it is correct.&lt;/p&gt;
&lt;p&gt;It goes like this:
at every step we find all button subsets that bring the parity of the target to 0.
We try every subset.
Subtract it from the target to get a target consisting of only even numbers.
Then we divide it in half and recurse.&lt;/p&gt;
&lt;p&gt;For our example numbers, this means starting with &lt;code&gt;{3,5,4,7}&lt;/code&gt;, pressing e.g. $af$ to get &lt;code&gt;{2,4,4,6}&lt;/code&gt;, dividing this
into &lt;code&gt;{1,2,2,3}&lt;/code&gt;, and solving this sub-problem using the same procedure.&lt;/p&gt;
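&lt;p&gt;The procedure can be sketched directly, memoized on the target as suggested in the previous section. The button vectors are the ones from the worked example, with $c$ and $e$ reconstructed from the tables (treat those as assumptions). Note that &amp;quot;subtract a subset and require every entry to be even and non-negative&amp;quot; is exactly the parity-matching step.&lt;/p&gt;

```rust
use std::collections::HashMap;

// Button vectors from the worked example (c and e reconstructed: assumption).
const BUTTONS: [[i64; 4]; 6] = [
    [0, 0, 0, 1], // a
    [0, 1, 0, 1], // b
    [0, 0, 1, 0], // c (reconstructed)
    [0, 0, 1, 1], // d
    [1, 0, 1, 0], // e (reconstructed)
    [1, 1, 0, 0], // f
];

// Fewest presses to reach `target`, or None if it is unreachable.
// Memoized on the target vector, so each target is solved only once.
fn fewest(target: [i64; 4], memo: &mut HashMap<[i64; 4], Option<u64>>) -> Option<u64> {
    if target.iter().any(|&x| x < 0) {
        return None; // overshot
    }
    if target == [0; 4] {
        return Some(0);
    }
    if let Some(&cached) = memo.get(&target) {
        return cached;
    }
    let mut best: Option<u64> = None;
    for subset in 0u32..64 {
        // Subtract each button in the subset once.
        let mut next = target;
        let mut presses = 0u64;
        for (b, button) in BUTTONS.iter().enumerate() {
            if subset >> b & 1 == 1 {
                presses += 1;
                for i in 0..4 {
                    next[i] -= button[i];
                }
            }
        }
        // The subset's parity must match the target's: every entry even.
        if next.iter().any(|&x| x < 0 || x % 2 != 0) {
            continue;
        }
        // Halve and recurse; presses in the subproblem count double here.
        let half = [next[0] / 2, next[1] / 2, next[2] / 2, next[3] / 2];
        if let Some(sub) = fewest(half, memo) {
            let total = presses + 2 * sub;
            best = Some(best.map_or(total, |cur| cur.min(total)));
        }
    }
    memo.insert(target, best);
    best
}

fn main() {
    let mut memo = HashMap::new();
    assert_eq!(fewest([0, 0, 0, 1], &mut memo), Some(1)); // press a once
    // The path in the text (a b^2 d^4 f^3) uses 10 presses, so 10 is reachable.
    assert!(fewest([3, 5, 4, 7], &mut memo).unwrap() <= 10);
}
```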
&lt;p&gt;How can we know that the best solution to solve &lt;code&gt;{2,4,4,6}&lt;/code&gt; is to solve &lt;code&gt;{1,2,2,3}&lt;/code&gt; and press those buttons twice?
Could it not happen that the best button counts for &lt;code&gt;{2,4,4,6}&lt;/code&gt; contain some odd number of presses?
A set of buttons pressed an odd number of times can still leave us with a target that is all even,
so it feels like the dividing by two isn&apos;t obviously okay.&lt;/p&gt;
&lt;p&gt;It is, though.&lt;/p&gt;
&lt;p&gt;Instead of showing why it&apos;s correct, we can say that by assuming this structure we aren&apos;t missing out on any solutions.
Let $x$ be a target vector and $B=\{b_i\}_{i=1}^n$ the set of buttons.
The shortest number of presses to get $x$ is some number of each button, so we can write
$$x = b_1^{k_1} b_2^{k_2} \dots b_n^{k_n}$$
to fix some notation.
Now, we still have no idea what the $k_i$ are,
but &lt;em&gt;if&lt;/em&gt; we did, we could move one $b_i$ to the front if $k_i$ is odd, and let the evens stay in place.
That is, we could write it like this:
$$
x = b_2 b_3 b_8
\dots
\left(
b_1^{k_1}
b_2^{k_2-1}
\dots
b_n^{k_n}
\right)
$$
where I&apos;ve said that $k_i$ was odd for $i=2,3,8,\dots$, just to have some concrete numbers to write down.
Now all of the exponents in the parenthesis are even, so we can pull a 2 out,
and all of a sudden we have the decomposition $x=yz$ where
$y$ consists of any button at most once and $z=w^2$ is even.&lt;/p&gt;
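&lt;p&gt;We can check this decomposition numerically on the worked example: $x = a b^2 d^4 f^3$ has odd exponents on $a$ and $f$, so $y = af$ and $z = (b d^2 f)^2$. A small sketch (only a check of the identity on our example, not a proof):&lt;/p&gt;

```rust
// Checking x = y z on the worked example, where y = af collects the buttons
// with odd counts and z = (b d^2 f)^2 keeps the even remainder.
const A: [i64; 4] = [0, 0, 0, 1];
const B: [i64; 4] = [0, 1, 0, 1];
const D: [i64; 4] = [0, 0, 1, 1];
const F: [i64; 4] = [1, 1, 0, 0];

// x = a b^2 d^4 f^3, written additively per coordinate.
fn x() -> [i64; 4] {
    let mut out = [0i64; 4];
    for i in 0..4 {
        out[i] = A[i] + 2 * B[i] + 4 * D[i] + 3 * F[i];
    }
    out
}

// y = af: one press of each odd-count button.
fn y() -> [i64; 4] {
    let mut out = [0i64; 4];
    for i in 0..4 {
        out[i] = A[i] + F[i];
    }
    out
}

// w = b d^2 f, so that z = w^2 contributes 2w per coordinate.
fn w() -> [i64; 4] {
    let mut out = [0i64; 4];
    for i in 0..4 {
        out[i] = B[i] + 2 * D[i] + F[i];
    }
    out
}

fn main() {
    assert_eq!(x(), [3, 5, 4, 7]);
    for i in 0..4 {
        assert_eq!(x()[i] % 2, y()[i] % 2); // parities of x and y agree
        assert_eq!(x()[i], y()[i] + 2 * w()[i]); // x = y + 2w, i.e. x = y z
    }
}
```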
&lt;p&gt;The parity of $z$ is zero because all buttons there are pressed an even number of times,
so the parity of $y$ must be the same as $x$, otherwise the parities on the two sides wouldn&apos;t match.
Further, $y\subseteq B$ by construction since we moved at most one of each button to the front.
$y$ may be empty, but this is no concern.
If $z$ is empty it means that $x=y$, so a subset of the buttons yields $x$;
this is our base case!&lt;/p&gt;
&lt;p&gt;Now, we don&apos;t actually know that $k_i$ is odd for $i=2,3,8,\dots$, so we can&apos;t decompose $x$,
but we do know that such a decomposition exists.
So, we can search for it by considering all subsets of the buttons whose parity matches $x$.
This gives us a bunch of alternatives for $y$, and we can try them all.
Fixing $y$ lets us compute $z=x\setminus y$,
and if we chose the correct $y$, $z$ will be even and we can divide its coefficients by 2 and recurse.&lt;/p&gt;
&lt;p&gt;That&apos;s it!&lt;/p&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;I thought it was very neat how two fairly different trains of thought can lead to basically identical algorithms.
I think I prefer the constructive bit-pattern search, especially the alternative framing of
finding columns of bits that build up the final binary numbers that are the button counts,
but I also appreciate the top-down thinking of the dividing method,
where we can argue that a solution exists and that we will find it.&lt;/p&gt;
&lt;p&gt;I was hoping to find a more rigorous proof of correctness than what I&apos;ve written here,
but I think I&apos;ve spent enough time on this problem, and the holidays are almost here.&lt;/p&gt;
&lt;p&gt;Thanks for reading.&lt;/p&gt;
&lt;hr /&gt;
</content></entry><entry><title>Searching High and fLow</title><id>https://mht.wtf/post/flow/</id><updated>2023-10-19T18:10:56+02:00</updated><author><name>Martin Hafskjold Thoresen</name><email>m@mht.wtf</email></author><link href="https://mht.wtf/post/flow/" rel=""/><link href="https://mht.wtf/post/flow/index.html" rel="alternate"/><published>2023-10-19T18:10:56+02:00</published><content type="text/html">&lt;p&gt;Recently&lt;sup&gt;&lt;a href=&quot;#user-content-fn-aha&quot; id=&quot;user-content-fnref-aha&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; at &lt;a href=&quot;https://vind.ai/&quot;&gt;work&lt;/a&gt; I found myself with an interesting problem in need of solving.
The problem was one stage of a larger algorithm, and we wanted to be able to run the whole algorithm in an optimization loop, so it was important that it was fast.&lt;/p&gt;
&lt;p&gt;The problem is this: given a set of &lt;em&gt;points&lt;/em&gt; $P \subseteq\mathbb{R}^2$ and a set of &lt;em&gt;sites&lt;/em&gt; $S\subseteq\mathbb{R}^2$, assign each point to a site so that:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The sum of the distances from each assigned point to its site is minimized&lt;/li&gt;
&lt;li&gt;The number of points assigned to a site is below some limit $L$.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Without constraint (2) we can always choose the closest site to each point and be happy. This is also really fast, since we just need to compute the pairwise distances. Before, this is what the system did, but with the introduction of requirement (2), I had to come up with something else. Here is an example of how the optimal solutions look with and without constraint (2), for a sample set of points and sites:&lt;/p&gt;
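&lt;p&gt;The unconstrained version really is just a nearest-site lookup per point. A minimal sketch (made-up coordinates; squared distances, since only the ordering matters):&lt;/p&gt;

```rust
// Without the capacity constraint: assign each point to its nearest site.
fn assign_closest(points: &[(f64, f64)], sites: &[(f64, f64)]) -> Vec<usize> {
    points
        .iter()
        .map(|&(px, py)| {
            // Squared Euclidean distance from this point to site i.
            let sq_dist = |i: usize| {
                let (sx, sy) = sites[i];
                (px - sx) * (px - sx) + (py - sy) * (py - sy)
            };
            (0..sites.len())
                .min_by(|&i, &j| sq_dist(i).total_cmp(&sq_dist(j)))
                .expect("at least one site")
        })
        .collect()
}

fn main() {
    let sites = [(0.0, 0.0), (10.0, 0.0)];
    let points = [(1.0, 1.0), (9.0, -1.0), (4.0, 0.0)];
    assert_eq!(assign_closest(&points, &sites), vec![0, 1, 0]);
}
```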
&lt;figure style=&quot;display: flex; gap: 4rem&quot; class=&quot;invert&quot;&gt;
  &lt;div style=&quot;width: 100%&quot;&gt;
    &lt;img src=&quot;./points-closest.svg&quot; style=&quot;flex: 1; width: 100%&quot;&gt;
    &lt;figcaption&gt;Points grouped to their closest site.&lt;/figcaption&gt;
  &lt;/div&gt;
  &lt;div style=&quot;width: 100%&quot;&gt;
    &lt;img src=&quot;./points-capacitated.svg&quot; style=&quot;flex: 1; width: 100%&quot;&gt;
    &lt;figcaption&gt;Sites have a capacity of 5.&lt;/figcaption&gt;
  &lt;/div&gt;
&lt;/figure&gt;
&lt;p&gt;Intuitively, it&apos;s clear what&apos;s going on here:
the left site has to &amp;quot;give up&amp;quot; some of its points to the right site; even though those points were closer to the left site, giving them up frees the left site to also serve the points on the far left.
Computationally, however, it is not so straightforward to see how we can do this.&lt;/p&gt;
&lt;h2&gt;Looking up the problem&lt;/h2&gt;
&lt;p&gt;Or, &lt;em&gt;&amp;quot;How I nearly got tricked into thinking this was NP-Hard&amp;quot;&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;When presented with a problem like this, the first thing I always do is to figure out what the &amp;quot;proper&amp;quot; name of the problem is, because it is very likely to have already been solved. Some quick searching led me to a couple of problems:&lt;/p&gt;
&lt;h3&gt;Facility location problem&lt;/h3&gt;
&lt;p&gt;In the &lt;a href=&quot;https://en.wikipedia.org/wiki/Facility_location_problem&quot;&gt;facility location problem&lt;/a&gt; you are given a set of potential facility sites $L$ and a set of demand points $D$, and the task is to choose which facilities to open so that the distance from each demand point to its nearest open facility is minimized. This problem is NP-hard.&lt;/p&gt;
&lt;p&gt;It&apos;s very similar to my problem -- we&apos;re given two sets of points and we&apos;re minimizing distances -- but it&apos;s not exactly the same, since my problem asked to optimize the choice of facility per demand point, with the capacity constraint. So I had to keep looking.&lt;/p&gt;
&lt;h3&gt;Vertex k-center problem&lt;/h3&gt;
&lt;p&gt;In the &lt;a href=&quot;https://en.wikipedia.org/wiki/Vertex_k-center_problem&quot;&gt;vertex k-center problem&lt;/a&gt; we&apos;re given a complete undirected graph $G=(V,E)$ and a cost function $c: E\to\mathbb{R}$, and we&apos;re asked to choose vertices $V_0\subseteq V$ to minimize the cost of the vertex farthest away from $V_0$: $$\text{minimize}\;\max_{v\in V}\min_{w\in V_0} c(v, w)$$&lt;/p&gt;
&lt;p&gt;The vertex k-center problem is also NP-hard.&lt;/p&gt;
&lt;p&gt;Again, it looks related, but not quite right. We already know which of our points are in which class, so the &amp;quot;combinatorial choosing&amp;quot; aspect of this problem isn&apos;t a part of my problem. The search continued.&lt;/p&gt;
&lt;h3&gt;Assignment problem&lt;/h3&gt;
&lt;p&gt;Eventually, my search led me to the &lt;a href=&quot;https://en.wikipedia.org/wiki/Assignment_problem&quot;&gt;assignment problem&lt;/a&gt;. Here we want to assign &lt;em&gt;tasks&lt;/em&gt; to &lt;em&gt;workers&lt;/em&gt; where each pair has a cost associated to it, and we seek to minimize the total cost. In graph theory terms (and if the number of tasks and workers is the same), this is finding a minimum-weight &lt;a href=&quot;https://en.wikipedia.org/wiki/Matching_(graph_theory)&quot;&gt;matching&lt;/a&gt; of a certain size in a weighted bipartite graph.&lt;/p&gt;
&lt;p&gt;This too looks similar to what we want, but again, not quite, due to our constraint. Also, our number of points and sites are not the same, so the matching stuff doesn&apos;t apply. However, this led me to &lt;a href=&quot;https://ulrich-bauer.org/&quot;&gt;Ulrich Bauer&lt;/a&gt;&apos;s &lt;a href=&quot;https://ulrich-bauer.org/pub/ConstrainedAssignment.pdf&quot;&gt;master thesis&lt;/a&gt;, whose table of contents includes a section named &amp;quot;Minimum Cost Flow&amp;quot;. Reading the section name was enough.&lt;/p&gt;
&lt;h2&gt;Minimum Cost Flow&lt;/h2&gt;
&lt;p&gt;Let&apos;s start with the more known, related problem: max-flow. In max-flow, you&apos;re given a &lt;a href=&quot;https://en.wikipedia.org/wiki/Flow_network&quot;&gt;flow network&lt;/a&gt;, and the task is to figure out how much &lt;em&gt;flow&lt;/em&gt; you can send through the network. Imagine a network of water pipes through a city with a water basin (a &lt;em&gt;source&lt;/em&gt; node) on one side of the city, and a pipe to the ocean (a &lt;em&gt;sink&lt;/em&gt; node) on the other side. We want to figure out how much water we can send from the basin to the ocean.&lt;/p&gt;
&lt;p&gt;In graph terms, it looks like this: we have a directed graph $G=(V,E)$ and two special nodes: the source $v_s$ and the sink $v_t$.
All edges $e\in E$ have a capacity $e_c$.
Now we assign a &lt;em&gt;flow&lt;/em&gt;, a non-negative number $f(e)\in\mathbb{R}^+$, to the edges.
In the analogy above, the flow corresponds to the amount of water flowing through the pipe that is that edge.
We also have some constraints:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The flow in an edge cannot exceed its capacity.&lt;/li&gt;
&lt;li&gt;Nodes have to conserve the flow, so that the total flow in the edges going into a node is equal to the total flow in the edges going out of that node.&lt;/li&gt;
&lt;li&gt;Only the source $v_s$ is allowed to &lt;em&gt;produce&lt;/em&gt; flow (meaning it can send out more than it got in).&lt;/li&gt;
&lt;li&gt;Only the sink $v_t$ is allowed to &lt;em&gt;consume&lt;/em&gt; flow (meaning it can take in more than it sends out).&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;We want to maximize the flow that the source node produces, which, due to the conservation of flow (req. (2)), is the same as what the sink consumes, while respecting the constraints.&lt;/p&gt;
&lt;p&gt;That&apos;s max-flow. In minimum cost flow, we also have a &lt;em&gt;cost&lt;/em&gt; $c(e)$ for each edge, per flow. Instead of finding the maximum amount of flow we can send through the network, we want to find the &lt;em&gt;cheapest&lt;/em&gt; way of sending &lt;em&gt;a certain amount&lt;/em&gt; of flow.&lt;/p&gt;
&lt;h3&gt;Going back to our problem&lt;/h3&gt;
&lt;p&gt;Water pipes and capacities can seem like a long way from points and sites in the plane; what&apos;s the connection? If we create a graph $G=(P\cup S, E)$ and imagine the sites $S$ to be on the left side, and points $P$ on the right side, we can draw edges between all pairs of sites and points to get a bipartite graph:&lt;/p&gt;
&lt;figure style=&quot;display: flex; gap: 4rem;&quot; class=&quot;invert&quot;&gt;
  &lt;div style=&quot;width: 100%&quot;&gt;
    &lt;img src=&quot;./bipartite.svg&quot; style=&quot;flex: 1; width: 100%&quot;&gt;
    &lt;figcaption&gt;A bipartite graph with the sites on the left and points on the right.&lt;/figcaption&gt;
  &lt;/div&gt;
&lt;/figure&gt;
&lt;p&gt;Now we want to say that a flow going through an edge $(s,p)\in E$ means that site $s$ and point $p$ are connected. The cost of this edge should be the distance between the site and point, so that we&apos;ll reduce the total distance of the pairs that we end up assigning.&lt;/p&gt;
&lt;p&gt;There&apos;s a couple more things we need to make sure that a minimum cost flow through our made-up network actually solves our original problem.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The network has to be a flow network&lt;/li&gt;
&lt;li&gt;The sites respect their given limit $L$&lt;/li&gt;
&lt;li&gt;Each point is only connected to one site&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;For (1) we can add in a source to the very left and a sink on the very right and connect them to the two groups, like so:&lt;/p&gt;
&lt;figure style=&quot;display: flex; gap: 4rem;&quot; class=&quot;invert&quot;&gt;
  &lt;div style=&quot;width: 100%&quot;&gt;
    &lt;img src=&quot;./flow-graph.svg&quot; style=&quot;flex: 1; width: 100%&quot;&gt;
    &lt;figcaption&gt;The leftmost node is the source node, and the rightmost the sink node.&lt;/figcaption&gt;
  &lt;/div&gt;
&lt;/figure&gt;
&lt;p&gt;For (2) we can set the capacity on the edge from the source to the site equal to $L$; this way, if we also set the capacity on the edges in the middle to 1, then each edge that is filled with flow will spend one capacity of the source-site edge.
For (3), we can set the capacity of the edges from the points to the sink to be 1.
This way we expect all the flow that comes in along the edge from the site to the point to go along this edge and into the sink.
We still haven&apos;t assigned costs to the edges adjacent to the source and sinks, but we don&apos;t really care about which of these edges are used, so we can set them all to 0. We also know how much flow we want to send through the network, since all the points should be connected to a site, and each of these connections uses 1 flow.&lt;/p&gt;
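&lt;p&gt;Putting the construction together, one way to sketch the network is as a flat edge list. The node-id layout and the &lt;code&gt;Edge&lt;/code&gt; shape here are my own for illustration, not the ones from the codebase:&lt;/p&gt;

```rust
// Sketch of the flow network described above. Node ids: 0 is the source,
// 1..=S are sites, S+1..=S+P are points, and S+P+1 is the sink.
// (This layout is made up for illustration, not from the original codebase.)
struct Edge {
    from: usize,
    to: usize,
    cap: u64,
    cost: f64,
}

fn build_network(
    n_sites: usize,
    n_points: usize,
    limit: u64,
    dist: impl Fn(usize, usize) -> f64,
) -> Vec<Edge> {
    let source = 0;
    let sink = 1 + n_sites + n_points;
    let mut edges = Vec::new();
    for s in 0..n_sites {
        // source -> site: capacity L, cost 0
        edges.push(Edge { from: source, to: 1 + s, cap: limit, cost: 0.0 });
        for p in 0..n_points {
            // site -> point: capacity 1, cost = distance between the pair
            edges.push(Edge { from: 1 + s, to: 1 + n_sites + p, cap: 1, cost: dist(s, p) });
        }
    }
    for p in 0..n_points {
        // point -> sink: capacity 1, cost 0
        edges.push(Edge { from: 1 + n_sites + p, to: sink, cap: 1, cost: 0.0 });
    }
    edges
}

fn main() {
    // 3 sites, 6 points, capacity L = 2, as in the figure; dummy distances.
    let edges = build_network(3, 6, 2, |s, p| (s as f64 - p as f64).abs());
    assert_eq!(edges.len(), 3 + 3 * 6 + 6);
    // Only source edges carry capacity L; everything else has unit capacity.
    assert!(edges.iter().all(|e| e.cap == 1 || (e.from == 0 && e.cap == 2)));
    // The flow we want to send is one unit per point: 6 in total here.
}
```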
&lt;p&gt;Here&apos;s the final network, where I have set the site capacity $L$ to be 2. For readability, only one edge in each &amp;quot;layer&amp;quot;&lt;sup&gt;&lt;a href=&quot;#user-content-fn-layer&quot; id=&quot;user-content-fnref-layer&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; is labeled, but the only difference from the other edges are the costs in the middle layer.&lt;/p&gt;
&lt;figure style=&quot;display: flex; gap: 4rem;&quot; class=&quot;invert&quot;&gt;
  &lt;div style=&quot;width: 100%&quot;&gt;
    &lt;img src=&quot;./flow-graph-with-numbers.svg&quot; style=&quot;flex: 1; width: 100%&quot;&gt;
    &lt;figcaption&gt;The middle gray edges will vary in cost. Otherwise, the capacities and costs are constant for each layer.&lt;/figcaption&gt;
  &lt;/div&gt;
&lt;/figure&gt;
&lt;p&gt;And here&apos;s one possible solution, in which edges saturated with flow are black, and edges without flow are gray. Counting from the top with 1-indexing, the first site is connected to the first and fourth point, the second site to the third and sixth, and the third site to the second and fifth point.
We see that all sites have maxed their capacity since they have two edges going out to the right, and all points are accounted for, since all of them have an edge to the sink.&lt;/p&gt;
&lt;figure style=&quot;display: flex; gap: 4rem;&quot; class=&quot;invert&quot;&gt;
  &lt;div style=&quot;width: 100%&quot;&gt;
    &lt;img src=&quot;./flow-graph-one-solution.svg&quot; style=&quot;flex: 1; width: 100%&quot;&gt;
    &lt;figcaption&gt;Gray edges are not used, and black edges are saturated with flow.&lt;/figcaption&gt;
  &lt;/div&gt;
&lt;/figure&gt;
&lt;p&gt;A quick note before I continue, I&apos;ve glossed over one point: what happens if the flows we get from solving the problem aren&apos;t integer? Could we get a bunch of 0.5 flows in the graph? For the approach I went for, the answer is no by construction (as we&apos;ll see). However, when writing this post I tried to prove that any optimal flow with fractional flows could be converted to an integer flow that was at least as cheap, irrespective of how this flow was found, but I couldn&apos;t quite figure out how.
If you do know, &lt;a href=&quot;mailto:~mht/public-inbox@lists.sr.ht&quot;&gt;my public inbox&lt;/a&gt; is open&lt;sup&gt;&lt;a href=&quot;#user-content-fn-int-flow&quot; id=&quot;user-content-fnref-int-flow&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;h3&gt;Solving&lt;/h3&gt;
&lt;p&gt;Great, we now have a flow network in which we want to find a min-cost flow.
This was a Rust codebase, so I searched through &lt;a href=&quot;https://crates.io&quot;&gt;crates.io&lt;/a&gt; and found &lt;a href=&quot;https://crates.io/crates/mcmf&quot;&gt;mcmf&lt;/a&gt;, a crate that wraps the &lt;a href=&quot;https://lemon.cs.elte.hu/trac/lemon&quot;&gt;LEMON&lt;/a&gt; library.
LEMON was referenced in the Wikipedia page for minimum cost flow, so I figured it was a safe bet.
I added it to my project, set up my graph, ran &lt;code&gt;.mcmf()&lt;/code&gt; which ran in a fraction of a second, read out the results I wanted, and it all Just Work™ed.&lt;/p&gt;
&lt;p&gt;However ...&lt;/p&gt;
&lt;p&gt;LEMON is a C++ library, and I am compiling my Rust crate to &lt;code&gt;wasm&lt;/code&gt; using &lt;a href=&quot;https://github.com/rustwasm/wasm-pack&quot;&gt;wasm-pack&lt;/a&gt;.
This works really well for Rust code, but not for a joint C++ code base; it seems the issue is with the &lt;em&gt;compilation target&lt;/em&gt;.
&lt;code&gt;rustc&lt;/code&gt; has both &lt;code&gt;wasm32-unknown-unknown&lt;/code&gt; and &lt;code&gt;wasm32-unknown-emscripten&lt;/code&gt; listed as &lt;a href=&quot;https://doc.rust-lang.org/nightly/rustc/platform-support.html#tier-2-without-host-tools&quot;&gt;Tier 2 supported platforms&lt;/a&gt;, but &lt;code&gt;rustc&lt;/code&gt; cannot, of course, compile C++ code.
So we need a separate toolchain for C++.
&lt;a href=&quot;https://emscripten.org/&quot;&gt;emscripten&lt;/a&gt; is a complete toolchain for compiling C++ to &lt;code&gt;wasm32-unknown-emscripten&lt;/code&gt;, but &lt;code&gt;wasm-pack&lt;/code&gt; compiles to &lt;code&gt;wasm32-unknown-unknown&lt;/code&gt;.
Doesn&apos;t sound like a big difference, right?&lt;/p&gt;
&lt;p&gt;Wrong.&lt;/p&gt;
&lt;p&gt;I&apos;m still a little hazy about the details here, but it seems that the two targets are fundamentally different, and that it is not possible to compile for the two targets and somehow join them.
Furthermore, it also seems that adding &lt;code&gt;-emscripten&lt;/code&gt; as a target to &lt;code&gt;wasm-pack&lt;/code&gt; is also a no-no.
If there is a solution to this problem, I&apos;d love to hear it!
Please send a mail to &lt;a href=&quot;mailto:~mht/public-inbox@lists.sr.ht&quot;&gt;my public inbox&lt;/a&gt; if you know.&lt;/p&gt;
&lt;p&gt;I gave up on this path, and went down the &lt;em&gt;other&lt;/em&gt; path: implementing it myself.&lt;/p&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;p&gt;Truth be told, I have never been very comfortable with implementing max-flow.
It&apos;s not something I do very often, and there are a lot of choices one has to make.
Choice of algorithm: Standard &lt;a href=&quot;https://en.wikipedia.org/wiki/Ford%E2%80%93Fulkerson_algorithm&quot;&gt;Ford–Fulkerson&lt;/a&gt; with &lt;a href=&quot;https://en.wikipedia.org/wiki/Breadth-first_search&quot;&gt;BFS&lt;/a&gt; (aka. &lt;a href=&quot;https://en.wikipedia.org/wiki/Edmonds%E2%80%93Karp_algorithm&quot;&gt;Edmonds-Karp&lt;/a&gt;), try &lt;a href=&quot;https://en.wikipedia.org/wiki/Dinic%27s_algorithm&quot;&gt;Dinic&apos;s&lt;/a&gt;, or finally try to understand &lt;a href=&quot;https://en.wikipedia.org/wiki/Push%E2%80%93relabel_maximum_flow_algorithm&quot;&gt;Push-relabel&lt;/a&gt;?
Ford-Fulkerson feels like a safe choice for the first version.
How do you represent the graph?
Everything on the heap?
&lt;code&gt;Vec&amp;lt;Node&amp;gt;&lt;/code&gt; for the nodes and have &lt;code&gt;Node&lt;/code&gt; contain an adjacency list of indices for the edges?
Where does the flow and capacities go?
BFS through the graph sounds okay, but how do you represent a path?
Won&apos;t there be a lot of them?
&lt;code&gt;Vec&amp;lt;usize&amp;gt;&lt;/code&gt; again for each path sounds like a lot of allocations, but maybe it&apos;s okay.
Oh and by the way, this is all just for max-flow.
How do we even solve min-cost-max-flow?&lt;/p&gt;
&lt;p&gt;When bogged down with these uncertainties, the best way forward is to just do &lt;em&gt;something&lt;/em&gt;, with the expectation that you&apos;re only trying something out.
At this stage, the only important thing is &lt;a href=&quot;https://pages.cs.wisc.edu/~remzi/Naur.pdf&quot;&gt;building a theory&lt;/a&gt; of the problem.&lt;/p&gt;
&lt;h3&gt;First version&lt;/h3&gt;
&lt;p&gt;I decided that I didn&apos;t want to represent the graph completely as-is; the source and sink nodes could be implicit in the code&lt;sup&gt;&lt;a href=&quot;#user-content-fn-imp&quot; id=&quot;user-content-fnref-imp&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;.
The flow and capacities for the edges adjacent to the source and sink could also be handled separately, since we know what the graph looks like around these two vertices.&lt;/p&gt;
&lt;p&gt;Further, I wasn&apos;t sure if the min-cost aspect of the problem was difficult, and decided on a greedy approach without making sure that the solutions produced were optimal. I wanted to write a loop in which we find the cheapest way of increasing the flow by 1, and do that $|P|$ times.
This is just Ford-Fulkerson where you find the min-cost path, and I figured it was probably right, but I didn&apos;t sit down and prove it.
By checking against &lt;code&gt;mcmf&lt;/code&gt; later I would get a hunch for whether this really is optimal or not, but I didn&apos;t want to spend time figuring this out before seeing if the implementation was feasible&lt;sup&gt;&lt;a href=&quot;#user-content-fn-ce&quot; id=&quot;user-content-fnref-ce&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;5&lt;/a&gt;&lt;/sup&gt; .&lt;/p&gt;
&lt;p&gt;For the vertices I made an &lt;code&gt;enum&lt;/code&gt; with an index to identify the two different types&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;pub enum Node {
    Site(usize),
    Point(usize),
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;and for the edges I made a &lt;code&gt;struct&lt;/code&gt; containing these indices, as well as the edge cost.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;pub struct Edge {
    pub site: usize,
    pub point: usize,
    pub cost: F64,
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We don&apos;t actually have to represent the edges with an adjacency list or anything like that, because we have a complete bipartite graph, so we already know what all the edges are.&lt;/p&gt;
&lt;p&gt;Edge costs (I duplicated these for some reason) were precomputed and stored in a $|P|\times |S|$ matrix;
since all edges are there, the matrix is completely full.
Capacities were handled in a slightly funny way; I made a &lt;code&gt;Vec&amp;lt;usize&amp;gt;&lt;/code&gt; of length $|S|$ ($|P|$), where each entry corresponded to the capacity of the site (point) with the same index, respectively.
For the cross-edges I made a &lt;code&gt;Matrix&amp;lt;bool&amp;gt;&lt;/code&gt; of size $|S|\times |P|$ called &lt;code&gt;edge_used&lt;/code&gt; where each entry corresponded to whether the edge was used or not, since these edges had unit capacity.&lt;/p&gt;
&lt;p&gt;The funny looking &lt;code&gt;F64&lt;/code&gt; is a type alias&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;type F64 = float_ord::FloatOrd&amp;lt;f64&amp;gt;;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;using &lt;a href=&quot;https://crates.io/crates/float-ord&quot;&gt;&lt;code&gt;float_ord&lt;/code&gt;&lt;/a&gt; so that we can order floats&lt;sup&gt;&lt;a href=&quot;#user-content-fn-float-ord&quot; id=&quot;user-content-fnref-float-ord&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;6&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;A path through the network was also a &lt;code&gt;struct&lt;/code&gt; listing the &lt;code&gt;Node&lt;/code&gt;s in the path, as well as the &lt;em&gt;negative&lt;/em&gt; cost of the path, because the only priority queue in Rust&apos;s standard library is &lt;code&gt;std::collections::BinaryHeap&lt;/code&gt;, which is a max-heap&lt;sup&gt;&lt;a href=&quot;#user-content-fn-ord-reverse&quot; id=&quot;user-content-fnref-ord-reverse&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;7&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;pub struct Path {
    pub neg_cost: F64,
    pub edges: Vec&amp;lt;Node&amp;gt;,
}
&lt;/code&gt;&lt;/pre&gt;
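&lt;p&gt;The negated-cost trick is easy to demonstrate in isolation; here&apos;s a tiny sketch (with plain integer costs, to sidestep the float-ordering issue):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;use std::collections::BinaryHeap;

// BinaryHeap is a max-heap, so pushing the *negated* cost makes
// pop() return the cheapest element first. (std::cmp::Reverse is
// another way to get the same effect.)
fn main() {
    let mut heap = BinaryHeap::new();
    for cost in [3i64, 1, 2] {
        heap.push((-cost, cost));
    }
    let (neg_cost, cost) = heap.pop().unwrap();
    assert_eq!(cost, 1); // cheapest first
    assert_eq!(neg_cost, -1);
}
&lt;/code&gt;&lt;/pre&gt;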
&lt;p&gt;Now we can write the function &lt;code&gt;step&lt;/code&gt; on &lt;code&gt;Path&lt;/code&gt;, which extends the path by one move, producing all possible new paths:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;pub fn step(&amp;amp;self, cost: &amp;amp;Matrix&amp;lt;F64&amp;gt;, edge_used: &amp;amp;Matrix&amp;lt;bool&amp;gt;) -&amp;gt; Vec&amp;lt;Path&amp;gt; {
    let last = self.edges.last().unwrap();
    match last {
        Node::Site(si) =&amp;gt; (0..cost.cols)
            .filter(|pi| !edge_used.get(*si, *pi))
            .flat_map(|pi| {
                if self.has_edge(*si, pi) {
                    return None;
                }
                let cost = cost.get(*si, pi);
                let mut edges = self.edges.clone();
                edges.push(Node::Point(pi));
                Some(Path {
                    neg_cost: FloatOrd(self.neg_cost.0 - cost.0),
                    edges,
                })
            })
            .collect(),
        Node::Point(pi) =&amp;gt; (0..cost.rows)
            .filter(|si| *edge_used.get(*si, *pi))
            .flat_map(|si| {
                if self.has_edge(si, *pi) {
                    return None;
                }
                let cost = cost.get(si, *pi);
                let mut edges = self.edges.clone();
                edges.push(Node::Site(si));
                Some(Path {
                    neg_cost: FloatOrd(self.neg_cost.0 + cost.0),
                    edges,
                })
            })
            .collect(),
    }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Some notes on what&apos;s going on here:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;When going from sites to points we only want to try edges that haven&apos;t already been used, since these would have no capacity left, so these are &lt;code&gt;.filter&lt;/code&gt;ed out.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;has_edge&lt;/code&gt; checks that the edge is not already included in the path, in order to avoid looping.&lt;/li&gt;
&lt;li&gt;Not sure why I used &lt;code&gt;flat_map&lt;/code&gt; and returned &lt;code&gt;Option&lt;/code&gt; instead of just using &lt;code&gt;filter_map&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;When going from points &lt;em&gt;back&lt;/em&gt; to sites, we can only go along edges that &lt;em&gt;have&lt;/em&gt; been used, since we are effectively &lt;em&gt;undoing&lt;/em&gt; the flow that goes along the edge. For this reason, we&apos;re &lt;em&gt;adding&lt;/em&gt; to the &lt;em&gt;negative&lt;/em&gt; cost.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These are the main mechanisms for computing the min-cost flow, apart from the flow algorithm itself. That part was now pretty simple, but the implementation was somewhat noisy, so here&apos;s the pseudocode:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;loop {
    create initial paths from sites that have capacity to all points
    put the paths in a max-heap

    while max-heap has elements {
        path = pop(max-heap)
        if path leads to a point that&apos;s not assigned yet {
            reduce site capacity by one
            set point capacity to zero (reduce by 1, but we know it is 1)
            mark edges as used
            restart main loop
        }
        children = expand the path
        add children to the heap
    }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This algorithm produced exactly the same pairings as &lt;code&gt;mcmf&lt;/code&gt; did, but it was &lt;em&gt;a lot&lt;/em&gt; slower. Not 2x or 5x, more like 1000x.&lt;/p&gt;
&lt;p&gt;The program took over 20 seconds.&lt;/p&gt;
&lt;h3&gt;Second version&lt;/h3&gt;
&lt;p&gt;Why was the first attempt so slow? Here I had a few hypotheses right off the bat, along with some ideas for solutions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Inefficient representations of the graph and paths. Many allocations.
&lt;ul&gt;
&lt;li&gt;Try to add back links instead of storing whole paths?&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Poor search through the graph; too many paths are expanded
&lt;ul&gt;
&lt;li&gt;Maybe prune based on cost? Can we bound cost?&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;code&gt;std::collections::BinaryHeap&lt;/code&gt; is slow; should try something else
&lt;ul&gt;
&lt;li&gt;Probably something on crates.io?&lt;/li&gt;
&lt;li&gt;Write my own?&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Only a single unit of flow is added in each iteration; should figure out how to augment along many paths at the same time.
&lt;ul&gt;
&lt;li&gt;Doesn&apos;t Dinic&apos;s do this? Or was that Push-relabel?&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I tried cost-pruning; it helped, but not by much. I tried to change &lt;code&gt;Path::has_edge&lt;/code&gt; to just check for a site, since I was pretty sure I didn&apos;t have cycles of negative cost (if you have such a cycle, it will pay off to walk it, which means you&apos;ll visit a node twice, so the two checks aren&apos;t the same); it helped but not by much. I tried the &lt;a href=&quot;https://crates.io/crates/priority-queue&quot;&gt;&lt;code&gt;priority-queue&lt;/code&gt;&lt;/a&gt; crate (which also easily supported making a min-heap), but that was even slower.&lt;/p&gt;
&lt;p&gt;Eventually, I decided to specialize the search to my instance of the problem&lt;sup&gt;&lt;a href=&quot;#user-content-fn-inst&quot; id=&quot;user-content-fnref-inst&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;8&lt;/a&gt;&lt;/sup&gt;.
I didn&apos;t need to solve MCMF for any general graph, since I had very specific knowledge about the types of graph I would solve it on.
When searching for a path from a site to the sink (which, again, we didn&apos;t include in the graph explicitly), we can do two operations:
(1) go to an unassigned point and finish there, let&apos;s call this operation &lt;code&gt;Connect&lt;/code&gt;;
or (2) go to a point and follow the back-edge to a site, let&apos;s call this operation &lt;code&gt;Route&lt;/code&gt;.
Note here that &lt;code&gt;Route&lt;/code&gt; is unique for a point, since there is at most one site connected to it.&lt;/p&gt;
&lt;p&gt;Here&apos;s &lt;code&gt;Connect&lt;/code&gt;, when standing at the filled-in site, and routing along the edges with arrows:&lt;/p&gt;
&lt;figure style=&quot;display: flex; gap: 4rem;&quot; class=&quot;invert&quot;&gt;
  &lt;div style=&quot;width: 100%&quot;&gt;
    &lt;img src=&quot;./route-connect.svg&quot; style=&quot;flex: 1; width: 100%&quot;&gt;
    &lt;figcaption&gt;&lt;code&gt;Connect&lt;/code&gt; corresponds to finding a path straight to the sink.&lt;/figcaption&gt;
  &lt;/div&gt;
&lt;/figure&gt;
&lt;p&gt;And here is &lt;code&gt;Route&lt;/code&gt;:&lt;/p&gt;
&lt;figure style=&quot;display: flex; gap: 4rem;&quot; class=&quot;invert&quot;&gt;
  &lt;div style=&quot;width: 100%&quot;&gt;
    &lt;img src=&quot;./route.svg&quot; style=&quot;flex: 1; width: 100%&quot;&gt;
    &lt;figcaption&gt;Left: We find a shortest-path following a back-edge. Right: The resulting flow.&lt;/figcaption&gt;
  &lt;/div&gt;
&lt;/figure&gt;
&lt;p&gt;Instead of having long paths, each listing the vertices in the path, maybe it would help to have a path be a &lt;code&gt;Vec&amp;lt;Move&amp;gt;&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;type SiteId = usize; // I also aliased these, for later on
type PointId = usize;

pub enum Move {
    Connect(SiteId, PointId),
    Route(SiteId, PointId, SiteId),
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Why would this help? Consider the path in the &lt;code&gt;Route&lt;/code&gt; operation in the figure above. Had we done this in a full graph representation, we would have had six vertices in the path: the source, the site we&apos;re currently at, the point we&apos;re routing around, the site currently connected to that point, the cheapest point for &lt;em&gt;that&lt;/em&gt; site to get to the sink, and the sink.
For the &lt;code&gt;Move&lt;/code&gt; representation however, we only need three IDs, namely two for which site we&apos;re talking about, and one for which point we&apos;re rerouting.&lt;/p&gt;
&lt;p&gt;Further, we can imagine splitting up the list of points into two: the points that are already assigned to a site, and the ones that aren&apos;t: &lt;code&gt;Connect&lt;/code&gt; only works for unassigned points, and &lt;code&gt;Route&lt;/code&gt; only works for assigned points. Thus, when standing at a site, we have $|P|$ choices to make, since each point corresponds to one &lt;code&gt;Move&lt;/code&gt;. Before, we had $|P|$ choices to make (which point to visit next), and then for each choice we had $|S|+1$ choices to make (go back to any of the sites, or go to the sink). Most of these were quickly pruned, but I figured they might have added a lot of work to the search.&lt;/p&gt;
&lt;p&gt;There was one more important insight to make: when we perform the move &lt;code&gt;Route(a, p, b)&lt;/code&gt;, we disconnect the point &lt;code&gt;p&lt;/code&gt; from site &lt;code&gt;b&lt;/code&gt; and connect it to &lt;code&gt;a&lt;/code&gt;, freeing up capacity at &lt;code&gt;b&lt;/code&gt;, paying the difference in cost between the new edge &lt;code&gt;ap&lt;/code&gt; and the old edge &lt;code&gt;pb&lt;/code&gt;, and spending &lt;code&gt;a&lt;/code&gt;&apos;s capacity. This is the &lt;em&gt;only&lt;/em&gt; thing going on: we move one unit of capacity from &lt;code&gt;b&lt;/code&gt; to &lt;code&gt;a&lt;/code&gt; by paying the difference in edge cost. Thus, for the rest of the search, it doesn&apos;t matter which point &lt;code&gt;p&lt;/code&gt; we choose. The only thing that matters is the edge cost difference.&lt;/p&gt;
&lt;p&gt;This means that when considering different &lt;code&gt;Route(a, t, b)&lt;/code&gt;s for different choices of &lt;code&gt;t&lt;/code&gt;, we only need to look at the cheapest, because the net-result of performing the &lt;code&gt;Route&lt;/code&gt; is the same for all &lt;code&gt;Route&lt;/code&gt;s around these two sites.
We can look at all possible &lt;code&gt;t&lt;/code&gt;s, choose the cheapest, and continue the search with only that &lt;code&gt;Route&lt;/code&gt;. This is a huge help, because for each pair of sites we don&apos;t have a combinatorial explosion of different &lt;code&gt;Route&lt;/code&gt; operations. We only have &lt;em&gt;one&lt;/em&gt;.&lt;/p&gt;
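&lt;p&gt;As a tiny, hypothetical sketch of this (not the real code), picking that single cheapest &lt;code&gt;Route&lt;/code&gt; between two sites is just a minimum over the points currently assigned to the old site:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;// Among all Route(a, t, b) over points t assigned to site b, only
// the cheapest matters: the net effect (one unit of capacity moves
// between the sites) is identical, only the price differs.
fn best_route(cost_a: &amp;amp;[f64], cost_b: &amp;amp;[f64], assigned_to_b: &amp;amp;[usize]) -&amp;gt; (f64, usize) {
    let mut best = (f64::INFINITY, usize::MAX);
    for &amp;amp;t in assigned_to_b {
        let delta = cost_a[t] - cost_b[t]; // pay new edge, refund old
        if delta &amp;lt; best.0 {
            best = (delta, t);
        }
    }
    best
}

fn main() {
    let cost_a = [5.0, 2.0, 9.0]; // site a to points 0..3
    let cost_b = [1.0, 1.0, 1.0]; // site b to points 0..3
    // Points 0 and 1 are assigned to b; point 1 is cheapest to move.
    assert_eq!(best_route(&amp;amp;cost_a, &amp;amp;cost_b, &amp;amp;[0, 1]), (1.0, 1));
}
&lt;/code&gt;&lt;/pre&gt;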
&lt;p&gt;A few other small optimizations (another stab at a &lt;code&gt;cost_limit&lt;/code&gt; to prune expensive paths,
sorting the points for each site so that cheap moves are found earlier,
moving a &lt;code&gt;Vec::clone&lt;/code&gt; down below an &lt;code&gt;if .. { ... continue; }&lt;/code&gt;,
and other small improvements) were done after this.
I got a big speedup, but I was still not where I needed to be.&lt;/p&gt;
&lt;p&gt;The program took around 2 seconds.&lt;/p&gt;
&lt;h3&gt;Third version&lt;/h3&gt;
&lt;p&gt;I felt like I was really getting somewhere with &lt;code&gt;Move&lt;/code&gt;, but at the same time, the second version still felt too general.
The insight about &amp;quot;only the best route matters&amp;quot; helped me find a new framing of the problem: when routing from a site, there are really only $(|S|-1)+1$ moves we can make:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Connect to the closest unpaired point and be done with the search (1 move).&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Route&lt;/code&gt; the best route around any of the other sites and continue the search ($|S|-1$ moves).&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;I also knew that for my application, $|S|$ would always be at most 10, and 10 is a really small number. What does this buy us?&lt;/p&gt;
&lt;p&gt;Forget the graph, and don&apos;t think about nodes and edges. We only need to find the cheapest sequence of these &lt;code&gt;Move&lt;/code&gt;s. And for this, we need their cost.&lt;/p&gt;
&lt;p&gt;I made a &lt;code&gt;routing_table&lt;/code&gt; that was a $|S|\times|S|$ matrix. Entry $S_{ij}$ contained the cost of the best &lt;code&gt;Route(i, p, j)&lt;/code&gt; over all points $p$, and the diagonal entries $S_{ii}$ contained the cost of the best &lt;code&gt;Connect(i, p)&lt;/code&gt;. In addition, the table contained the index of the point &lt;code&gt;p&lt;/code&gt;, which works out nicely in both the &lt;code&gt;Route&lt;/code&gt; and &lt;code&gt;Connect&lt;/code&gt; case, since they both have one point. Here&apos;s the code to initialize the table:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;fn initialize_routing_table(&amp;amp;mut self) {
    let num_sites = self.cost.rows;
    let num_pts = self.cost.cols;
    self.routing_table = Matrix::new(
        num_sites,
        num_sites,
        (FloatOrd(f64::INFINITY), PointId::MAX)
    );
    for a in 0..num_sites {
        for b in 0..num_sites {
            if a == b {
                continue;
            }

            let mut cand_cost = f64::INFINITY;
            let mut cand_ind = PointId::MAX;

            for t in 0..num_pts {
                if !*self.edge_used.get(b, t) {
                    continue;
                }
                let route = Move::Route(a, t, b);
                let cost = self.move_cost(&amp;amp;route).0;
                if cost &amp;lt; cand_cost {
                    cand_cost = cost;
                    cand_ind = t;
                }
            }
            *self.routing_table.get_mut(a, b) = (FloatOrd(cand_cost), cand_ind);
        }
    }

    for s in 0..num_sites {
        if let Some((min_cost, t)) = (0..num_pts)
            .filter(|pi| self.tur_cap[*pi] == 1)
            .map(|pi| (*self.cost.get(s, pi), pi))
            .min()
        {
            *self.routing_table.get_mut(s, s) = (min_cost, t);
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;To make up some numbers, here&apos;s what the &lt;code&gt;routing_table&lt;/code&gt; could look like, for a graph with 3 sites&lt;sup&gt;&lt;a href=&quot;#user-content-fn-madeup&quot; id=&quot;user-content-fnref-madeup&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;9&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;p&gt;$$
\begin{bmatrix}
(32.08, t_3) &amp;amp; (32.79, t_1) &amp;amp; (12.01, t_7)\\
(28.62, t_4) &amp;amp; (41.41, t_2) &amp;amp; (24.88, t_4)\\
(14.19, t_5) &amp;amp; (21.31, t_9) &amp;amp; (15.89, t_6)
\end{bmatrix}
$$&lt;/p&gt;
&lt;p&gt;Here we&apos;re saying that if we wanted to connect the first site to the nearest unpaired point ($t_3$), it would cost $32.08$ ($S_{1,1}$).
However, re-routing a point from site 3 to site 1 around $t_7$ costs $12.01$ ($S_{1,3}$), and connecting site 3 to its nearest unpaired point ($t_6$) costs $15.89$ ($S_{3,3}$), for a total of $27.90$.
If site 3 is already full, this is the shortest path.&lt;/p&gt;
&lt;p&gt;I want to highlight that this table is the &lt;em&gt;only&lt;/em&gt; information we need to perform the search.
We don&apos;t need to know anything about the graph, or the points, or the sites.
We don&apos;t even need to look at capacities, because the information they&apos;re giving us is already encoded in the table.
Once we have this table, these 18 numbers are all that&apos;s required to find the cheapest way of increasing the network flow by 1.&lt;/p&gt;
&lt;p&gt;This was already a pretty large leap from the last version, so I decided to be blunt in the next step, and pre-compute &lt;em&gt;all possible paths&lt;/em&gt;.
After all, we don&apos;t have that many of them.
A list of &lt;code&gt;SiteInd&lt;/code&gt;s can be used as a path, where the first site is the start site, the intermediate sites are routed around using the precomputed points, and the last site goes to its closest unmatched point.
Since the sites in this list have to be unique, we have at most $|S|!$ of them, which sounds bad, but since $|S|$ is at most 10, this is at most 3&apos;628&apos;800.
If $|S|$ is a more reasonable&lt;sup&gt;&lt;a href=&quot;#user-content-fn-reasonable&quot; id=&quot;user-content-fnref-reasonable&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;10&lt;/a&gt;&lt;/sup&gt; 5, this is merely 120.
Compute all paths, and choose the cheapest.&lt;/p&gt;
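&lt;p&gt;Those counts are easy to sanity-check:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;// Upper bound on the number of site orderings: |S|!
fn factorial(n: u64) -&amp;gt; u64 {
    (1..=n).product()
}

fn main() {
    assert_eq!(factorial(10), 3_628_800);
    assert_eq!(factorial(5), 120);
}
&lt;/code&gt;&lt;/pre&gt;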
&lt;p&gt;Each iteration of this main loop invalidates the &lt;code&gt;routing_table&lt;/code&gt;, since the site-point assignment has changed.
Since I had just implemented the table approach, I didn&apos;t want to also incrementally update only the parts of it that did change, so instead I recomputed the whole table before every iteration.&lt;/p&gt;
&lt;p&gt;This caveman solution took 200ms.&lt;/p&gt;
&lt;h3&gt;Version 3.5&lt;/h3&gt;
&lt;p&gt;200ms is better, but this was still a very decent chunk of the total time of my program. Recall that this whole computation is just the first step of a bigger system. But since the last solution made a few very naïve choices, I was confident that I could speed it up some more.&lt;/p&gt;
&lt;p&gt;I stored a &lt;code&gt;Search&lt;/code&gt; (my new name for &lt;code&gt;Path&lt;/code&gt;) in a &lt;code&gt;struct&lt;/code&gt; with a &lt;a href=&quot;https://crates.io/crates/smallvec&quot;&gt;&lt;code&gt;SmallVec&lt;/code&gt;&lt;/a&gt; listing the indices in the matrix corresponding to the moves that made up the path:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;struct Search {
    moves: SmallVec&amp;lt;[(SiteInd, SiteInd); 10]&amp;gt;,
    cost: F64,
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;There&apos;s a lot of duplicate data here, since &lt;code&gt;moves&lt;/code&gt; is of the form &lt;code&gt;[(a, b), (b, c), (c, d)]&lt;/code&gt;, but maybe this was easier to use? I am not sure why I did it this way.
Now it&apos;s just a matter of finding the shortest path through the &lt;code&gt;routing_table&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;If the entries in the &lt;code&gt;routing_table&lt;/code&gt; are all non-negative, life is good, because we can use Dijkstra&apos;s algorithm to find the shortest path to a diagonal entry (which, recall, represents ending the flow path).
We&apos;ve come full circle, and are back again at &lt;code&gt;std::collections::BinaryHeap&lt;/code&gt; and searching through a graph (this time, $K_{|S|}$: the complete graph on $|S|$ vertices).&lt;/p&gt;
&lt;p&gt;I initialized the queue with the legal subset&lt;sup&gt;&lt;a href=&quot;#user-content-fn-legal&quot; id=&quot;user-content-fnref-legal&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;11&lt;/a&gt;&lt;/sup&gt; of the total $|S|^2$ initial moves, and started to &lt;code&gt;pop&lt;/code&gt;.
If I got a diagonal entry back, that&apos;s the path.
If not, I expanded the path from the end of the path (&lt;code&gt;search.moves.last().unwrap().1&lt;/code&gt;), and considered all other possible sites to extend to.
Sites that were already in the path were filtered out.
Here&apos;s the code:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;while let Some(search) = searches.pop() {
    let mov = search.moves.last().unwrap();
    // Diagonal entry is the best; take it if possible.
    if mov.0 == mov.1 {
        if search.cost &amp;lt; candidate.cost {
            candidate = search;
            break;
        }
        continue;
    }

    // Non-diagonal entry; expand to the possible next moves in the sequence.
    let a = mov.1;
    for b in (0..num_sites).filter(|&amp;amp;b| !search.contains_site(b)) {
        let cost =
            FloatOrd(search.cost.0 + self.routing_table.get(a, b).0 .0);

        let mut moves = search.moves.clone();
        moves.push((a, b));
        let s = Search { moves, cost };
        searches.push(s);
    }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I hinted at this above, but note that we don&apos;t have to check for site capacities in this loop.
Direct connections are only in the table if they&apos;re valid (otherwise they&apos;re $\infty$), and &lt;code&gt;Route&lt;/code&gt; operations don&apos;t need the target site to have capacity, since we&apos;re freeing up capacity at the source site.
Since the initial paths are also only to sites with capacity, we don&apos;t have to check for capacities at all.&lt;/p&gt;
&lt;p&gt;The case with negative entries in &lt;code&gt;routing_table&lt;/code&gt; calls for a different strategy, since now you can suddenly produce cheaper paths by continuing to route around other sites.
Instead of implementing a &amp;quot;real&amp;quot; search algorithm, I bounded the cost savings possible, and used this to prune paths that were so expensive that there would be no way for the negative entries to make up for it.&lt;/p&gt;
&lt;p&gt;I did this in a very loose way: if $m$ is the smallest entry in &lt;code&gt;routing_table&lt;/code&gt;, then that&apos;s the most we can save by extending a path of length $l$ to $l+1$.
Since we also know the max length of a path, $|S|$, the best possible cost decrease of a started path of length $l$ is $m(|S|-l)$.
This is not at all tight&lt;sup&gt;&lt;a href=&quot;#user-content-fn-tight&quot; id=&quot;user-content-fnref-tight&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;12&lt;/a&gt;&lt;/sup&gt;, but it&apos;s really easy. Here&apos;s what it looked like:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;while let Some(search) = searches.pop() {
    let mov = search.moves.last().unwrap();
    // Diagonal entry is the best; take it if possible.
    if mov.0 == mov.1 {
        if search.cost &amp;lt; candidate.cost {
            candidate = search;
        }
        continue;
    }

    // Non-diagonal entry; expand to the possible next moves in the sequence.
    let a = mov.1;
    for b in (0..num_sites).filter(|&amp;amp;b| !search.contains_site(b)) {
        let cost =
            FloatOrd(search.cost.0 + self.routing_table.get(a, b).0 .0);
        let best_future_cost = cost.0
            + (num_sites - (search.moves.len() as SiteInd + 1)) as f64 * min_table_entry;

        if candidate.cost.0 &amp;lt; best_future_cost {
            continue;
        }

        let mut moves = search.moves.clone();
        moves.push((a, b));
        let s = Search { moves, cost };
        searches.push(s);
    }
}
&lt;/code&gt;&lt;/pre&gt;
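&lt;p&gt;In isolation, the bound is just this (a sketch; the names are made up):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;// Loose lower bound on the final cost of a partial path: with m the
// smallest (possibly negative) routing_table entry and at most
// num_sites moves in total, a path of length len can improve by at
// most m * (num_sites - len).
fn best_future_cost(cost_so_far: f64, m: f64, num_sites: usize, len: usize) -&amp;gt; f64 {
    cost_so_far + m * (num_sites - len) as f64
}

fn main() {
    // A path at cost 10.0 with at most 3 moves left and m = -2.0 can
    // at best finish at 4.0; any cheaper incumbent prunes it.
    assert_eq!(best_future_cost(10.0, -2.0, 5, 2), 4.0);
}
&lt;/code&gt;&lt;/pre&gt;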
&lt;p&gt;Along the way I also pulled the trigger and changed &lt;code&gt;SiteInd&lt;/code&gt; and &lt;code&gt;PointInd&lt;/code&gt; to be &lt;code&gt;u8&lt;/code&gt; and &lt;code&gt;u16&lt;/code&gt; respectively, which, surprisingly, sped up the code by 30% (!).
I continued to recompute the &lt;code&gt;routing_table&lt;/code&gt; from scratch in between every single call to &lt;code&gt;route&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Now the program took 4ms, and I declared it Good Enough.&lt;/p&gt;
&lt;h2&gt;The &amp;quot;Right&amp;quot; Solution&lt;/h2&gt;
&lt;p&gt;I had a lot of fun with this problem.
It&apos;s both fun and rewarding to iterate on a problem and see the time required to solve it go from &amp;quot;get a coffee&amp;quot;, to &amp;quot;impatiently wait&amp;quot;, to &amp;quot;wait&amp;quot;, to &amp;quot;quick, if you run it once&amp;quot;, to &amp;quot;fast&amp;quot;, to &amp;quot;can be called in a loop by another program&amp;quot;.
It&apos;s also fun when this isn&apos;t just an exercise in how good it can get, but actually a part of what you&apos;re really trying to do&lt;sup&gt;&lt;a href=&quot;#user-content-fn-business&quot; id=&quot;user-content-fnref-business&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;13&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;But mostly, it was fun because the solution &lt;em&gt;feels right&lt;/em&gt;.
I have a theory that &lt;em&gt;most&lt;/em&gt; problems&lt;sup&gt;&lt;a href=&quot;#user-content-fn-problem&quot; id=&quot;user-content-fnref-problem&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;14&lt;/a&gt;&lt;/sup&gt; we programmers are dealing with are pretty simple, when viewed from the right angle.
This angle is often hard to find, but once you have found it, things just seem to &amp;quot;work out&amp;quot;&lt;sup&gt;&lt;a href=&quot;#user-content-fn-workout&quot; id=&quot;user-content-fnref-workout&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;15&lt;/a&gt;&lt;/sup&gt;, in terms of complexity, number of bugs, maintainability, debuggability, all of these axes.&lt;/p&gt;
&lt;p&gt;My final version is around 10&apos;000 times faster than my initial version.
When presented with such a huge difference without having any context, it is very easy to jump to conclusions.
For instance, one might attribute this to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Language; it was written in slowlang first, and then ported to fastlang.&lt;/li&gt;
&lt;li&gt;Lack of optimizations; ran without &lt;code&gt;--release&lt;/code&gt;, &lt;code&gt;-O2&lt;/code&gt;, or similar.&lt;/li&gt;
&lt;li&gt;Algorithmic improvements; change a naïve algorithm to a high performing one.&lt;/li&gt;
&lt;li&gt;A full team of experts worked on it for months, creating an engineering jewel that normal programmers simply can&apos;t match.&lt;/li&gt;
&lt;li&gt;Hyper-optimized code; inline assembly, &lt;code&gt;unsafe&lt;/code&gt; everywhere, &lt;a href=&quot;https://en.wikipedia.org/wiki/Profile-guided_optimization&quot;&gt;PGO&lt;/a&gt;, lots of impossible-to-read, impossible-to-debug code, probably requires a blood sacrifice.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In this case though, none of the above is true&lt;sup&gt;&lt;a href=&quot;#user-content-fn-speedup&quot; id=&quot;user-content-fnref-speedup&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;16&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Language was the same.&lt;/li&gt;
&lt;li&gt;Optimization levels were the same.&lt;/li&gt;
&lt;li&gt;I would argue that the algorithm is still the same --- we&apos;re still solving min-cost-max-flow with successive shortest paths --- but since we are making assumptions about the input I can see a claim for the algorithm being different.&lt;/li&gt;
&lt;li&gt;Full team for months is also off the mark; I&apos;m no expert, and certainly not a whole team of them.
Further, this whole process, from initial test with &lt;code&gt;mcmf&lt;/code&gt; to final code written, took slightly longer than &lt;strong&gt;two working days&lt;/strong&gt;.
The first commit was around 15:00 on Wednesday&lt;sup&gt;&lt;a href=&quot;#user-content-fn-wed&quot; id=&quot;user-content-fnref-wed&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;17&lt;/a&gt;&lt;/sup&gt;, and the last commit was 16:15 on Friday (plus a small bugfix on Monday morning).&lt;/li&gt;
&lt;li&gt;Most importantly though, the last one is not true.
There&apos;s no inline assembly, no &lt;code&gt;unsafe&lt;/code&gt;, no special tooling, no architecture specific code, and no &amp;quot;every-trick-in-the-book-pulled&amp;quot; code.
Quite the opposite: there&apos;s very &lt;em&gt;little&lt;/em&gt; code.
The whole module (excluding the &lt;code&gt;Matrix&lt;/code&gt; struct) is &lt;strong&gt;212 lines of code&lt;/strong&gt;, as reported by &lt;a href=&quot;https://github.com/XAMPPRocky/tokei&quot;&gt;&lt;code&gt;tokei&lt;/code&gt;&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So how come we got a 10&apos;000x speedup?
I think it&apos;s all due to the updated &lt;em&gt;framing&lt;/em&gt; of what we&apos;re really trying to do.
&amp;quot;The problem&amp;quot; was never about flow through a graph.
This is a made-up mental framework for us to work in, so that we can apply general techniques to specific problems.&lt;/p&gt;
&lt;h2&gt;Shoutout to LEMON&lt;/h2&gt;
&lt;p&gt;I did compare my solution to LEMON when I was getting below 100ms.
I had given up trying to integrate it, but it was still a very useful benchmark.
For the &amp;quot;largest reasonable&amp;quot; input I was testing with, LEMON was still faster than my final version.
LEMON, of course, solves the problem in its general form, and as such, is a way better implementation than mine.
However, LEMON &lt;em&gt;does&lt;/em&gt; &amp;quot;pull-many-tricks-in-the-book&amp;quot;&lt;sup&gt;&lt;a href=&quot;#user-content-fn-anytricks&quot; id=&quot;user-content-fnref-anytricks&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;18&lt;/a&gt;&lt;/sup&gt;, and &lt;em&gt;is&lt;/em&gt; written by people who have extensive experience with max-flow-min-cost, so I didn&apos;t feel so bad.&lt;/p&gt;
&lt;p&gt;I have stepped through their &lt;a href=&quot;https://lemon.cs.elte.hu/trac/lemon/wiki/Downloads&quot;&gt;source code&lt;/a&gt;, and started reading some of &lt;a href=&quot;http://lemon.cs.elte.hu/pub/doc/1.3.1/a00639.html&quot;&gt;their references&lt;/a&gt;
to better understand how they achieve what they have;
most notably &lt;a href=&quot;https://arxiv.org/abs/1207.6381&quot;&gt;this&lt;/a&gt; experimental study.
The &lt;em&gt;&amp;quot;Network Simplex Method&amp;quot;&lt;/em&gt; seems to be a key term, but I haven&apos;t understood this yet.&lt;/p&gt;
&lt;p&gt;There&apos;s still hope, of course, that if I only partially invalidate my &lt;code&gt;routing_table&lt;/code&gt; instead of recomputing it at every iteration, and store the visited sites as a bitfield in the &lt;code&gt;Search&lt;/code&gt; struct, I&apos;ll be faster.
If the requirements for my system drastically change, maybe I&apos;ll get to find out.&lt;/p&gt;
&lt;p&gt;As always, any input, shorter paths, or excess flow, can be sent to &lt;a href=&quot;mailto:~mht/public-inbox@lists.sr.ht&quot;&gt;my public inbox&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Thanks for reading.&lt;/p&gt;
&lt;section data-footnotes=&quot;&quot; class=&quot;footnotes&quot;&gt;&lt;h2 id=&quot;footnote-label&quot; class=&quot;sr-only&quot;&gt;Footnotes&lt;/h2&gt;
&lt;ol&gt;
&lt;li id=&quot;user-content-fn-aha&quot;&gt;
&lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=s6VaeFCxta8&quot;&gt;&amp;quot;Hunting High and Low&amp;quot;&lt;/a&gt; by a-ha. Other title candidates include &lt;em&gt;&amp;quot;one Flow over the cuckoo&apos;s nest&amp;quot;&lt;/em&gt; and &lt;a href=&quot;https://www.youtube.com/watch?v=CxKWTzr-k6s&quot;&gt;&amp;quot;Even Flow&amp;quot;&lt;/a&gt;. &lt;a href=&quot;#user-content-fnref-aha&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-layer&quot;&gt;
&lt;p&gt;I tried to avoid naming that could be associated with neural networks, but here I fell short. &lt;a href=&quot;#user-content-fnref-layer&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-int-flow&quot;&gt;
&lt;p&gt;Something something subgraph induced by fractional flow edges, and cancel loops? Something no negative cycles in residual graph (assumption by optimality)? &lt;a href=&quot;#user-content-fnref-int-flow&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-imp&quot;&gt;
&lt;p&gt;This is a pattern I feel like I keep seeing when looking at &amp;quot;good&amp;quot; code; algorithms and data structures are very often used in an &amp;quot;abstract&amp;quot; sense, as opposed to directly implemented in the code. &lt;a href=&quot;#user-content-fnref-imp&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-ce&quot;&gt;
&lt;p&gt;This is a chicken-and-egg problem, because you need both of these. You want to know what guarantees a method provides (in this case, optimality of the solution), but you also want to make sure that the method is implementable and has the right characteristics (performance, usability, maintainability) for what you&apos;re trying to do. Looking back it seemed risky that I started writing code without knowing if what I was trying to do would even lead me to a correct solution, but on the other hand, sitting down trying to prove the correctness of a hypothetical implementation before writing any code seemed just as risky. &lt;a href=&quot;#user-content-fnref-ce&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-float-ord&quot;&gt;
&lt;p&gt;While I understand why &lt;code&gt;f{32,64}&lt;/code&gt; aren&apos;t &lt;code&gt;Ord&lt;/code&gt;, having to work around this all the time is so annoying that I can&apos;t imagine it being the best choice. I wish the ordering were consistently defined, with &lt;code&gt;NaN&lt;/code&gt; at either end of the ordering. Maybe there are hairy details I&apos;m not thinking about though. &lt;a href=&quot;#user-content-fnref-float-ord&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-ord-reverse&quot;&gt;
&lt;p&gt;One alternative to doing this is to implement &lt;code&gt;PartialOrd&lt;/code&gt; and &lt;code&gt;Ord&lt;/code&gt; yourself, and &lt;code&gt;.reverse&lt;/code&gt; the ordering there. I ended up doing this later. &lt;a href=&quot;#user-content-fnref-ord-reverse&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-inst&quot;&gt;
&lt;p&gt;More often than not, we&apos;re not dealing with the full-general version of these computational problems, and sometimes there&apos;s significant savings when we only solve the actual problem we have. &lt;a href=&quot;#user-content-fnref-inst&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-madeup&quot;&gt;
&lt;p&gt;The numbers here are completely made up, and I did not spend any time checking if they made sense. &lt;a href=&quot;#user-content-fnref-madeup&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-reasonable&quot;&gt;
&lt;p&gt;Again, this was domain specific knowledge I had about the problem that I was trying to solve. &lt;a href=&quot;#user-content-fnref-reasonable&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-legal&quot;&gt;
&lt;p&gt;Each entry in &lt;code&gt;routing_table&lt;/code&gt; corresponds to one initial move. Off-diagonal entry $S_{i,j}$ means go from the source to site $i$ and route to site $j$ around whichever point was the best; this is legal iff site $i$ has capacity for another path. Diagonal entry $S_{i,i}$ is legal iff site $i$ has capacity. &lt;a href=&quot;#user-content-fnref-legal&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-tight&quot;&gt;
&lt;p&gt;You can&apos;t use the edge of cost $m$ more than once, since one site can only appear in a path once. However, there could be multiple entries in the table of cost $m$. You could get a tighter bound by looking at the $|S|$ cheapest entries, but this would also not be very tight, depending on which sites your path already contains. It gets complicated. &lt;a href=&quot;#user-content-fnref-tight&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-business&quot;&gt;
&lt;p&gt;A &lt;em&gt;business requirement&lt;/em&gt; if you will. &lt;a href=&quot;#user-content-fnref-business&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-problem&quot;&gt;
&lt;p&gt;&lt;em&gt;&amp;quot;problem&amp;quot;&lt;/em&gt; in this specific CS-y narrow sense. The world is big and complicated, and contains plenty of hard problems. &lt;a href=&quot;#user-content-fnref-problem&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-workout&quot;&gt;
&lt;p&gt;Sometimes this manifests really clearly. I will have tried to write something --- an API, a function, a class, a library, anything --- but it&apos;s awkward to use right, often has off-by-one errors, weird bugs, and somehow things are never in the right place. Then, I write it again, changing for instance what I store in some state, and this time, everything just falls out naturally. Off-by-one errors suddenly can&apos;t exist any more, things are always conveniently where they need to be, and everything runs smoothly. &lt;a href=&quot;#user-content-fnref-workout&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-speedup&quot;&gt;
&lt;p&gt;This is not to say that these aren&apos;t the reasons for similar speedups in other circumstances; these are often the culprits. But sometimes, there just exists code that is orders of magnitude better. &lt;a href=&quot;#user-content-fnref-speedup&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-wed&quot;&gt;
&lt;p&gt;This was the time of the first commit, but I don&apos;t remember when on Wednesday I started this. I also wasn&apos;t exclusively working on this during these days, so it&apos;s hard to get a time estimate with an hour granularity. &lt;a href=&quot;#user-content-fnref-wed&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-anytricks&quot;&gt;
&lt;p&gt;I wanted to write &amp;quot;pull-every-trick&amp;quot;, but this simply isn&apos;t true. They do, however, pull &lt;em&gt;some&lt;/em&gt; tricks. &lt;a href=&quot;#user-content-fnref-anytricks&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content></entry><entry><title>A Neat Approximation Algorithm</title><id>https://mht.wtf/post/min-deg-st/</id><updated>2021-12-05T16:21:19+01:00</updated><author><name>Martin Hafskjold Thoresen</name><email>m@mht.wtf</email></author><link href="https://mht.wtf/post/min-deg-st/" rel=""/><link href="https://mht.wtf/post/min-deg-st/index.html" rel="alternate"/><published>2021-12-05T16:21:19+01:00</published><content type="text/html">&lt;p&gt;&lt;em&gt;The algorithm and notation are based on section 9.3 of Williamson and Shmoys &amp;quot;The Design of Approximation Algorithms&amp;quot;.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Let $G=(V,E)$ be a graph. A &lt;em&gt;spanning tree&lt;/em&gt; $T$ of $G$ is a subgraph $T=(V, E&apos;)$ such that $T$ is connected and acyclic.
Visually, you can think of it as a network that touches all vertices and doesn&apos;t contain any loops.
Computing a spanning tree, even a minimal one if we have edge weights that we want to minimize, is easy&lt;sup&gt;&lt;a href=&quot;#user-content-fn-st&quot; id=&quot;user-content-fnref-st&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
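&lt;p&gt;As a quick illustration (my own sketch, not from the book), a spanning tree of a connected graph falls out of a single breadth-first traversal; weights only matter if you want a &lt;em&gt;minimal&lt;/em&gt; one:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;from collections import deque

def spanning_tree(vertices, edges):
    """Build a spanning tree of a connected graph by BFS."""
    adj = {v: [] for v in vertices}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    root = next(iter(adj))
    seen = {root}
    tree = []
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                tree.append((u, v))  # edge used to reach v for the first time
                queue.append(v)
    return tree

# A spanning tree of n vertices always has n - 1 edges.
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
print(len(spanning_tree(range(4), edges)))  # 3
&lt;/code&gt;&lt;/pre&gt;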
&lt;p&gt;Maybe we would like to ensure that no vertex is overloaded by having too many edges adjacent to it in $T$.
We can look for spanning trees such that the maximum degree $\Delta(T)$ is bounded by some input number $k$.
Is this problem difficult?&lt;/p&gt;
&lt;p&gt;Yes, there is no polynomial algorithm that solves this in the general case&lt;sup&gt;&lt;a href=&quot;#user-content-fn-pnp&quot; id=&quot;user-content-fnref-pnp&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;.
Consider what happens if we let $k=2$: now we are asking whether there is a Hamiltonian path in $G$, which is a well known NP-hard problem.
To see why this is the same, note that
if $\Delta(T) = 2$ then the spanning tree can never branch, since every vertex has at most two incident edges, and since the tree is connected it touches every vertex.
Thus we get a simple path that touches every vertex exactly once --- a Hamiltonian path.&lt;/p&gt;
&lt;p&gt;Okay, so we can&apos;t solve it exactly in polynomial time, but can we approximate it? It turns out yes, and with a surprising bound.
Let $\text{OPT}$ denote the minimal maximum&lt;sup&gt;&lt;a href=&quot;#user-content-fn-minmax&quot; id=&quot;user-content-fnref-minmax&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;3&lt;/a&gt;&lt;/sup&gt; vertex degree in a spanning tree in a given graph.
There is a polynomial algorithm that finds a spanning tree $T$ such that $\Delta(T)\leq\text{OPT}+1$.
That is, it will either output a spanning tree that is optimal, or its maximum degree will be $1$ above the optimal.&lt;/p&gt;
&lt;p&gt;We&apos;ll start off by stating a condition to have a tree with the bound above, then describe the algorithm, and then prove some claims.&lt;/p&gt;
&lt;h2&gt;Optimality Condition&lt;/h2&gt;
&lt;p&gt;First some notation. We are given a graph $G$ and consider a spanning tree $T$.
Let $k=\Delta(T)$, $D_k$ be a non-empty set of vertices of degree $k$ in $T$, and $D_{k-1}$ be any set of vertices of degree $k-1$:&lt;/p&gt;
&lt;p&gt;$$D_k \subseteq \{v\in V \mid d_T(v) = k \},\quad D_k\neq \emptyset$$
$$D_{k-1} \subseteq \{v\in V \mid d_T(v) = k-1 \}$$&lt;/p&gt;
&lt;p&gt;$D_k$ are all of the bad vertices since they have the highest degree, and we want low degree vertices.
$D_{k-1}$ are fine for now, but we need to be careful with them. We can&apos;t add any edges to these vertices,
since then they&apos;d be bad too.&lt;/p&gt;
&lt;p&gt;Next, let $F$ be all edges in $T$ that touch either $D_k$ or $D_{k-1}$ (or both).
These are the bad edges that we want to get rid of, because they are adjacent to the overloaded vertices in $D_k$ and $D_{k-1}$.&lt;/p&gt;
&lt;p&gt;Last, let $C$ denote the connected components of $T$ after removing all edges in $F$ from $T$.
Note that we have exactly $|F|+1$ connected components: we start off with one component, since $T$ is connected,
and every edge we remove splits a component into two.&lt;/p&gt;
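&lt;p&gt;The bookkeeping of $F$ and $C$ is mechanical. Here is a small sketch (the names are my own) that computes them from a tree, using a union-find over the surviving tree edges:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def bad_edges_and_components(vertices, tree_edges, bad_vertices):
    """F: tree edges touching D_k or D_{k-1}; C: components of T minus F."""
    f = [e for e in tree_edges if e[0] in bad_vertices or e[1] in bad_vertices]
    parent = {v: v for v in vertices}

    def find(v):  # union-find root, with path halving
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in tree_edges:
        if (u, v) not in f:
            parent[find(u)] = find(v)
    comps = {}
    for v in vertices:
        comps.setdefault(find(v), set()).add(v)
    return f, list(comps.values())

# A path 0-1-2-3-4 where vertex 2 is "bad": F has two edges,
# and removing them leaves |F| + 1 = 3 components.
f, c = bad_edges_and_components(range(5), [(0, 1), (1, 2), (2, 3), (3, 4)], {2})
print(len(f), len(c))  # 2 3
&lt;/code&gt;&lt;/pre&gt;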
&lt;p&gt;Here comes the condition:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;If each edge in $G$ that connects distinct components in $C$ has at
least one endpoint in $D_k\cup D_{k-1}$ then $\Delta(T)\leq\text{OPT}+1$.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;A proof is at the bottom of the post (it is not very involved). This condition is almost all we need.&lt;/p&gt;
&lt;h2&gt;The High Level Picture&lt;/h2&gt;
&lt;p&gt;First a note on spanning trees. Since they span the entire graph and are acyclic,
if we add a single edge to a spanning tree we will get a cycle&lt;sup&gt;&lt;a href=&quot;#user-content-fn-cycle&quot; id=&quot;user-content-fnref-cycle&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;.
Further, we can then remove &lt;em&gt;any&lt;/em&gt; edge in the cycle and still have a spanning tree.
At a high level, the operation we will do is exactly this: insert an edge adjacent to good vertices
to make a cycle that involves some of the bad vertices, and then
remove an edge that is adjacent to a bad vertex, making the total badness less.&lt;/p&gt;
&lt;p&gt;Let&apos;s have some figures&lt;sup&gt;&lt;a href=&quot;#user-content-fn-fig&quot; id=&quot;user-content-fnref-fig&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;5&lt;/a&gt;&lt;/sup&gt; to make things a little more concrete.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;graph.png&quot; alt=&quot;Illustration of an example graph.&quot; /&gt;
&lt;img src=&quot;spanning-tree.png&quot; alt=&quot;A spanning tree is highlighted in the example graph.&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Say this is the graph $G$ with some spanning tree $T$ marked with bold edges.
In the spanning tree, $a$ and $f$ are the bad vertices since $d_T(f)=4$ and $d_T(a)=3$.
This means that we can choose&lt;sup&gt;&lt;a href=&quot;#user-content-fn-choosedk&quot; id=&quot;user-content-fnref-choosedk&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;6&lt;/a&gt;&lt;/sup&gt; $D_k=\{f\}$ and $D_{k-1}=\{a\}$.
$F$ is all edges touching $a$ and $f$.
Removing $F$ from the graph leaves us with the following graph, still with the edges from the spanning tree highlighted.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;f.png&quot; alt=&quot;The edges from F are removed from the picture&quot; /&gt;&lt;/p&gt;
&lt;p&gt;In this example, $C$ contains 8 components&lt;sup&gt;&lt;a href=&quot;#user-content-fn-c&quot; id=&quot;user-content-fnref-c&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;7&lt;/a&gt;&lt;/sup&gt;:
$C=\{ \{a\}, \{b\}, \{c, d\}, \{e\}, \{f\}, \{g, j, k\}, \{h\}, \{i\} \}$.
Here the optimality condition above does not hold, since there are plenty of edges that connect components in $C$ without
touching either $a$ or $f$; in fact, it just so happens that all the edges that are not spanning edges
(i.e. bold) connect distinct components. This need not be the case: imagine $(g,j)$ as an edge.&lt;/p&gt;
&lt;p&gt;The idea of the algorithm is to see that the components $\{c,d\}$ and $\{g,j,k\}$ can be connected through the edge $(d,g)$&lt;sup&gt;&lt;a href=&quot;#user-content-fn-cg&quot; id=&quot;user-content-fnref-cg&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;8&lt;/a&gt;&lt;/sup&gt; (in blue below),
and that this opens up the possibility of removing either $(b,f)$ or $(f,j)$ (in red below), which reduces the degree of $f$.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;st-added.png&quot; alt=&quot;The edge (d,g) is added to the spanning tree, forming a cycle.&quot; /&gt;
&lt;img src=&quot;st-improved.png&quot; alt=&quot;An edge adjacent to f is removed, reducing its degree.&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The resulting tree still spans the graph, and
now we have reduced the maximum degree in the tree from $4$ to $3$.
You can imagine doing the same trick, replacing $(a,b)$ with $(b,e)$ and $(f,i)$ with $(i,j)$.
The resulting spanning tree would be optimal since its maximum degree is 2.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;st-opt.png&quot; alt=&quot;The optimal spanning tree, which is also a Hamiltonian path.&quot; /&gt;&lt;/p&gt;
&lt;h2&gt;The Algorithm&lt;/h2&gt;
&lt;p&gt;The algorithm iterates until the optimality condition above is true.
Let $k=\Delta(T)$ be the maximal degree of the spanning tree.
In each step we aim to reduce one vertex of degree $k$ to $k-1$.
If we reduce all such vertices we&apos;ll set $k=k-1$ and continue, aiming to lower the new $k$ even further.
Initialize $D_k$ and $D_{k-1}$ to be all&lt;sup&gt;&lt;a href=&quot;#user-content-fn-dk&quot; id=&quot;user-content-fnref-dk&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;9&lt;/a&gt;&lt;/sup&gt; vertices of degree $k$ and $k-1$ respectively in $T$, and compute $F$ and $C$ as defined above.
Find an edge that connects distinct components of $C$. If no such edge exists, we have our optimality condition.
Let $e=(u,v)$ be this edge, and look at the cycle we get when adding $e$ to $T$.&lt;/p&gt;
&lt;h3&gt;Case 1&lt;/h3&gt;
&lt;p&gt;If the cycle does not contain a vertex $x\in D_k$, we &lt;em&gt;don&apos;t&lt;/em&gt; do the swap.
Instead we make a note that if we want to reduce any of the vertices in the cycle that are also in $D_{k-1}$, we can do so through $e$.
Further, we&apos;ll remove all of these vertices from $D_{k-1}$, and update $F$ and $C$ accordingly.
Effectively what this removal does is to merge all&lt;sup&gt;&lt;a href=&quot;#user-content-fn-all&quot; id=&quot;user-content-fnref-all&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;10&lt;/a&gt;&lt;/sup&gt; components that were connected to the cycle.&lt;/p&gt;
&lt;p&gt;We have not done any real work yet, but we have made the set $D_k\cup D_{k-1}$ smaller, so maybe next time we&apos;ll get lucky.
The downside is that we can no longer trust $D_{k-1}$ to contain all vertices of degree $k-1$.
We removed the vertices from the sets, but we didn&apos;t remove any edges from the tree $T$.
However, with the edge $e$ we have said that if we need to reduce the degree of one of these vertices, we can do so with $e$.
Let&apos;s call this edge $e_u$ for a vertex $u$.&lt;/p&gt;
&lt;h3&gt;Case 2&lt;/h3&gt;
&lt;p&gt;If the cycle &lt;em&gt;does&lt;/em&gt; contain a vertex $x\in D_k$, we want to add $e$ to $T$ and remove either of the two edges in the cycle that are adjacent to $x$.
Now, if the degrees of both $u$ and $v$ are less than $k-1$ all is well, since we&apos;ve reduced the degree of $x$ from $k$ to $k-1$,
and increased the degrees of $u$ and $v$ from something smaller than $k-1$ to something smaller than $k$.&lt;/p&gt;
&lt;p&gt;However, if either of them is of degree $k-1$ we cannot do this. WLOG&lt;sup&gt;&lt;a href=&quot;#user-content-fn-wlog&quot; id=&quot;user-content-fnref-wlog&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;11&lt;/a&gt;&lt;/sup&gt; let $u$ be the problem vertex.
Now the setup in Case 1 pays off. Since $u\notin D_{k-1}$, but $d(u)=k-1$, we know that
we have had Case 1 with $u$: $u$ was in $D_{k-1}$ and on a cycle, and we removed it from the set.
We also made a note saying that we can reduce the degree of $u$ upon request, through the edge $e_u$.
Now we can add $e_u$ and remove either of the two edges adjacent to $u$ in the cycle that was formed by adding $e_u$.
This makes $d(u)=k-2$ and we still have a spanning tree.
Then we add $e$ and remove an edge adjacent to $x$, which increases $d(u)$ back up to $k-1$, and decreases
$d(x)$ to $k-1$.
If $d(v)=k-1$ as well we do the same with it.&lt;/p&gt;
&lt;p&gt;We have successfully improved our spanning tree, since the number of vertices of degree $k$ has been reduced.
Repeat this from the start by setting $D_k$ and $D_{k-1}$ to be all vertices of degree $k$ and $k-1$ and
recomputing $F$ and $C$, until we hit the optimality condition.&lt;/p&gt;
&lt;p&gt;There is one catch though. When we are reducing $u$ we are adding in the edge $e_u$ to the tree.
How do we know that this edge will not make either of its endpoints&apos; degree too high?
The answer is that we don&apos;t! These vertices might also be of degree $k-1$, but then each such vertex $w$ will, too, have
a designated edge $e_w$ that we can add to reduce it.
We may end up with a chain of reductions, but this chain has to eventually terminate. See below.&lt;/p&gt;
&lt;p&gt;And that&apos;s it! Some bookkeeping, component queries and merging, and basic graph operations, and we&apos;re left
with a spanning tree whose maximum degree is at most one above the optimum of an NP-hard problem.&lt;/p&gt;
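&lt;p&gt;To make the add-then-remove operation concrete, here is a sketch of just the core swap (it leaves out all the Case 1 marking machinery, and the names are my own): add an edge, find the cycle it closes, and drop a cycle edge adjacent to the highest-degree vertex on it.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def cycle_in_tree(tree_edges, u, v):
    """The path from u to v in the tree, i.e. the cycle closed by adding (u, v)."""
    adj = {}
    for a, b in tree_edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    stack = [(u, [u])]
    while stack:  # DFS from u to v, tracking the vertex path
        node, path = stack.pop()
        if node == v:
            return path
        for nxt in adj[node]:
            if nxt not in path:
                stack.append((nxt, path + [nxt]))

def swap(tree_edges, e):
    """Add edge e and drop a cycle edge next to a maximum-degree cycle vertex."""
    deg = {}
    for a, b in tree_edges:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    path = cycle_in_tree(tree_edges, e[0], e[1])
    x = max(path, key=lambda v: deg.get(v, 0))  # worst vertex on the cycle
    i = path.index(x)
    other = path[i - 1] if i &gt; 0 else path[1]   # a cycle neighbour of x
    drop = tuple(sorted((x, other)))
    new_tree = [t for t in tree_edges if tuple(sorted(t)) != drop]
    new_tree.append(e)
    return new_tree

# A star: vertex 0 has degree 4. Adding (1, 2) and dropping (0, 1)
# lowers the maximum degree from 4 to 3.
star = [(0, 1), (0, 2), (0, 3), (0, 4)]
print(sorted(swap(star, (1, 2))))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In the real algorithm the dropped edge must of course be adjacent to a vertex in $D_k$, and the endpoints of $e$ must not end up with degree $k$; the sketch only shows the tree surgery.&lt;/p&gt;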
&lt;h2&gt;Proofs&lt;/h2&gt;
&lt;p&gt;Here are proofs for the optimality condition and the termination and soundness of the cascading chain of reduction.&lt;/p&gt;
&lt;h3&gt;Proof of optimality condition&lt;/h3&gt;
&lt;p&gt;For brevity I will write &amp;quot;the set&amp;quot; for $D_k\cup D_{k-1}$.&lt;/p&gt;
&lt;p&gt;We first find a lower bound for $\text{OPT}$.
We have seen that $T\setminus F$ contains exactly $|F|+1$ components (this is the size of $C$).
When the condition holds, any spanning tree of $G$ will also need at least $|F|$ edges
to connect these components, and each of these connecting edges has at least one endpoint in the set.
Furthermore, if we look at all the vertices in the set
we know that their average degree in any spanning tree must be at least&lt;sup&gt;&lt;a href=&quot;#user-content-fn-loose&quot; id=&quot;user-content-fnref-loose&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;12&lt;/a&gt;&lt;/sup&gt; $|F| / |D_k\cup D_{k-1}|$,
since each connecting edge contributes to the degree of some vertex in the set.
The maximum has to be at least as large as the average, and so&lt;/p&gt;
&lt;p&gt;$$\left\lceil\frac{|F|}{|D_k\cup D_{k-1}|}\right\rceil \leq \text{OPT}.$$&lt;/p&gt;
&lt;p&gt;Now we find a bound on $|F|$.
If we sum up the degrees of all the vertices in our set we get $k|D_k|+(k-1)|D_{k-1}|$, since the $D$ sets are exactly vertices of that degree.
This sum can be more than the number of edges in $F$, since edges internal to the set will be counted twice.
However, we also know that $T$ is acyclic, and so there can be at most $|D_k\cup D_{k-1}|-1$ such edges&lt;sup&gt;&lt;a href=&quot;#user-content-fn-nmo&quot; id=&quot;user-content-fnref-nmo&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;13&lt;/a&gt;&lt;/sup&gt;, so we
can add up the degrees of the vertices in the set and subtract this maximal number of internal edges so as not to double count. This
gives us a lower bound on the number of edges touching the set&lt;sup&gt;&lt;a href=&quot;#user-content-fn-disjoint&quot; id=&quot;user-content-fnref-disjoint&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;14&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;p&gt;$$\begin{align}
&amp;amp;k|D_k|+(k-1)|D_{k-1}| - \left(|D_k\cup D_{k-1}|-1\right) \\
=\ \ &amp;amp;k|D_k|+(k-1)|D_{k-1}| - |D_k| - |D_{k-1}|+1 \leq |F|
\end{align}$$&lt;/p&gt;
&lt;p&gt;Now we can combine the two inequalities by inserting the bound for $|F|$ into the bound for $\text{OPT}$:&lt;/p&gt;
&lt;p&gt;$$\begin{align}
\text{OPT} &amp;amp;\geq
\left\lceil\frac{k|D_k|+(k-1)|D_{k-1}| - |D_k| - |D_{k-1}|+1}
{|D_k| + |D_{k-1}|}\right\rceil \\
&amp;amp;= \left\lceil\frac{k(|D_k|+|D_{k-1}|) - \left(|D_k| + |D_{k-1}|\right) - |D_{k-1}|+1}
{|D_k| + |D_{k-1}|}\right\rceil \\
&amp;amp;= \left\lceil k - 1 - \frac{|D_{k-1}|-1}
{|D_k| + |D_{k-1}|}\right\rceil \\
&amp;amp;\geq k-1
\end{align}$$&lt;/p&gt;
&lt;p&gt;Recall that $k=\Delta(T)$, so $\Delta(T) \leq \text{OPT}+1$.&lt;/p&gt;
&lt;h3&gt;Proof of termination of the algorithm&lt;/h3&gt;
&lt;p&gt;Since each step reduces either $D_k$ or $D_{k-1}$, each iteration of the algorithm will eventually terminate,
and since $k=2$ is the best possible maximal degree (unless we only have 1 or 2 vertices) we cannot reduce forever.&lt;/p&gt;
&lt;p&gt;The less obvious part is that the reduction of a degree $k-1$ vertex terminates. Recall that in reducing
a bad vertex $u$ with $d(u)=k$ we had to add an edge $(v,w)$ to the graph where $d(v)=k-1$,
and this was done by adding in a second edge $e_v$. The problem is that $e_v$ might again be adjacent to a vertex
of degree $k-1$, which will also have to be reduced. Now we show that this procedure will terminate.
We do so by induction on the iteration number $i$, showing that when we mark a node as reducible (Case 1) we can
perform the later reductions as well.&lt;/p&gt;
&lt;p&gt;$i=1$: In the first iteration we have not marked any nodes as reducible through some edge, and so this terminates.&lt;/p&gt;
&lt;p&gt;$i=l$: Let $u$ be the node we want to reduce, and $e=(v,w)$ the edge we reduce it with.
WLOG let $v$ be the node we need to reduce, and $j$ the iteration in which we marked $v$ as reducible.
By induction we can reduce $v$ from degree $k-1$ to $k-2$, and the same is true for $w$.
How do we know that reducing $v$ using the edge $e_v$ doesn&apos;t mess up our reduction of $u$ with $e$?
Because in marking $v$ as reducible we joined the two components that $e_v$ connected, and so
in iteration $i$, the edge $e_v$ is internal to some component. In fact, the entire cycle formed by
adding $e_v$ to $T$ is in that component. Since $C$ is not affected by the reduction of $v$, the reduction
of $u$ via $e$ is still valid.&lt;/p&gt;
&lt;p&gt;The only thing left to consider is whether there is a chain of reductions that all add an edge to some vertex $x$,
so that $k \leq d(x)$ after all reductions are done.
This cannot happen: in fact, in a single iteration the set of edges that we add to the tree when reducing
vertices are pairwise disjoint,
and so any vertex will at most have its degree bumped once.
Since the vertices of degree $k-1$ are already taken care of, no other vertex will suddenly have degree $k$,
and so we are guaranteed progress.&lt;/p&gt;
&lt;p&gt;To show disjointedness, let $u$ be reduced by the edge $e = (v, w)$, and assume that $v$ also needs to be reduced.
Recall that, by definition, when we decided that $v$ is reducible through
edge $e_v=(x,y)$, $e_v$ connected two different components.  Further, we merged
the components of the cycle that was formed by adding the edge $e_v$ into
a bigger component $C_v$, which contains both the two components that $e_v$
connected and $v$ itself. This is one of the components that $e$ connects,
and it does so through $v$, so we know that $e$ and $e_v$ are disjoint.
Now, if either $x$ or $y$ also needs to be reduced, say $x$, that will be through
$e_x$, which will, by the same logic, be internal to the component $C_x$, which contains
neither $u$, $v$, nor $w$, so $e_x$ is also disjoint from both $e$ and $e_v$&lt;sup&gt;&lt;a href=&quot;#user-content-fn-pf&quot; id=&quot;user-content-fnref-pf&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;15&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;We&apos;ve seen an algorithm that finds a spanning tree that minimizes the maximum degree of the tree up to an error of 1.
The original paper that this was taken from, &lt;a href=&quot;https://blogs.asarkar.com/assets/docs/algorithms-curated/Approximating%20the%20Minimum%20Degree%20Spanning%20Tree%20-%20Furer.pdf&quot;&gt;link here&lt;/a&gt;,
is called
&lt;em&gt;&amp;quot;Approximating the minimum degree spanning tree to within one from the optimal degree&amp;quot;&lt;/em&gt; by Fürer and Raghavachari.
I didn&apos;t actually read the paper, just the book chapter mentioned in the beginning.&lt;/p&gt;
&lt;p&gt;I thought the algorithm was especially neat due to the bound, since approximation algorithms often are multiplicative
factors off the optimal solution, for instance with a factor of 2 or 1.5.
It is interesting that finding a spanning tree with maximal degree 2 is NP-hard, but
that there is a polynomial algorithm that will find a spanning tree of maximal degree of 2 or 3 if
the graph is Hamiltonian.&lt;/p&gt;
&lt;p&gt;I haven&apos;t tried to implement this, but from what I can tell it should not be too difficult.
It would be interesting to see some statistics on the performance of this algorithm
on a corpus of graphs that are known to be Hamiltonian.
Things like:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;How long do the reduction chains become?&lt;/li&gt;
&lt;li&gt;How many iterations are required before we reach optimality?&lt;/li&gt;
&lt;li&gt;How often do we get the optimal tree?&lt;/li&gt;
&lt;li&gt;How does $|F|$ change during the lifetime of the algorithm?&lt;/li&gt;
&lt;li&gt;How does $C$ change during the lifetime of the algorithm?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;and probably many other things.
If you know of any work on this or have any ideas yourself, feel free
to send it to my &lt;a href=&quot;mailto:~mht/public-inbox@lists.sr.ht&quot;&gt;public inbox&lt;/a&gt; (plain text emails only).&lt;/p&gt;
&lt;p&gt;Thanks for reading.&lt;/p&gt;
&lt;section data-footnotes=&quot;&quot; class=&quot;footnotes&quot;&gt;&lt;h2 id=&quot;footnote-label&quot; class=&quot;sr-only&quot;&gt;Footnotes&lt;/h2&gt;
&lt;ol&gt;
&lt;li id=&quot;user-content-fn-st&quot;&gt;
&lt;p&gt;For instance with &lt;a href=&quot;https://en.wikipedia.org/wiki/Prim%27s_algorithm&quot;&gt;Prim&apos;s&lt;/a&gt; or &lt;a href=&quot;https://en.wikipedia.org/wiki/Kruskal%27s_algorithm&quot;&gt;Kruskal&apos;s&lt;/a&gt; algorithm. &lt;a href=&quot;#user-content-fnref-st&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-pnp&quot;&gt;
&lt;p&gt;Unless $P=NP$, that is. &lt;a href=&quot;#user-content-fnref-pnp&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-minmax&quot;&gt;
&lt;p&gt;That is, among all possible spanning trees, we are looking for one where the maximal degree $\Delta$ is minimized. &lt;a href=&quot;#user-content-fnref-minmax&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-cycle&quot;&gt;
&lt;p&gt;If you don&apos;t believe me, draw a spanning tree and try to insert an edge between any two vertices. &lt;a href=&quot;#user-content-fnref-cycle&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-fig&quot;&gt;
&lt;p&gt;I&apos;d be interested in knowing how people make graphs for the web in some vector format. These are &lt;code&gt;tikz&lt;/code&gt; converted to &lt;code&gt;png&lt;/code&gt;s using ImageMagick, and it&apos;s &lt;em&gt;fine&lt;/em&gt;. &lt;a href=&quot;#user-content-fnref-fig&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-choosedk&quot;&gt;
&lt;p&gt;To reach the optimality condition it is always better to have $D_k$ and $D_{k-1}$ be as large as possible, since then it touches more edges. &lt;a href=&quot;#user-content-fnref-choosedk&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-c&quot;&gt;
&lt;p&gt;Recall that we look at the components in $T$, and not in $G$. We only care about the bold edges. &lt;a href=&quot;#user-content-fnref-c&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-cg&quot;&gt;
&lt;p&gt;We could also have chosen $(c,g)$ to connect the two components. &lt;a href=&quot;#user-content-fnref-cg&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-dk&quot;&gt;
&lt;p&gt;Note that in the optimality condition we said that they could be arbitrary subsets of these vertices. Now we choose all of them. &lt;a href=&quot;#user-content-fnref-dk&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-all&quot;&gt;
&lt;p&gt;The book said that it would merge the components of the endpoints of $e$, but I cannot see how it would not join other components that are attached to the cycle as well. It is also not listed in the &lt;a href=&quot;http://www.designofapproxalgs.com/errata.pdf&quot;&gt;errata&lt;/a&gt;. &lt;a href=&quot;#user-content-fnref-all&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-wlog&quot;&gt;
&lt;p&gt;&amp;quot;&lt;a href=&quot;https://en.wikipedia.org/wiki/Without_loss_of_generality&quot;&gt;Without loss of generality&lt;/a&gt;&amp;quot;: if it was in fact $v$ and not $u$ you can just mentally swap them in the text. &lt;a href=&quot;#user-content-fnref-wlog&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-loose&quot;&gt;
&lt;p&gt;The &amp;quot;at least&amp;quot; comes from the fact that there might be edges internal to the set, i.e. connecting two high degree vertices. In that case we would have to count the edge twice to get the actual average. The proof doesn&apos;t need it though. &lt;a href=&quot;#user-content-fnref-loose&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-nmo&quot;&gt;
&lt;p&gt;This is just like how there are at most $n-1$ edges in an $n$-vertex graph without cycles. &lt;a href=&quot;#user-content-fnref-nmo&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-disjoint&quot;&gt;
&lt;p&gt;The equality holds since the two sets are disjoint. &lt;a href=&quot;#user-content-fnref-disjoint&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-pf&quot;&gt;
&lt;p&gt;The logic here kind of follows from the $i=l$ step of the induction; I&apos;m sure there&apos;s a nicer way of phrasing it though. &lt;a href=&quot;#user-content-fnref-pf&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content></entry><entry><title>Ark</title><id>https://mht.wtf/post/ark/</id><updated>2026-02-23T21:34:35+01:00</updated><author><name>Martin Hafskjold Thoresen</name><email>m@mht.wtf</email></author><link href="https://mht.wtf/post/ark/" rel=""/><link href="https://mht.wtf/post/ark/index.html" rel="alternate"/><published>2026-02-23T21:34:35+01:00</published><content type="text/html">&lt;p&gt;&lt;code&gt;ark&lt;/code&gt; is my latest personal service, after my own &lt;a href=&quot;https://mht.wtf/post/rss/&quot;&gt;rss&lt;/a&gt; reader.
The stack is the same as the other services, and it all lives in the same repo.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;ark&lt;/code&gt; stores markdown documents, and that&apos;s basically it.
It also uses &lt;a href=&quot;https://mht.wtf/post/rss/&quot;&gt;&lt;code&gt;qdrant&lt;/code&gt;&lt;/a&gt; for embeddings for the docs
so that I can do similarity search.
The embeddings come from OpenAI&apos;s &lt;code&gt;text-embedding-3-small&lt;/code&gt; model and have cost me $0.01 so far.&lt;/p&gt;
&lt;p&gt;This is what it looks like; colors are subject to change:&lt;/p&gt;
&lt;figure style=&quot;display: flex; justify-content: center&quot;&gt;
  &lt;div style=&quot;max-width: 400px&quot;&gt;
    &lt;img src=&quot;./ark.png&quot; style=&quot;flex: 1; width: 100%&quot;&gt;
    &lt;figcaption&gt;&lt;code&gt;ark&lt;/code&gt; shows a note together with related notes.&lt;/figcaption&gt;
  &lt;/div&gt;
&lt;/figure&gt;
&lt;p&gt;I&apos;m considering replacing the separate edit page with a heavier inline markdown editor, but it might be easier to let the CLI take care of editing:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ ark
Usage: ark &amp;lt;COMMAND&amp;gt;

Commands:
  add      Create a new document [aliases: a]
  get      Get a document by ID [aliases: g]
  update   Update a document by ID [aliases: u]
  delete   Delete a document by ID [aliases: d]
  search   Search documents by meaning (or exact text with --text) [aliases: s]
  browse   Browse recent documents interactively [aliases: b]
  list     List recent documents [aliases: l]
  reindex  Reindex all documents for search
  help     Print this message or the help of the given subcommand(s)

Options:
  -h, --help  Print help
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I inserted my entire Obsidian knowledge base into &lt;code&gt;ark&lt;/code&gt; and it is about to replace Obsidian for me,
although it should be said that my usage has been very low for years.
Searching uses the embedding model for similarity search and outputs the ids and scores for the
top candidates:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ ark search &apos;integrate 0-form over manifold&apos;
[743] (0.64) ddg/Exterior Calculus.md
[594] (0.53) oaomm/manifold.md
[418] (0.51) ddg/Ex/Ex6-3.md
[631] (0.50) oaomm/immersed submanifold.md
[555] (0.48) oaomm/sphere manifold.md
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;ark&lt;/code&gt; only tracks markdown data; it doesn&apos;t track files.
These results look like filenames, but that is a quirk of the Obsidian import:
I had previously used the filenames of the Obsidian files as relevant metadata,
so the content itself wouldn&apos;t necessarily make sense on its own.
To keep that information, I prepended the filename as the first line of every file.&lt;/p&gt;
&lt;p&gt;You can then get the markdown from an id:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ ark get 743
ddg/Exterior Calculus.md

$\def\RR{\mathbb{R}}$Let $f,g$ be 0-forms and $M$ some $n$-dimensional manifold. For each point $p \in M$, the form $f$ assigns a number $f_p\in\RR^n$.

We can integrate a $0$-form over the manifold, like so: $\int_M f$. By this we mean the sum of all $f_p$ where $p\in M$.

### Differential
The differential operator $d$ takes a $k$-form to a $(k+1)$-form.
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;ark browse&lt;/code&gt; basically does these two things in one interactive step.&lt;/p&gt;
&lt;p&gt;The beauty of &lt;code&gt;ark&lt;/code&gt; is that this is it.
It&apos;s &lt;em&gt;that&lt;/em&gt; simple.
But simplicity means that it&apos;s easy to integrate with.
Here&apos;s a claude skill for interacting with &lt;code&gt;ark&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-md&quot;&gt;---
name: ark
description: CLI tool to handle personal markdown documents.
---

`ark --help` shows the help page.  Responses are in json.
`ark search TERM` uses vector embedded similarity search.
`ark search --text TERM` uses exact search.

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now I can ask claude for how to make pizza the way I want it:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ claude &apos;use /ark to find what I need for an italian dinner&apos; --allowed-tools=&amp;quot;Bash(ark:*)&amp;quot; -p
Here&apos;s what I found for your Italian dinner:

**Pizza Recipe** (for two pizzas):

Dough:
- 200g water
- 7g salt
- 1.0g dry yeast
- 275g flour

Sauce:
- Canned Italian tomatoes (from IMS)
- Dry oregano
- Squeeze and cut large pieces, don&apos;t put on too much, spread it out to the crust

**Wine pairing:**
- Wongraven Alleanza Langhe Rosso 2024 (your favorite!)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is a silly example because the LLM didn&apos;t do anything useful in this case,
but it&apos;s using &lt;em&gt;my data&lt;/em&gt; to help &lt;em&gt;me&lt;/em&gt;.
The data comes from two notes, one for a pizza recipe and one for that wine,
which I put in as a test.
This is my pizza recipe, but the wine isn&apos;t my favorite.&lt;/p&gt;
&lt;p&gt;What I like the most about &lt;code&gt;ark&lt;/code&gt; is that
I have access to my entire set of notes from anywhere,
and can trivially get other tools to read it and do useful things with it;
my data isn&apos;t held hostage by a SaaS.
It&apos;s also very much unstructured, and I hope to be able to keep it that way.
Leaning on other tools for search feels more appropriate for my use case
than having to structure files in a tree or manually annotate with tags.&lt;/p&gt;
</content></entry><entry><title>Advent of Common Lisp, Day 1-4</title><id>https://mht.wtf/post/advent-2018-1/</id><updated>2018-12-01T13:46:06+01:00</updated><author><name>Martin Hafskjold Thoresen</name><email>m@mht.wtf</email></author><link href="https://mht.wtf/post/advent-2018-1/" rel=""/><link href="https://mht.wtf/post/advent-2018-1/index.html" rel="alternate"/><published>2018-12-01T13:46:06+01:00</published><content type="text/html">&lt;p&gt;The only exposure that I have to Common Lisp is that I wrote about 1000 lines
of it about 4 years ago. Since I don&apos;t have any excuse to write CL day-to-day,
the days since I last typed &lt;code&gt;defun&lt;/code&gt; seem to have added up. Luckily, the
&lt;a href=&quot;https://adventofcode.com/2018/&quot;&gt;Advent of Code&lt;/a&gt; is upon us, which is a great
way of learning a new language or brushing the dust off skills in a language you
once knew; I&apos;m taking the opportunity to finally write me some Common Lisp.&lt;/p&gt;
&lt;h2&gt;Common Lisp, Emacs, Slime, and QuickLisp&lt;/h2&gt;
&lt;p&gt;People seem to say that &lt;em&gt;the&lt;/em&gt; way of writing CL is in Emacs using Slime; I am a
long-time &lt;code&gt;vim&lt;/code&gt; addict, but I have spent the last few months in Spacemacs, in
order to see what I&apos;ve been missing out on, so being pressured into using Emacs
isn&apos;t all that bad.&lt;/p&gt;
&lt;p&gt;I&apos;m still not sure exactly what Slime is, but it seems to be something that
allows me to write code in emacs, and send it to a Lisp process, which sounds
useful enough. Oh, and it also has a debugger which, though a little difficult
to use, looks promising. Slime is installed using &lt;code&gt;package-install&lt;/code&gt;, like most
other things in the emacs world.&lt;/p&gt;
&lt;h3&gt;Installing QuickLisp&lt;/h3&gt;
&lt;p&gt;QuickLisp is a library manager for Common Lisp, and it comes in handy when we
want to do something that the standard library doesn&apos;t offer but that we don&apos;t
want to write ourselves.  Installing &lt;code&gt;quicklisp&lt;/code&gt; is rather easy, and the
process is pretty much described on its website. We download a file
&lt;code&gt;quicklisp.lisp&lt;/code&gt;, load it with &lt;code&gt;sbcl --load &amp;lt;path-to-file&amp;gt;&lt;/code&gt;, and that&apos;s it.
Now all we must do is evaluate &lt;code&gt;(load &amp;quot;~/quicklisp/setup.lisp&amp;quot;)&lt;/code&gt; in Lisp, and
we&apos;re ready to go.&lt;/p&gt;
&lt;h3&gt;Reading Input&lt;/h3&gt;
&lt;p&gt;We will probably read input from a file every day, so having a function
that returns a list of strings, one for each line, makes sense.
&lt;code&gt;uiop&lt;/code&gt; is a library that comes with &lt;code&gt;asdf&lt;/code&gt; and contains the function
&lt;code&gt;uiop:read-file-lines&lt;/code&gt; which does exactly this. This is the function
we will be using, if nothing else is mentioned.&lt;/p&gt;
&lt;h2&gt;Day 1&lt;/h2&gt;
&lt;h3&gt;Part 1&lt;/h3&gt;
&lt;p&gt;The first challenge was simple enough: sum a list of numbers.
This is straightforward in any Lisp, provided you remember whether
the function is called &lt;code&gt;fold&lt;/code&gt; or &lt;code&gt;reduce&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;(defparameter *input-1* (mapcar #&apos;parse-integer (uiop:read-file-lines &amp;quot;1.input&amp;quot;)))
(defun day-1/1 (numbers)
  (reduce #&apos;+ numbers))
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;Part 2&lt;/h3&gt;
&lt;p&gt;The second part is slightly worse: we are asked to keep track of all partial
sums through the list and see what sum we get twice first. In addition, if no
collisions are found throughout the first iteration of the list, we should
restart, while keeping the accumulated sum.&lt;/p&gt;
&lt;p&gt;I first attempted using the &lt;code&gt;loop&lt;/code&gt; macro:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;(defun day-1/2-loop (numbers)
  (loop for n in (cons 0 numbers)
        summing n into freq
        when (find freq seen)
          return freq
        append (list freq) into seen))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;One quirk with this attempt is that &lt;code&gt;append&lt;/code&gt; seemingly wants a list as its first
argument, and not the element you are appending --- we&apos;re really just joining
two lists --- so we construct the first list explicitly.  A worse thing about
this is that this function only runs through the list once.  After spending 15
minutes looking at tutorials, cookbooks, and other documentation, looking for a
way to just repeat the &lt;code&gt;for&lt;/code&gt; loop if we exhaust the list, I guessed that it&apos;s
not possible using &lt;code&gt;loop&lt;/code&gt;, so I rewrote it as a much worse-looking recursive
function:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;(defun day-1/2-list (numbers)
  (labels ((inner (numbers current freq seen)
                 (if current
                     (if (find freq seen)
                         freq
                         (inner numbers 
                                (cdr current) 
                                (+ freq (car current)) 
                                (cons freq seen)))
                     (inner numbers numbers freq seen))))
  (inner numbers numbers 0 nil)))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Since we&apos;re checking through the &lt;code&gt;seen&lt;/code&gt; list in each call, this has quadratic complexity.
Looking at the runtime, it shows:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;(time (day-1/2-list *input-1*))
Evaluation took:
  54.224 seconds of real time
  54.176661 seconds of total run time (54.173330 user, 0.003331 system)
  99.91% CPU
  157,466,185,510 processor cycles
  2,162,688 bytes consed
  
219
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;One option of improving this is to use a hash table instead of a list for &lt;code&gt;seen&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;(defun day-1/2-ht (numbers)
  (let ((seen (make-hash-table)))
    (labels ((inner (numbers current freq)
                    (if current
                        (if (gethash freq seen)
                            freq
                            (progn (setf (gethash freq seen) t)
                                (inner numbers (cdr current) (+ freq (car current)))))
                        (inner numbers numbers freq))))
      (inner numbers numbers 0))))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As one would expect, the running time is much better now:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;(time (day-1/2-ht *input-1*))
Evaluation took:
  0.026 seconds of real time
  0.025140 seconds of total run time (0.025136 user, 0.000004 system)
  [ Run times consist of 0.007 seconds GC time, and 0.019 seconds non-GC time. ]
  96.15% CPU
  73,008,592 processor cycles
  20,931,664 bytes consed
  
219
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Day 2&lt;/h2&gt;
&lt;h3&gt;Part 1&lt;/h3&gt;
&lt;p&gt;The task of the second day amounts to checking whether a string contains exactly
two or exactly three of any character.
Since we&apos;ve seen that list processing in Lisp can be quite slow, I want to go
for a more traditional solution:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Turn the &lt;code&gt;String&lt;/code&gt; into an &lt;code&gt;Array&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Sort the &lt;code&gt;Array&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Loop through and count the length of equal character runs.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;As it turns out, &lt;code&gt;String&lt;/code&gt;s in Common Lisp are already &lt;code&gt;Arrays&lt;/code&gt;: off to a good start.
Next we want to sort it. Running &lt;code&gt;(describe #&apos;sort)&lt;/code&gt; tells me the following:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;* (describe #&apos;sort)
#&amp;lt;FUNCTION SORT&amp;gt;
  [compiled function]


Lambda-list: (SEQUENCE SB-IMPL::PREDICATE &amp;amp;REST SB-IMPL::ARGS &amp;amp;KEY
              SB-IMPL::KEY)
Dynamic-extent arguments: positional=(1), keyword=(:KEY)
Declared type: (FUNCTION
                (SEQUENCE (OR FUNCTION SYMBOL) &amp;amp;REST T &amp;amp;KEY
                 (:KEY (OR FUNCTION SYMBOL)))
                (VALUES SEQUENCE &amp;amp;OPTIONAL))
Documentation:
  Destructively sort SEQUENCE. PREDICATE should return non-NIL if
     ARG1 is to precede ARG2.
Inline proclamation: MAYBE-INLINE (inline expansion available)
Known attributes: call
Source file: SYS:SRC;CODE;SORT.LISP
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;There are a few things to note here. First off, we need to pass a &lt;code&gt;predicate&lt;/code&gt;,
since &lt;code&gt;sort&lt;/code&gt; doesn&apos;t know the types of the values that we want to sort, so we
need to find a character-comparing function.  In addition, &lt;code&gt;sort&lt;/code&gt;
&lt;em&gt;destructively&lt;/em&gt; sorts the sequence; this should be fine (even preferable), but
we need to take that into account.
Browsing &lt;code&gt;lispcookbook&lt;/code&gt; we find an example using a function &lt;code&gt;char=&lt;/code&gt;,
so we guess there is a function &lt;code&gt;char&amp;lt;&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;* (sort &amp;quot;hello world&amp;quot; #&apos;char&amp;lt;)
&amp;quot; dehllloorw&amp;quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Great!
Now that we have a sorted &lt;code&gt;Array&lt;/code&gt; of characters, we loop through it
and increment a counter when there is a run of exactly two or three equal characters.
Something like this should work:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;(defun count-runs-2-3 (string)
  (let ((arr (sort string #&apos;char&amp;lt;))
        (2-count 0)
        (3-count 0)
        (prev-char #\NULL)
        (curr-count 0))
    (loop for c across arr
          if (char= prev-char c) do (incf curr-count)
          else do (progn
                    (case curr-count
                      (2 (incf 2-count))
                      (3 (incf 3-count))
                      (otherwise))
                    (setf prev-char c)
                    (setf curr-count 1)))
    (case curr-count    ; Don&apos;t forget adding the last run
      (2 (incf 2-count))
      (3 (incf 3-count))
      (otherwise))
    (list 2-count 3-count)))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now we can loop through each line in the input file and sum into two counters,
one for runs of two and one for runs of three. However, if there are multiple runs, they
should only count as one. While this could have been done in &lt;code&gt;count-runs-2-3&lt;/code&gt;,
we might as well make &lt;code&gt;1-if-pos&lt;/code&gt;, and handle it in the summing.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;(defun 1-if-pos (x) (if (&amp;lt; 0 x) 1 0))

(defun day-2/1 (input)
  (loop for line in input
        for tuple = (count-runs-2-3 line)
        summing (1-if-pos (first tuple)) into 2-sum
        summing (1-if-pos (second tuple)) into 3-sum
        finally (return (* 2-sum 3-sum))))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This solves part 1.&lt;/p&gt;
&lt;h3&gt;Part 2&lt;/h3&gt;
&lt;p&gt;In the second part we are asked to find a pair of strings in the input that
differs by exactly one character. By this time I realize that the destructive
sorting has messed up my input variables:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;*test-input-2*
(&amp;quot;abcdef&amp;quot; &amp;quot;aabbbc&amp;quot; &amp;quot;abbcde&amp;quot; &amp;quot;abcccd&amp;quot; &amp;quot;aabcdd&amp;quot; &amp;quot;abcdee&amp;quot; &amp;quot;aaabbb&amp;quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Oops! Instead of fixing this (e.g. by cloning the strings before sorting,
or finding out whether &lt;code&gt;sort&lt;/code&gt; offers a non-destructive option) I&apos;ll just
leave it as is, and read the input file again.&lt;/p&gt;
&lt;p&gt;In any case, there are a few different ways we can do part 2. The simplest is
just to check all pairs, calculate the difference, and output the pair if the
difference is two.&lt;/p&gt;
&lt;p&gt;First we need to find all pairs of elements in a list. Again, after looking at
&lt;code&gt;loop&lt;/code&gt; &lt;code&gt;for&lt;/code&gt; a &lt;code&gt;while&lt;/code&gt; I couldn&apos;t find anything useful (discoverability is
hard!), so I decided to roll my own:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;(defun all-pairs (list)
  (if list
      (let ((head (car list))
            (rest (cdr list)))
        (append (mapcar (lambda (e) (list head e)) rest)
                (all-pairs rest)))))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now, this isn&apos;t &lt;em&gt;quite&lt;/em&gt; correct: &lt;code&gt;(all-pairs &apos;(1))&lt;/code&gt; returns &lt;code&gt;NIL&lt;/code&gt;, but with
the exception of this case the function seems to do the trick. Next we need to
count the number of different chars in a pair. Again we&apos;re doing the simplest
thing possible:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;(defun count-difference (first second)
  (loop for i from 0 below (length first)
        for a = (char first i)
        for b = (char second i)
        counting (not (char= a b)) into diffs
        finally (return diffs)))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now we can find the two strings that differ in exactly one position.
However, the task asks us to find the portion of the two strings
that is the same, and not the two strings themselves, so we need yet another
function:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;(defun remove-equals (first second)
  (with-output-to-string (out)
    (loop for i from 0 below (length first)
          for a = (char first i)
          for b = (char second i)
          when (char= a b) do (write-char a out))))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now the final function for today&apos;s task is done:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;(defun day-2/2 (input)
  (loop for (a b) in (all-pairs input)
        when (eq (count-difference a b) 1)
        do (return (remove-equals a b))))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I figured that since we have consistently done the simplest, and probably the
least efficient, things, running the function on the input would take some time:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;(time (day-2/2 *input-2*))
Evaluation took:
  0.012 seconds of real time
  0.011565 seconds of total run time (0.008236 user, 0.003329 system)
  100.00% CPU
  33,790,325 processor cycles
  1,998,496 bytes consed
  
&amp;quot;mbruvapghxlzycbhmfqjonsie&amp;quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;... but apparently not. One simple optimization we could have done is an early
return from &lt;code&gt;count-difference&lt;/code&gt;, since we only care whether the difference is &lt;code&gt;1&lt;/code&gt; or
not. Had the strings been very long this could have been significantly faster;
our strings are only 25 chars long, so for our input it doesn&apos;t matter much,
at least not wall clock wise:&lt;/p&gt;
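&lt;p&gt;A possible sketch of this optimization (the code below is a guess at what &lt;code&gt;day-2/2-opt&lt;/code&gt; looks like; the helper name &lt;code&gt;count-difference-opt&lt;/code&gt; is made up):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;(defun count-difference-opt (first second)
  ;; Like count-difference, but bail out as soon as
  ;; we see a second differing character.
  (loop for i from 0 below (length first)
        counting (not (char= (char first i) (char second i))) into diffs
        when (&amp;lt; 1 diffs) do (return diffs)
        finally (return diffs)))

(defun day-2/2-opt (input)
  (loop for (a b) in (all-pairs input)
        when (= (count-difference-opt a b) 1)
        do (return (remove-equals a b))))
&lt;/code&gt;&lt;/pre&gt;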
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;(time (day-2/2-opt *input-2*))
Evaluation took:
  0.007 seconds of real time
  0.006663 seconds of total run time (0.000033 user, 0.006630 system)
  100.00% CPU
  19,506,926 processor cycles
  1,998,496 bytes consed
  
&amp;quot;mbruvapghxlzycbhmfqjonsie&amp;quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;~5 ms less in real time, and only about 58% of the CPU cycles.&lt;/p&gt;
&lt;h2&gt;Day 3&lt;/h2&gt;
&lt;h3&gt;Part 1&lt;/h3&gt;
&lt;p&gt;Day three is here, and the first task of today is to
parse lines of the format &lt;code&gt;#&amp;lt;id&amp;gt; @ &amp;lt;x&amp;gt;,&amp;lt;y&amp;gt;: &amp;lt;w&amp;gt;x&amp;lt;h&amp;gt;&lt;/code&gt;, like &lt;code&gt;#1 @ 1,3: 4x4&lt;/code&gt;.
This sounds like a &lt;code&gt;regex&lt;/code&gt; job! Which means we must figure out
how to &lt;code&gt;regex&lt;/code&gt; in Common Lisp.&lt;/p&gt;
&lt;p&gt;The Cookbook informs us that there is no support for &lt;code&gt;regex&lt;/code&gt; in the standard
library, but that packages like &lt;code&gt;cl-ppcre&lt;/code&gt; exist. Let&apos;s try:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;* (ql:quickload &amp;quot;cl-ppcre&amp;quot;)
To load &amp;quot;cl-ppcre&amp;quot;:
  Load 1 ASDF system:
    asdf
  Install 1 Quicklisp release:
    cl-ppcre
; Fetching #&amp;lt;URL &amp;quot;http://beta.quicklisp.org/archive/cl-ppcre/2018-08-31/cl-ppcre-20180831-git.tgz&amp;quot;&amp;gt;
; 151.37KB
==================================================
155,003 bytes in 0.00 seconds (151370.13KB/sec)
; Loading &amp;quot;cl-ppcre&amp;quot;
[package cl-ppcre]................................
..........................
(&amp;quot;cl-ppcre&amp;quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Fancy!&lt;/p&gt;
&lt;p&gt;Ideally we would be able to write our regex with group names, match each line,
and retrieve the groups by name. Identifying groups by index is also fine.
Looking through &lt;a href=&quot;https://edicl.github.io/cl-ppcre/#do-matches-as-strings&quot;&gt;the
docs&lt;/a&gt; it seems like
&lt;code&gt;*allow-named-registers*&lt;/code&gt; is somewhat important here, so we set it to &lt;code&gt;t&lt;/code&gt; and
try &lt;code&gt;ppcre:scan&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;(ppcre:scan &amp;quot;(?&amp;lt;num&amp;gt;[0-9]+)&amp;quot; &amp;quot;number is 1234 lol&amp;quot;)

10
14
#(10)
#(14)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It seems to work fine --- we&apos;re presumably getting the start and end indices of
our match --- but our name &lt;code&gt;num&lt;/code&gt; is nowhere to be seen in the return values.
Maybe we are meant to use something other than &lt;code&gt;scan&lt;/code&gt;, but this seems strange,
since the docs for &lt;code&gt;*allow-named-registers*&lt;/code&gt; mostly used &lt;code&gt;scan&lt;/code&gt;.
Looking further in the docs, and with a little inspiration from the cookbook
we end up with&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;* (ppcre:register-groups-bind (a b)
	 (&amp;quot;([0-9]+).*(lol)&amp;quot; &amp;quot;number is 1234 lolxD&amp;quot;)
   (list a b))
(&amp;quot;1234&amp;quot; &amp;quot;lol&amp;quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We didn&apos;t get to set the name in the regex itself, but this seems alright.
Now we can write our &lt;code&gt;regex&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;* (defun day-3/match-line (line)
  (ppcre:register-groups-bind (id x y w h)
                              (&amp;quot;#(\\d+) @ (\\d+),(\\d+): (\\d+)x(\\d+)&amp;quot; line)
                              (list id x y w h)))

* (day-3/match-line &amp;quot;#1 @ 1,3: 4x4&amp;quot;)
(&amp;quot;1&amp;quot; &amp;quot;1&amp;quot; &amp;quot;3&amp;quot; &amp;quot;4&amp;quot; &amp;quot;4&amp;quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Good! Now we&apos;re able to parse the input.&lt;/p&gt;
&lt;p&gt;The actual first task of the day is to find the number of overlapping
tiles of the squares defined by the lines we just parsed.
The one solution that first comes to mind is to have a hash map
mapping coordinates to the number of squares touching them.&lt;/p&gt;
&lt;p&gt;Now the plan is to parse the line into something that is easier to work with,
loop through all points in the rectangle, and insert them into a hash map.
Perhaps something like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;(defstruct rect x y w h)
(defstruct point x y)

(defun day-3/insert-coordinates (rect hashmap)
  (loop for y from (rect-y rect) below (+ (rect-y rect) (rect-h rect))
        do (loop for x from (rect-x rect) below (+ (rect-x rect) (rect-w rect))
                 do (incf (gethash (make-point :x x :y y) hashmap)))))

(defun day-3/1 (input)
  (let ((hashmap (make-hash-table)))
    (day-3/insert-coordinates (make-rect :x 0 :y 0 :w 3 :h 3) hashmap)
		;; For now we print out the map so we can see if we succeeded or not
    (loop for key being the hash-keys of hashmap
          do (format t &amp;quot;~S -&amp;gt; ~S&amp;quot; key (gethash key hashmap)))))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;... but &lt;code&gt;day-3/insert-coordinates&lt;/code&gt; isn&apos;t quite right, since we cannot &lt;code&gt;incf&lt;/code&gt;
a value when it is not present in the map. For this we try to write a new function:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;(defun inc-or-1 (key hashmap)
  (let ((entry (gethash key hashmap)))
    (if entry
        (incf entry)
      (setf entry 1))))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The idea is that by having the &lt;code&gt;let&lt;/code&gt; we 1) have less code, and 2) might
avoid looking up the hash table twice. However, this doesn&apos;t work:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;* (defparameter my-map (make-hash-table))
MY-MAP
* (inc-or-1 123 my-map)
1
* (inc-or-1 123 my-map)
1
* (inc-or-1 123 my-map)
1
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Apparently, we are required to use &lt;code&gt;(setf (gethash key table) value)&lt;/code&gt;,
and cannot go through the &lt;code&gt;let&lt;/code&gt;. Okay.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;(defun inc-or-1 (key hashmap)
  (if (gethash key hashmap)
      (incf (gethash key hashmap))
    (setf (gethash key hashmap) 1)))

(inc-or-1 123 my-map)
1
* (inc-or-1 123 my-map)
2
* (inc-or-1 123 my-map)
3
* 
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Good. Updating &lt;code&gt;day-3/insert-coordinates&lt;/code&gt; to use &lt;code&gt;inc-or-1&lt;/code&gt; rather than &lt;code&gt;incf&lt;/code&gt;
directly makes the coordinates print out correctly.  Now it&apos;s just a
matter of changing the print loop in &lt;code&gt;day-3/1&lt;/code&gt; to two loops: first parse and
insert all input lines, then count the number of points whose count is more
than 1.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;(defun day-3/1 (input)
  (let ((hashmap (make-hash-table)))
    (loop for line in input
          do (day-3/insert-coordinates (day-3/match-line line) hashmap))
    (loop for key being the hash-keys of hashmap
          counting (&amp;lt; 1 (gethash key hashmap)) into collisions
          finally (return collisions))))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Reading the test input we&apos;re given into &lt;code&gt;*test-input-3*&lt;/code&gt; and running gives us:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;* (day-3/1 *test-input-3*)
The value
  &amp;quot;3&amp;quot;
is not of type
	NUMBER
when binding SB-KERNEL::X
  [Condition of type TYPE-ERROR]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Oops!
We can change one line in &lt;code&gt;day-3/match-line&lt;/code&gt; to&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;  (ppcre:register-groups-bind ((#&apos;parse-integer id x y w h))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;which apparently works. This is pretty much macro magic if you ask me.
However, our solution still doesn&apos;t work:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;* (day-3/1 *test-input-3*)
0
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;There are a few things that could have gone wrong: input parsing, count
incrementing (though we somewhat checked this), count printing, or messed-up
indices. After a quick &lt;code&gt;format&lt;/code&gt; debugging session, I see what&apos;s wrong: when
printing out the keys in the hashmap, there are multiple &amp;quot;equal&amp;quot; keys being
shown! We must tell the hashmap how to compare keys!  ... or, maybe it&apos;s
hashing them to different values?&lt;/p&gt;
&lt;p&gt;Now, &lt;code&gt;make-hash-table&lt;/code&gt; does take a &lt;code&gt;:test&lt;/code&gt; argument.  However, according to
&lt;a href=&quot;https://www.tutorialspoint.com/lisp/lisp_hash_table.htm&quot;&gt;this&lt;/a&gt; site, it is
only allowed to be either &lt;code&gt;#&apos;eq&lt;/code&gt;, &lt;code&gt;#&apos;eql&lt;/code&gt;, or &lt;code&gt;#&apos;equal&lt;/code&gt;, none of which
helps. Luckily,
&lt;a href=&quot;http://www.lispworks.com/documentation/HyperSpec/Body/f_mk_has.htm&quot;&gt;LispWorks&lt;/a&gt;
helps us out by saying that it can in fact also be &lt;code&gt;#&apos;equalp&lt;/code&gt;, and this fixes
our bug.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;(day-3/1 *test-input-3*)
4
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That means we have finally solved part 1!&lt;/p&gt;
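&lt;p&gt;For reference, the only change needed in &lt;code&gt;day-3/1&lt;/code&gt; was the hash table
construction; a sketch, with the rest of the function unchanged:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;;; :test #&apos;equalp compares structs field by field, so two
;; points with equal coordinates map to the same entry.
(let ((hashmap (make-hash-table :test #&apos;equalp)))
  ...)
&lt;/code&gt;&lt;/pre&gt;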
&lt;h3&gt;Part 2&lt;/h3&gt;
&lt;p&gt;We spent quite a bit of time on part 1. Luckily, now that we have this setup,
part 2 does not take that long. We are asked to find the line in the input that
does not overlap with any other line; this property holds for exactly one line.
We can do it in the following way: keep a set of all lines that have not
overlapped with any other line yet. While adding counts into the hashmap, we
detect overlaps (the point is already there), and can then remove the current
line from the set of non-overlapping lines. When we are done, only one line
should remain.&lt;/p&gt;
&lt;p&gt;This doesn&apos;t quite work though, since the first rectangle at some point
doesn&apos;t know that some later rectangle overlapped with it. In order
to fix this we map &lt;code&gt;point&lt;/code&gt; to &lt;code&gt;id&lt;/code&gt; in the hashmap, so that
when a rectangle finds another rectangle that it overlaps with, it
has both &lt;code&gt;id&lt;/code&gt;s, and can remove both from the set of non-overlapping
&lt;code&gt;id&lt;/code&gt;s. All subsequent lines that overlap with this line will also
attempt to remove it from the unique set, but this is fine.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;(defun day-3/2 (input)
  (let ((map (make-hash-table :test #&apos;equalp))
        (unique (make-hash-table)))
    (loop for line in input
          do (let ((rect (day-3/match-line line)))
               (setf (gethash (rect-id rect) unique) t)
               (loop for y from (rect-y rect)
                           below (+ (rect-y rect) (rect-h rect))
                     do (loop for x from (rect-x rect) 
                                    below (+ (rect-x rect) (rect-w rect))
                              do (let ((p (make-point :x x :y y)))
                                   (if (gethash p map)
                                       (progn (remhash (gethash p map) unique)
                                              (remhash (rect-id rect) unique))
                                       (setf (gethash p map) (rect-id rect)))))))
          finally (return (loop for key being the hash-keys of unique return key)))))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Figuring out that I had to &lt;code&gt;finally (return (loop&lt;/code&gt; took me 15 minutes of
&lt;code&gt;(format&lt;/code&gt; debugging, but this solves it.&lt;/p&gt;
&lt;h2&gt;Day 4&lt;/h2&gt;
&lt;p&gt;Day four is upon us, and we continue.&lt;/p&gt;
&lt;h3&gt;Part 1&lt;/h3&gt;
&lt;p&gt;Today&apos;s first part is a little convoluted, but there are a few things that come
to mind when we want to clean up the data.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Read in the input such that for each guard we have a list of intervals in
which they sleep&lt;/li&gt;
&lt;li&gt;Make a length-60 array for each guard -- one slot per minute -- and count
the &amp;quot;number of sleeps&amp;quot; they have in each minute.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;We start off with input reading. One option is to go full &lt;code&gt;regex&lt;/code&gt;, as we did
yesterday, but we might do just fine without it: the time part of each line
always has the same length, so we can index directly into the string at the
positions we want, in order to identify which variant of message it is, and
extract the data; the only exception being the guard ID, where we must scan
until we find a space, but even here the starting position is known ahead of
time.&lt;/p&gt;
&lt;p&gt;We can start out by writing a couple of predicates and accessor functions:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;(defun guard-line-p (line) (eq (char line 19) #\G))
(defun sleep-line-p (line) (eq (char line 19) #\f))
(defun wake-line-p (line) (eq (char line 19) #\w))
(defun line-mm (line) (parse-integer (subseq line 15 17)))
(defun line-id (line)
  (let ((end (position #\SPACE (subseq line 26))))
    (parse-integer (subseq line 26 (+ 26 end)))))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now we want to loop through the lines, take out the data we need, and insert it
into a hash map that maps guard IDs to a list of intervals when they sleep.
This is slightly awkward since we must keep track of when the guard began
sleeping until the next iteration when we get the wake-up time.  A better
structure would be to directly advance the line iterator, while still in the
body of the loop, like this pseudo-code (notice how I&apos;m already moving away from
lisp syntax):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rs&quot;&gt;lines = input.lines()
while line = lines.next() {
  if guard_line(line) { ... }
  else if sleep_line(line) {
    next = lines.next()
    start = line_mm(line)
    end = line_mm(next)
    ...
  }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Not knowing how one would do something like this in CL, I settled for the
traditional state-keeping approach. Here we just print out all values of the
hashmap at the end.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;(defun day-4/1 (input)
  (let ((guard-sleeps (make-hash-table))
        (sleep-start)
        (current-guard))
    (loop for line in input
          when (guard-line-p line) do (setf current-guard (line-id line))
          when (sleep-line-p line) do (setf sleep-start (line-mm line))
          when (wake-line-p line) do
          (let ((interval (make-interval :from sleep-start :to (line-mm line))))
            (if (gethash current-guard guard-sleeps)
                (push interval (gethash current-guard guard-sleeps))
                (setf (gethash current-guard guard-sleeps) (list interval)))))
    (loop for key being the hash-keys of guard-sleeps
          do (format t &amp;quot;~S -&amp;gt; ~S~%&amp;quot; key (gethash key guard-sleeps)))))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Next we make an array for each guard, and count the number of times the guard
has slept through each of the 60 minutes.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;(defun day-4/1 (input)
  (let ((guard-arrays (make-hash-table))
        ...
    (loop ...
    (loop for guard-id being the hash-keys of guard-sleeps
          do (let ((arr (make-array 60)))
               (loop for interval in (gethash guard-id guard-sleeps)
                     do (loop for i from (interval-from interval) below (interval-to interval)
                          do (incf (aref arr i))))
               (setf (gethash guard-id guard-arrays) arr)))
    (loop for k being the hash-keys of guard-arrays
          do (format t &amp;quot;~S ~S~%&amp;quot; k (gethash k guard-arrays)))))

10 #(0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 0 1 1 1 1 1 1 1
     1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0)
99 #(0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
     1 1 1 2 2 2 2 2 3 2 2 2 2 1 1 1 1 1 0 0 0 0 0)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Based on the table given in the problem description, this looks very plausible.
Next up is solving the actual task: we want to find the guard that was most
asleep, find the minute they spent the most asleep, and multiply that minute
by the guard&apos;s ID.&lt;/p&gt;
&lt;p&gt;As a side note, while trying to write this out I ran into some weird problems, and the debugger didn&apos;t help me much due
to variables being optimized away. Evaluating&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;(declaim (optimize (speed 0) (safety 3) (debug 3)))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;helped out a lot.
In any case, I eventually arrived at something that succeeded with the test input.
I&apos;m not too happy with this function: it is pretty messy, but it works.
One thing we could do is split the three steps up into three functions,
but when things are supposed to happen sequentially I try to avoid
splitting the steps up into functions, at least if the only rationale is
that one function is &amp;quot;too long&amp;quot;.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;(defun day-4/1 (input)
  (let ((guard-sleeps (make-hash-table))
        (guard-arrays (make-hash-table))
        (sleep-start)
        (current-guard))
    (loop for line in input
          when (guard-line-p line) do (setf current-guard (line-id line))
          when (sleep-line-p line) do (setf sleep-start (line-mm line))
          when (wake-line-p line) do
          (let ((interval (make-interval :from sleep-start :to (line-mm line))))
            (if (gethash current-guard guard-sleeps)
                (push interval (gethash current-guard guard-sleeps))
                (setf (gethash current-guard guard-sleeps) (list interval)))))
    (loop for guard-id being the hash-keys of guard-sleeps
          do (let ((arr (make-array 60)))
               (loop for interval in (gethash guard-id guard-sleeps)
                     do (loop for i from (interval-from interval) below (interval-to interval)
                          do (incf (aref arr i))))
               (setf (gethash guard-id guard-arrays) arr)))
    (let* ((sums (loop for k being the hash-keys of guard-arrays
                      collect (list (reduce #&apos;+ (gethash k guard-arrays)) k)))
           (laziest (second (first (sort sums #&apos;&amp;gt; :key #&apos;car))))
           (arr (gethash laziest guard-arrays))
           (max-freq (reduce #&apos;max arr)))
      (* laziest (position max-freq arr)))))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This, however, didn&apos;t run with the real data; whoever made the input decided to
put in one little gotcha and shuffle all the lines.  Luckily this is pretty
straightforward to fix with a &lt;code&gt;(sort input #&apos;string&amp;lt;)&lt;/code&gt;. After this, part 1 was
solved.&lt;/p&gt;
&lt;p&gt;However, the story doesn&apos;t end there. After trying to run it a second time,
we get this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;The value
  NIL
is not of type
  REAL
when binding I
  [Condition of type TYPE-ERROR]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Looking at the debugger we&apos;re in a weird situation where we are looping through the
keys of &lt;code&gt;guard-sleeps&lt;/code&gt;, but the key &lt;code&gt;guard-id&lt;/code&gt; is &lt;code&gt;nil&lt;/code&gt;. Pressing &lt;code&gt;RET&lt;/code&gt; while
the cursor is over &lt;code&gt;GUARD-SLEEPS&lt;/code&gt; in the backtrace shows us this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;#&amp;lt;HASH-TABLE {1008752283}&amp;gt;
--------------------
Count: 21
Size: 32
Test: EQL
Rehash size: 1.5
Rehash threshold: 1.0
[clear hashtable]
Contents: 
241 = (#S(INTERVAL :FROM 47 :TO 57) #S(INTERVAL :FROM 6 :TO 44) #S(INTERVAL :FROM 40 :TO 50) #S(INTERVAL :FROM 22 :TO 46) #S(INTERVAL :FROM 1 :TO 12) #S(INTERVAL :FROM 46 :TO 52) #S(INTERVAL :FROM 19 :TO 42) #S(INTERVAL :FROM 50 :TO 58) #S(INTERVAL :FROM 56 :TO 57) #S(INTERVAL :FROM 48 :TO 49) #S(INTERVAL :FROM 29 :TO 41) #S(INTERVAL :FROM 41 :TO 49) #S(INTERVAL :FROM 12 :TO 19) #S(INTERVAL :FROM 47 :TO 51) #S(INTERVAL :FROM 31 :TO 42) #S(INTERVAL :FROM 18 :TO 24) #S(INTERVAL :FROM 33 :TO 52) ..) [remove entry]
1213 = (#S(INTERVAL :FROM 53 :TO 57) #S(INTERVAL :FROM 48 :TO 50) #S(INTERVAL :FROM 6 :TO 49) #S(INTERVAL :FROM 46 :TO 57) #S(INTERVAL :FROM 2 :TO 35) #S(INTERVAL :FROM 33 :TO 49) #S(INTERVAL :FROM 18 :TO 56) #S(INTERVAL :FROM 46 :TO 52) #S(INTERVAL :FROM 0 :TO 26) #S(INTERVAL :FROM 21 :TO 42) #S(INTERVAL :FROM 40 :TO 46) #S(INTERVAL :FROM 23 :TO 45) #S(INTERVAL :FROM 17 :TO 55) #S(INTERVAL :FROM 26 :TO 39) #S(INTERVAL :FROM 12 :TO 19)) [remove entry]
2903 = (#S(INTERVAL :FROM 4 :TO 48) #S(INTERVAL :FROM 39 :TO 42) #S(INTERVAL :FROM 34 :TO 40) #S(INTERVAL :FROM 49 :TO 56) #S(INTERVAL :FROM 30 :TO 41) #S(INTERVAL :FROM 54 :TO 58) #S(INTERVAL :FROM 24 :TO 53) #S(INTERVAL :FROM 32 :TO 46) #S(INTERVAL :FROM 56 :TO 59) #S(INTERVAL :FROM 26 :TO 42) #S(INTERVAL :FROM 35 :TO 52) #S(INTERVAL :FROM 27 :TO 47)) [remove entry]
1283 = (#S(INTERVAL :FROM 22 :TO 42) #S(INTERVAL :FROM 56 :TO 59) #S(INTERVAL :FROM 6 :TO 49) #S(INTERVAL :FROM 32 :TO 42) #S(INTERVAL :FROM 9 :TO 21) #S(INTERVAL :FROM 17 :TO 46) #S(INTERVAL :FROM 45 :TO 47) #S(INTERVAL :FROM 13 :TO 55) #S(INTERVAL :FROM 57 :TO 59) #S(INTERVAL :FROM 40 :TO 48) #S(INTERVAL :FROM 26 :TO 52) #S(INTERVAL :FROM 2 :TO 17) #S(INTERVAL :FROM 53 :TO 55) #S(INTERVAL :FROM 19 :TO 47) #S(INTERVAL :FROM 41 :TO 46) #S(INTERVAL :FROM 24 :TO 29) #S(INTERVAL :FROM 22 :TO 52) ..) [remove entry]
829 = (#S(INTERVAL :FROM 40 :TO 50) #S(INTERVAL :FROM 11 :TO 24) #S(INTERVAL :FROM 29 :TO 32) #S(INTERVAL :FROM 37 :TO 45) #S(INTERVAL :FROM 31 :TO 32) #S(INTERVAL :FROM 32 :TO 52) #S(INTERVAL :FROM 20 :TO 39) #S(INTERVAL :FROM 57 :TO 59) #S(INTERVAL :FROM 10 :TO 26) #S(INTERVAL :FROM 4 :TO 39) #S(INTERVAL :FROM 8 :TO 18)) [remove entry]
3347 = (#S(INTERVAL :FROM 36 :TO 46) #S(INTERVAL :FROM 14 :TO 25) #S(INTERVAL :FROM 7 :TO 48) #S(INTERVAL :FROM 18 :TO 56) #S(INTERVAL :FROM 7 :TO 14) #S(INTERVAL :FROM 48 :TO 57) #S(INTERVAL :FROM 9 :TO 53) #S(INTERVAL :FROM 41 :TO 57) #S(INTERVAL :FROM 39 :TO 47) #S(INTERVAL :FROM 33 :TO 34) #S(INTERVAL :FROM 52 :TO 59) #S(INTERVAL :FROM 30 :TO 46) #S(INTERVAL :FROM 41 :TO 46) #S(INTERVAL :FROM 13 :TO 26) #S(INTERVAL :FROM 54 :TO 55) #S(INTERVAL :FROM 23 :TO 48) #S(INTERVAL :FROM 57 :TO 59) ..) [remove entry]
1319 = (#S(INTERVAL :FROM 50 :TO 59) #S(INTERVAL :FROM 2 :TO 37) #S(INTERVAL :FROM 37 :TO 45) #S(INTERVAL :FROM 46 :TO 58) #S(INTERVAL :FROM 0 :TO 31) #S(INTERVAL :FROM 33 :TO 50) #S(INTERVAL :FROM 29 :TO 45) #S(INTERVAL :FROM 1 :TO 42) #S(INTERVAL :FROM 25 :TO 29) #S(INTERVAL :FROM 24 :TO 42) #S(INTERVAL :FROM 50 :TO 55) #S(INTERVAL :FROM 18 :TO 27) #S(INTERVAL :FROM 19 :TO 57) #S(INTERVAL :FROM 29 :TO 35) #S(INTERVAL :FROM 8 :TO 53)) [remove entry]
439 = (#S(INTERVAL :FROM 19 :TO 55) #S(INTERVAL :FROM 33 :TO 39) #S(INTERVAL :FROM 41 :TO 51) #S(INTERVAL :FROM 37 :TO 50) #S(INTERVAL :FROM 9 :TO 53) #S(INTERVAL :FROM 31 :TO 38) #S(INTERVAL :FROM 38 :TO 59) #S(INTERVAL :FROM 14 :TO 25) #S(INTERVAL :FROM 51 :TO 59) #S(INTERVAL :FROM 19 :TO 24) #S(INTERVAL :FROM 5 :TO 32) #S(INTERVAL :FROM 52 :TO 54) #S(INTERVAL :FROM 1 :TO 41) #S(INTERVAL :FROM 51 :TO 56) #S(INTERVAL :FROM 7 :TO 33) #S(INTERVAL :FROM 6 :TO 30) #S(INTERVAL :FROM 24 :TO 57) ..) [remove entry]
2213 = (#S(INTERVAL :FROM 52 :TO 58) #S(INTERVAL :FROM 38 :TO 48) #S(INTERVAL :FROM 54 :TO 57) #S(INTERVAL :FROM 27 :TO 53) #S(INTERVAL :FROM 46 :TO 57) #S(INTERVAL :FROM 30 :TO 43) #S(INTERVAL :FROM 57 :TO 58) #S(INTERVAL :FROM 36 :TO 46) #S(INTERVAL :FROM 6 :TO 29) #S(INTERVAL :FROM 33 :TO 55) #S(INTERVAL :FROM 23 :TO 26)) [remove entry]
3319 = (#S(INTERVAL :FROM 50 :TO 57) #S(INTERVAL :FROM 41 :TO 43) #S(INTERVAL :FROM 10 :TO 36) #S(INTERVAL :FROM 7 :TO 53) #S(INTERVAL :FROM 4 :TO 42) #S(INTERVAL :FROM 35 :TO 58) #S(INTERVAL :FROM 57 :TO 58) #S(INTERVAL :FROM 51 :TO 54) #S(INTERVAL :FROM 3 :TO 19) #S(INTERVAL :FROM 54 :TO 57) #S(INTERVAL :FROM 7 :TO 34) #S(INTERVAL :FROM 56 :TO 59) #S(INTERVAL :FROM 21 :TO 53) #S(INTERVAL :FROM 32 :TO 38) #S(INTERVAL :FROM 42 :TO 46) #S(INTERVAL :FROM 21 :TO 35) #S(INTERVAL :FROM 11 :TO 15) ..) [remove entry]
2539 = (#S(INTERVAL :FROM 42 :TO 51) #S(INTERVAL :FROM 10 :TO 27) #S(INTERVAL :FROM 45 :TO 55) #S(INTERVAL :FROM 33 :TO 35) #S(INTERVAL :FROM 44 :TO 56) #S(INTERVAL :FROM 12 :TO 36) #S(INTERVAL :FROM 43 :TO 57) #S(INTERVAL :FROM 23 :TO 34) #S(INTERVAL :FROM 57 :TO 58) #S(INTERVAL :FROM 15 :TO 39) #S(INTERVAL :FROM 52 :TO 54) #S(INTERVAL :FROM 32 :TO 36) #S(INTERVAL :FROM 7 :TO 22)) [remove entry]
631 = (#S(INTERVAL :FROM 44 :TO 58) #S(INTERVAL :FROM 3 :TO 27) #S(INTERVAL :FROM 51 :TO 56) #S(INTERVAL :FROM 23 :TO 47) #S(INTERVAL :FROM 6 :TO 17) #S(INTERVAL :FROM 50 :TO 56) #S(INTERVAL :FROM 15 :TO 46) #S(INTERVAL :FROM 55 :TO 56) #S(INTERVAL :FROM 1 :TO 49) #S(INTERVAL :FROM 23 :TO 57) #S(INTERVAL :FROM 44 :TO 48) #S(INTERVAL :FROM 3 :TO 29) #S(INTERVAL :FROM 33 :TO 45) #S(INTERVAL :FROM 11 :TO 21)) [remove entry]
2129 = (#S(INTERVAL :FROM 51 :TO 57) #S(INTERVAL :FROM 36 :TO 46) #S(INTERVAL :FROM 42 :TO 43) #S(INTERVAL :FROM 43 :TO 51) #S(INTERVAL :FROM 15 :TO 38) #S(INTERVAL :FROM 54 :TO 59) #S(INTERVAL :FROM 41 :TO 43) #S(INTERVAL :FROM 54 :TO 59) #S(INTERVAL :FROM 6 :TO 47) #S(INTERVAL :FROM 48 :TO 57) #S(INTERVAL :FROM 32 :TO 56) #S(INTERVAL :FROM 38 :TO 54)) [remove entry]
1889 = (#S(INTERVAL :FROM 57 :TO 59) #S(INTERVAL :FROM 30 :TO 35) #S(INTERVAL :FROM 31 :TO 42) #S(INTERVAL :FROM 31 :TO 41) #S(INTERVAL :FROM 39 :TO 40) #S(INTERVAL :FROM 28 :TO 33) #S(INTERVAL :FROM 56 :TO 57) #S(INTERVAL :FROM 29 :TO 34) #S(INTERVAL :FROM 27 :TO 42) #S(INTERVAL :FROM 24 :TO 32) #S(INTERVAL :FROM 57 :TO 59) #S(INTERVAL :FROM 44 :TO 51) #S(INTERVAL :FROM 31 :TO 36) #S(INTERVAL :FROM 22 :TO 36) #S(INTERVAL :FROM 11 :TO 15) #S(INTERVAL :FROM 2 :TO 47) #S(INTERVAL :FROM 27 :TO 50) ..) [remove entry]
2137 = (#S(INTERVAL :FROM 49 :TO 59) #S(INTERVAL :FROM 43 :TO 53) #S(INTERVAL :FROM 4 :TO 47) #S(INTERVAL :FROM 55 :TO 56) #S(INTERVAL :FROM 35 :TO 52) #S(INTERVAL :FROM 50 :TO 55) #S(INTERVAL :FROM 46 :TO 47) #S(INTERVAL :FROM 52 :TO 58) #S(INTERVAL :FROM 23 :TO 26) #S(INTERVAL :FROM 45 :TO 57)) [remove entry]
2251 = (#S(INTERVAL :FROM 22 :TO 38) #S(INTERVAL :FROM 17 :TO 31) #S(INTERVAL :FROM 27 :TO 54) #S(INTERVAL :FROM 8 :TO 22) #S(INTERVAL :FROM 49 :TO 56) #S(INTERVAL :FROM 7 :TO 14) #S(INTERVAL :FROM 12 :TO 35) #S(INTERVAL :FROM 56 :TO 58) #S(INTERVAL :FROM 25 :TO 32) #S(INTERVAL :FROM 3 :TO 20) #S(INTERVAL :FROM 55 :TO 59) #S(INTERVAL :FROM 14 :TO 40) #S(INTERVAL :FROM 52 :TO 55) #S(INTERVAL :FROM 8 :TO 56) #S(INTERVAL :FROM 21 :TO 37)) [remove entry]
2389 = (#S(INTERVAL :FROM 57 :TO 58) #S(INTERVAL :FROM 28 :TO 49) #S(INTERVAL :FROM 5 :TO 22) #S(INTERVAL :FROM 57 :TO 59) #S(INTERVAL :FROM 52 :TO 53) #S(INTERVAL :FROM 13 :TO 20) #S(INTERVAL :FROM 28 :TO 58) #S(INTERVAL :FROM 11 :TO 14) #S(INTERVAL :FROM 42 :TO 54) #S(INTERVAL :FROM 53 :TO 55) #S(INTERVAL :FROM 9 :TO 33) #S(INTERVAL :FROM 51 :TO 55) #S(INTERVAL :FROM 37 :TO 39) #S(INTERVAL :FROM 56 :TO 59) #S(INTERVAL :FROM 15 :TO 48) #S(INTERVAL :FROM 53 :TO 55) #S(INTERVAL :FROM 52 :TO 59) ..) [remove entry]
1777 = (#S(INTERVAL :FROM 46 :TO 51) #S(INTERVAL :FROM 9 :TO 37) #S(INTERVAL :FROM 52 :TO 59) #S(INTERVAL :FROM 36 :TO 39) #S(INTERVAL :FROM 47 :TO 56) #S(INTERVAL :FROM 24 :TO 34) #S(INTERVAL :FROM 48 :TO 52) #S(INTERVAL :FROM 6 :TO 38) #S(INTERVAL :FROM 1 :TO 49) #S(INTERVAL :FROM 53 :TO 58) #S(INTERVAL :FROM 34 :TO 45) #S(INTERVAL :FROM 28 :TO 30) #S(INTERVAL :FROM 10 :TO 58) #S(INTERVAL :FROM 10 :TO 49) #S(INTERVAL :FROM 40 :TO 52) #S(INTERVAL :FROM 15 :TO 35) #S(INTERVAL :FROM 31 :TO 57) ..) [remove entry]
3371 = (#S(INTERVAL :FROM 38 :TO 50) #S(INTERVAL :FROM 8 :TO 15) #S(INTERVAL :FROM 53 :TO 54) #S(INTERVAL :FROM 11 :TO 29) #S(INTERVAL :FROM 27 :TO 53) #S(INTERVAL :FROM 33 :TO 48) #S(INTERVAL :FROM 33 :TO 49) #S(INTERVAL :FROM 39 :TO 52) #S(INTERVAL :FROM 34 :TO 36) #S(INTERVAL :FROM 0 :TO 22) #S(INTERVAL :FROM 51 :TO 57) #S(INTERVAL :FROM 52 :TO 54) #S(INTERVAL :FROM 6 :TO 49) #S(INTERVAL :FROM 38 :TO 57) #S(INTERVAL :FROM 27 :TO 43) #S(INTERVAL :FROM 37 :TO 53) #S(INTERVAL :FROM 0 :TO 28) ..) [remove entry]
103 = (#S(INTERVAL :FROM 56 :TO 59) #S(INTERVAL :FROM 1 :TO 30) #S(INTERVAL :FROM 38 :TO 41) #S(INTERVAL :FROM 31 :TO 41) #S(INTERVAL :FROM 48 :TO 55) #S(INTERVAL :FROM 23 :TO 36) #S(INTERVAL :FROM 38 :TO 49) #S(INTERVAL :FROM 12 :TO 25) #S(INTERVAL :FROM 26 :TO 49) #S(INTERVAL :FROM 17 :TO 23) #S(INTERVAL :FROM 13 :TO 56) #S(INTERVAL :FROM 39 :TO 56) #S(INTERVAL :FROM 24 :TO 36) #S(INTERVAL :FROM 26 :TO 55) #S(INTERVAL :FROM 31 :TO 37) #S(INTERVAL :FROM 57 :TO 58) #S(INTERVAL :FROM 15 :TO 50) ..) [remove entry]
NIL = (#S(INTERVAL :FROM NIL :TO 57)) [remove entry]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We see that the last entry has key &lt;code&gt;nil&lt;/code&gt;, and the &lt;code&gt;interval&lt;/code&gt; it maps to has
&lt;code&gt;:from nil&lt;/code&gt;. Strange! Looking closer at the Slime debugging window we do find
the problem though: in the value of &lt;code&gt;input&lt;/code&gt;. We have already seen that &lt;code&gt;sort&lt;/code&gt;
destructively sorts the given list. In addition, we know that lists in Lisp
are linked, so sorting a list means shuffling around pointers.  Aha! If a list
is just a pointer to its first element and we sort the list, that means that
the reference we have to the list, the pointer to an element that used to be
first, is no longer first, and all elements that were put in front of it are no
longer reachable!  The following illustrates:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;* (defparameter bing &apos;(9 6 3 5 7 1 8 2 3 5))
* (format t &amp;quot;~S~%&amp;quot; bing)
(9 6 3 5 7 1 8 2 3 5)
* (defparameter bong (sort bing #&apos;&amp;lt;))

* (format t &amp;quot;~S~%&amp;quot; bing)
(6 7 8 9)
* (format t &amp;quot;~S~%&amp;quot; bong)
(1 2 3 3 5 5 6 7 8 9) ; so much for attempting to type in 1 through 9 shuffled
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The solution is rather simple: instead of sorting inside &lt;code&gt;day-4/1&lt;/code&gt; we sort once, when we set the value in
&lt;code&gt;defparameter&lt;/code&gt;.&lt;/p&gt;
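&lt;p&gt;That is, something like the following sketch, where the parameter name and the
input-reading call are stand-ins for whatever is actually used:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;;; Sort once at definition time.  SORT is still destructive, but we
;; immediately bind its result, so no stale list reference survives.
(defparameter *input-4* (sort (read-input-lines &amp;quot;day-4.txt&amp;quot;) #&apos;string&amp;lt;))
&lt;/code&gt;&lt;/pre&gt;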
&lt;h3&gt;Part 2&lt;/h3&gt;
&lt;p&gt;Part two asks us for a tiny modification to our function: instead of selecting the guard that
sleeps the most, we want the guard who has slept the most times during any single minute.
So instead of maximizing by summing, we will maximize by &lt;code&gt;max&lt;/code&gt;ing
(at this point I&apos;m tempted to refactor out most of the logic, but
at the same time, this is a write once, run once situation):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;(defun day-4/2 (input)
  (let ((guard-sleeps (make-hash-table))
        (guard-arrays (make-hash-table))
        (sleep-start)
        (current-guard))
    (loop for line in input
          when (guard-line-p line) do (setf current-guard (line-id line))
          when (sleep-line-p line) do (setf sleep-start (line-mm line))
          when (wake-line-p line) do
          (let ((interval (make-interval :from sleep-start :to (line-mm line))))
            (if (gethash current-guard guard-sleeps)
                (push interval (gethash current-guard guard-sleeps))
                (setf (gethash current-guard guard-sleeps) (list interval)))))
    (loop for guard-id being the hash-keys of guard-sleeps
          do (let ((arr (make-array 60)))
               (loop for interval in (gethash guard-id guard-sleeps)
                     do (loop for i from (interval-from interval) below (interval-to interval)
                          do (incf (aref arr i))))
               (setf (gethash guard-id guard-arrays) arr)))
    (let* ((sums (loop for k being the hash-keys of guard-arrays
                      collect (list (reduce #&apos;max (gethash k guard-arrays)) k))) ;; HERE!!
           (laziest (second (first (sort sums #&apos;&amp;gt; :key #&apos;car))))
           (arr (gethash laziest guard-arrays))
           (max-freq (reduce #&apos;max arr)))
      (* laziest (position max-freq arr)))))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The only thing we changed was changing a &lt;code&gt;#&apos;+&lt;/code&gt; to a &lt;code&gt;#&apos;max&lt;/code&gt;.
Hooray! Code reuse!&lt;/p&gt;
&lt;h2&gt;Thoughts so far&lt;/h2&gt;
&lt;p&gt;Common Lisp is a bit of a weird language for me. Certain things I figured would
be easy, like making tuples, seem to force you to use &lt;code&gt;(list ..)&lt;/code&gt;, which
presumably allocates. In addition, the dynamic nature of the language is
something I&apos;m still getting used to, being a big fan of statically typed
languages. Despite it being foreign, I think that most of what I have wanted to do
has been expressible in CL, and this is, after all, the main point of a
programming language.&lt;/p&gt;
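&lt;p&gt;(For two-element tuples specifically, a dotted pair is a slightly cheaper
alternative I could have reached for: &lt;code&gt;(cons a b)&lt;/code&gt; allocates a single cons cell,
while &lt;code&gt;(list a b)&lt;/code&gt; allocates two.)&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lisp&quot;&gt;* (cons 1 2)   ; one cons cell: a &amp;quot;dotted pair&amp;quot;
(1 . 2)
* (car (cons 1 2))
1
* (cdr (cons 1 2))
2
&lt;/code&gt;&lt;/pre&gt;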
&lt;p&gt;Lastly, the Slime experience is something I look forward to getting to know
better.  Being able to interactively look through the state of the stack,
including all local variables, &lt;em&gt;the moment you get an error&lt;/em&gt;, is simply not
something I&apos;m used to; this is the reason I included the hash table output
above, despite not actually using it to find the source of the bug. It was just
really cool!&lt;/p&gt;
&lt;p&gt;Thank you for reading.&lt;/p&gt;
</content></entry><entry><title>Four Books and Two Cheat Sheets</title><id>https://mht.wtf/post/4books/</id><updated>2024-09-22T11:50:00+02:00</updated><author><name>Martin Hafskjold Thoresen</name><email>m@mht.wtf</email></author><link href="https://mht.wtf/post/4books/" rel=""/><link href="https://mht.wtf/post/4books/index.html" rel="alternate"/><published>2024-09-22T11:50:00+02:00</published><content type="text/html">&lt;p&gt;I took a two week vacation where I found a lot of time to read, and during those two weeks I read four books:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://dataintensive.net/&quot;&gt;Designing Data-Intensive Applications&lt;/a&gt; by &lt;a href=&quot;https://martin.kleppmann.com/&quot;&gt;Martin Kleppmann&lt;/a&gt;,&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://marabos.nl/atomics/&quot;&gt;Rust Atomics and Locks&lt;/a&gt; by &lt;a href=&quot;https://marabos.nl/&quot;&gt;Mara Bos&lt;/a&gt;,&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://web.stanford.edu/~ouster/cgi-bin/book.php&quot;&gt;A Philosophy of Software Design&lt;/a&gt; by &lt;a href=&quot;https://web.stanford.edu/~ouster/cgi-bin/home.php&quot;&gt;John Ousterhout&lt;/a&gt;, and&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.noidea.dog/staff/&quot;&gt;The Staff Engineer&apos;s Path&lt;/a&gt; by &lt;a href=&quot;https://www.noidea.dog/&quot;&gt;Tanya Reilly&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I really enjoyed all four books.
Here are some quick notes:&lt;/p&gt;
&lt;h2&gt;&lt;em&gt;Designing Data-Intensive Applications&lt;/em&gt;&lt;/h2&gt;
&lt;p&gt;This is a database book, and I had, somehow, not realized that that&apos;s what it is.
It covers mainly databases and distributed systems, and very little at the &amp;quot;application&amp;quot; level.
Under &amp;quot;Scope of This Book&amp;quot;, it says:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;“We look primarily at the architecture of data systems and the ways they are integrated into data-intensive applications”&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;which I think is accurate. The book gives a great overview on all things &amp;quot;data system architecture&amp;quot;,
with multiple services in multiple datacenters, but it contains little when it comes to single applications.&lt;/p&gt;
&lt;h2&gt;&lt;em&gt;Rust Atomics and Locks&lt;/em&gt;&lt;/h2&gt;
&lt;p&gt;I was already pretty familiar with Rust atomics, considering my Master&apos;s thesis back in 2018 was building a concurrent GC for Rust.
However, the book cleared up some confusion regarding &lt;a href=&quot;https://doc.rust-lang.org/std/cmp/enum.Ordering.html&quot;&gt;memory ordering&lt;/a&gt;,
and was a nice tour of APIs that are either new since 2018 or that I didn&apos;t use back then either.&lt;br /&gt;
It also highlights differences in output for x86_64 and ARM, which was especially neat.&lt;/p&gt;
&lt;h2&gt;&lt;em&gt;A Philosophy of Software Design&lt;/em&gt;&lt;/h2&gt;
&lt;p&gt;This is maybe the only book of its kind that I had heard was good, and I agree.
&amp;quot;Strategic programming&amp;quot;, and the principle &amp;quot;modules should be deep&amp;quot; puts words on a feeling I&apos;ve had, but that I haven&apos;t been able to succinctly express.
It even has a performance chapter, and it is actually good! The first sentence of its conclusion is&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The most important overall lesson from this chapter is that clean design and high performance are compatible.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;So good.&lt;/p&gt;
&lt;p&gt;The book isn&apos;t OOP-centric, but it often looks through an OOP lens; I wish the
text would consistently say &amp;quot;module&amp;quot; instead of &amp;quot;class&amp;quot; to better disambiguate
itself from the standard OOPy soupy advice.&lt;/p&gt;
&lt;h2&gt;&lt;em&gt;The Staff Engineer&apos;s Path&lt;/em&gt;&lt;/h2&gt;
&lt;p&gt;What happens after &amp;quot;senior engineer&amp;quot;?
This book paints a big picture to answer that question.
Parts of the book are only really applicable to large organisations, but there&apos;s plenty of content for an org of any size.
I especially liked the chapter &lt;em&gt;&amp;quot;You’re a Role Model Now (Sorry)&amp;quot;&lt;/em&gt;.&lt;/p&gt;
&lt;h2&gt;Cheat sheets&lt;/h2&gt;
&lt;p&gt;Both &lt;em&gt;A Philosophy of Software Design&lt;/em&gt; and &lt;em&gt;The Staff Engineer&apos;s Path&lt;/em&gt; had lists of summaries, either at the end of each chapter or at the end of the book.
In order to better remember the takeaways from the books I made cheat sheets containing the summaries.
This was also a good reason for checking out &lt;a href=&quot;https://typst.app&quot;&gt;typst&lt;/a&gt;, which I also really liked.
Here they are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;posd.pdf&quot;&gt;posd.pdf&lt;/a&gt; (51KB) with &lt;a href=&quot;posd.typ&quot;&gt;source code&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;tsep.pdf&quot;&gt;tsep.pdf&lt;/a&gt; (54KB) with &lt;a href=&quot;tsep.typ&quot;&gt;source code&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Both Typst files use the tiny &lt;a href=&quot;common.typ&quot;&gt;common.typ&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;eBooks.com&lt;/h2&gt;
&lt;p&gt;I bought three out of four books from &lt;a href=&quot;http://ebooks.com&quot;&gt;ebooks.com&lt;/a&gt;. It has become my new go-to place for buying ebooks,
because you get DRM-free &lt;code&gt;pdf&lt;/code&gt;s and &lt;code&gt;epub&lt;/code&gt;s that you can &lt;strong&gt;just download&lt;/strong&gt;.
No weird apps or &lt;code&gt;calibre&lt;/code&gt; DRM-stripping required. It&apos;s great!
The selection is so-so, but notably O&apos;Reilly seem to have their entire catalog there.&lt;/p&gt;
&lt;p&gt;Thanks for reading.&lt;/p&gt;
</content></entry><entry><title>Languages, Performance, and Intent</title><id>https://mht.wtf/post/lpi/</id><updated>2022-08-24T22:51:20+02:00</updated><author><name>Martin Hafskjold Thoresen</name><email>m@mht.wtf</email></author><link href="https://mht.wtf/post/lpi/" rel=""/><link href="https://mht.wtf/post/lpi/index.html" rel="alternate"/><published>2022-08-24T22:51:20+02:00</published><content type="text/html">&lt;p&gt;Optimizing compilers are really cool!
They look at your code and rewrite it so that its behavior is unchanged and its execution time is reduced.
The fact that the compiler cannot change the semantics of your code sounds obvious, but there is a crucial detail here:
it &lt;em&gt;cannot&lt;/em&gt; change your code for &lt;em&gt;any&lt;/em&gt; input&lt;sup&gt;&lt;a href=&quot;#user-content-fn-ub&quot; id=&quot;user-content-fnref-ub&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;.
If you write a function &lt;code&gt;fn foo(i32)&lt;/code&gt; and the compiler wants to generate the code &lt;code&gt;fn fast_foo(i32)&lt;/code&gt;,
it must hold that for &lt;em&gt;any&lt;/em&gt; input, like &lt;code&gt;0&lt;/code&gt;, &lt;code&gt;1&lt;/code&gt;, &lt;code&gt;123&lt;/code&gt;, &lt;code&gt;-999&lt;/code&gt;, &lt;code&gt;i32::MIN&lt;/code&gt;, &lt;code&gt;i32::MAX&lt;/code&gt;, or &lt;code&gt;1337&lt;/code&gt;, the behavior of &lt;code&gt;foo&lt;/code&gt; and &lt;code&gt;fast_foo&lt;/code&gt; is identical.
This means the compiler is forced to take into account the corner cases of your code, which may or may not be a part of your &lt;em&gt;intent&lt;/em&gt; when writing that code.&lt;/p&gt;
&lt;p&gt;Chandler Carruth shows an example in his talk &lt;em&gt;&amp;quot;Garbage In, Garbage Out: Arguing about Undefined Behavior With Nasal Demons&amp;quot;&lt;/em&gt; &lt;a href=&quot;https://www.youtube.com/watch?v=yG1OZ69H_-o&amp;amp;t=2357s&quot;&gt;as seen here&lt;/a&gt;.
His example shows the difference between &lt;code&gt;signed&lt;/code&gt; and &lt;code&gt;unsigned&lt;/code&gt; integers in C++, and how the defined wrapping of &lt;code&gt;unsigned&lt;/code&gt; integers causes the compiler to output bad code, whereas using &lt;code&gt;signed&lt;/code&gt; integers would make the compiler generate good code because it can assume that overflow does not happen.
He suggests that the reason the programmer chose &lt;code&gt;unsigned&lt;/code&gt; in this case was (a) because it is semantically correct&lt;sup&gt;&lt;a href=&quot;#user-content-fn-sema&quot; id=&quot;user-content-fnref-sema&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;, and (b) that they were &amp;quot;a little bit worried about a narrow contract&amp;quot;&lt;sup&gt;&lt;a href=&quot;#user-content-fn-ubt&quot; id=&quot;user-content-fnref-ubt&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;.
&lt;a href=&quot;https://www.youtube.com/watch?v=yG1OZ69H_-o&amp;amp;t=47m52s&quot;&gt;A little later&lt;/a&gt; he answers a question by underscoring the fact that the behavior is different:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Q: Isn&apos;t this just a failure of the optimizer doing the right thing? &lt;br/&gt;
A: No! We cannot produce this assembly [shows good assembly] for this function [shows initial function]. [...] They are semantically different.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I like this example because it is such a minor choice.
It sounds like &lt;code&gt;unsigned&lt;/code&gt; would be the right choice since we wouldn&apos;t have to worry about accidentally passing in negative offsets&lt;sup&gt;&lt;a href=&quot;#user-content-fn-no&quot; id=&quot;user-content-fnref-no&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;,
and yet it has a very significant performance impact due to how the compiler is allowed to reason.&lt;/p&gt;
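&lt;p&gt;To make the semantic difference concrete, here is a minimal C sketch of my own (not code from the talk): unsigned arithmetic must wrap modulo 2&lt;sup&gt;32&lt;/sup&gt; for &lt;em&gt;every&lt;/em&gt; input, while signed overflow is undefined behavior, which the compiler may assume never happens.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;#include &amp;lt;assert.h&amp;gt;
#include &amp;lt;stdint.h&amp;gt;

/* Unsigned overflow is defined: the compiler must preserve
 * wrapping modulo 2^32 for every input, including UINT32_MAX. */
uint32_t off_unsigned(uint32_t i) { return i + 4; }

/* Signed overflow is undefined behavior, so the compiler may
 * assume i + 4 never overflows and optimize on that assumption. */
int32_t off_signed(int32_t i) { return i + 4; }

int main(void) {
    /* UINT32_MAX + 4 wraps around to 3; any optimized version
     * of off_unsigned must still return 3 here. */
    assert(off_unsigned(UINT32_MAX) == 3);
    assert(off_signed(10) == 14);
    return 0;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The wrapping case is exactly the corner the optimizer has to respect in the &lt;code&gt;unsigned&lt;/code&gt; version of &lt;code&gt;mainGtU&lt;/code&gt;, whether or not the programmer ever intended indices anywhere near &lt;code&gt;UINT32_MAX&lt;/code&gt;.&lt;/p&gt;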
&lt;details&gt;
&lt;summary&gt;Actually checking the bzip2 asm&lt;/summary&gt;
&lt;p&gt;I decided to double check the asm from Chandler&apos;s presentation.
The function from the video &lt;a href=&quot;https://gitlab.com/bzip2/bzip2/-/blob/2d8393924b9f3e014000c7420c7da7c3ddb74e2c/blocksort.c#L347&quot;&gt;is still using &lt;code&gt;unsigned&lt;/code&gt; integers&lt;/a&gt;,
and I can&apos;t find an issue or a pull request suggesting to make this change, so I decided to check the compiler output myself.
If the performance improvement is so great by using &lt;code&gt;signed&lt;/code&gt; integers instead, why aren&apos;t they doing so?&lt;/p&gt;
&lt;p&gt;Here&apos;s exactly what I did:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;$ git clone https://gitlab.com/bzip2/bzip2
$ cd bzip2
$ mkdir build
$ cmake -H. -Bbuild -DCMAKE_BUILD_TYPE=Release
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Building straight away doesn&apos;t help us, since &lt;code&gt;mainGtU&lt;/code&gt; is marked &lt;code&gt;static&lt;/code&gt;, and we&apos;d like to have it exported in the final executable. I removed the two lines &lt;code&gt;static __inline__&lt;/code&gt; from &lt;code&gt;mainGtU&lt;/code&gt;, and we&apos;re off to the races:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;make -Cbuild
objdump build/bzip2 | less
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Search for &lt;code&gt;mainGtU&lt;/code&gt; and I get the following:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-objdump&quot;&gt;0000000000006ce0 &amp;lt;mainGtU.part.0&amp;gt;:
    6ce0:	48 89 d0             	mov    %rdx,%rax
    6ce3:	49 89 ca             	mov    %rcx,%r10
    6ce6:	8d 57 03             	lea    0x3(%rdi),%edx
    6ce9:	8d 4e 03             	lea    0x3(%rsi),%ecx
    6cec:	0f b6 14 10          	movzbl (%rax,%rdx,1),%edx
    6cf0:	0f b6 0c 08          	movzbl (%rax,%rcx,1),%ecx
    6cf4:	38 ca                	cmp    %cl,%dl
    6cf6:	75 12                	jne    6d0a &amp;lt;mainGtU.part.0+0x2a&amp;gt;
    6cf8:	8d 57 04             	lea    0x4(%rdi),%edx
    6cfb:	8d 4e 04             	lea    0x4(%rsi),%ecx
    6cfe:	0f b6 14 10          	movzbl (%rax,%rdx,1),%edx
    6d02:	0f b6 0c 08          	movzbl (%rax,%rcx,1),%ecx
    6d06:	38 ca                	cmp    %cl,%dl
    6d08:	74 06                	je     6d10 &amp;lt;mainGtU.part.0+0x30&amp;gt;
    6d0a:	38 d1                	cmp    %dl,%cl
    6d0c:	0f 92 c0             	setb   %al
    6d0f:	c3                   	ret    
    6d10:	8d 57 05             	lea    0x5(%rdi),%edx
    6d13:	8d 4e 05             	lea    0x5(%rsi),%ecx
    6d16:	0f b6 14 10          	movzbl (%rax,%rdx,1),%edx
    6d1a:	0f b6 0c 08          	movzbl (%rax,%rcx,1),%ecx
    6d1e:	38 ca                	cmp    %cl,%dl
    6d20:	75 e8                	jne    6d0a &amp;lt;mainGtU.part.0+0x2a&amp;gt;
    6d22:	8d 57 06             	lea    0x6(%rdi),%edx
    6d25:	8d 4e 06             	lea    0x6(%rsi),%ecx
    6d28:	0f b6 14 10          	movzbl (%rax,%rdx,1),%edx
    6d2c:	0f b6 0c 08          	movzbl (%rax,%rcx,1),%ecx
    6d30:	38 ca                	cmp    %cl,%dl
    6d32:	75 d6                	jne    6d0a &amp;lt;mainGtU.part.0+0x2a&amp;gt;
    6d34:	8d 57 07             	lea    0x7(%rdi),%edx
    6d37:	8d 4e 07             	lea    0x7(%rsi),%ecx
    6d3a:	0f b6 14 10          	movzbl (%rax,%rdx,1),%edx
    6d3e:	0f b6 0c 08          	movzbl (%rax,%rcx,1),%ecx
    6d42:	38 ca                	cmp    %cl,%dl
    6d44:	75 c4                	jne    6d0a &amp;lt;mainGtU.part.0+0x2a&amp;gt;
    6d46:	8d 57 08             	lea    0x8(%rdi),%edx
    6d49:	8d 4e 08             	lea    0x8(%rsi),%ecx
    6d4c:	0f b6 14 10          	movzbl (%rax,%rdx,1),%edx
    6d50:	0f b6 0c 08          	movzbl (%rax,%rcx,1),%ecx
    6d54:	38 ca                	cmp    %cl,%dl
    6d56:	75 b2                	jne    6d0a &amp;lt;mainGtU.part.0+0x2a&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Compare this to what we get from the same function if we replace the types of &lt;code&gt;i1&lt;/code&gt; and &lt;code&gt;i2&lt;/code&gt; with &lt;code&gt;Int32&lt;/code&gt;. I copied the whole function, added a &lt;code&gt;_2&lt;/code&gt; suffix, and recompiled.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-objdump&quot;&gt;00000000000085d0 &amp;lt;mainGtU_2&amp;gt;:
    85d0:	48 89 d0             	mov    %rdx,%rax
    85d3:	4c 63 d6             	movslq %esi,%r10
    85d6:	48 89 ca             	mov    %rcx,%rdx
    85d9:	48 63 cf             	movslq %edi,%rcx
    85dc:	0f b6 0c 08          	movzbl (%rax,%rcx,1),%ecx
    85e0:	46 0f b6 14 10       	movzbl (%rax,%r10,1),%r10d
    85e5:	44 38 d1             	cmp    %r10b,%cl
    85e8:	74 0e                	je     85f8 &amp;lt;mainGtU_2+0x28&amp;gt;
    85ea:	41 38 ca             	cmp    %cl,%r10b
    85ed:	0f 92 c0             	setb   %al
    85f0:	c3                   	ret    
    85f1:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
    85f8:	8d 4f 01             	lea    0x1(%rdi),%ecx
    85fb:	48 63 c9             	movslq %ecx,%rcx
    85fe:	44 0f b6 14 08       	movzbl (%rax,%rcx,1),%r10d
    8603:	8d 4e 01             	lea    0x1(%rsi),%ecx
    8606:	48 63 c9             	movslq %ecx,%rcx
    8609:	0f b6 0c 08          	movzbl (%rax,%rcx,1),%ecx
    860d:	41 38 ca             	cmp    %cl,%r10b
    8610:	74 0e                	je     8620 &amp;lt;mainGtU_2+0x50&amp;gt;
    8612:	44 38 d1             	cmp    %r10b,%cl
    8615:	0f 92 c0             	setb   %al
    8618:	c3                   	ret    
    8619:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
    8620:	8d 4f 02             	lea    0x2(%rdi),%ecx
    8623:	48 63 c9             	movslq %ecx,%rcx
    8626:	44 0f b6 14 08       	movzbl (%rax,%rcx,1),%r10d
    862b:	8d 4e 02             	lea    0x2(%rsi),%ecx
    862e:	48 63 c9             	movslq %ecx,%rcx
    8631:	0f b6 0c 08          	movzbl (%rax,%rcx,1),%ecx
    8635:	41 38 ca             	cmp    %cl,%r10b
    8638:	75 d8                	jne    8612 &amp;lt;mainGtU_2+0x42&amp;gt;
    863a:	8d 4f 03             	lea    0x3(%rdi),%ecx
    863d:	48 63 c9             	movslq %ecx,%rcx
    8640:	44 0f b6 14 08       	movzbl (%rax,%rcx,1),%r10d
    8645:	8d 4e 03             	lea    0x3(%rsi),%ecx
    8648:	48 63 c9             	movslq %ecx,%rcx
    864b:	0f b6 0c 08          	movzbl (%rax,%rcx,1),%ecx
    864f:	41 38 ca             	cmp    %cl,%r10b
    8652:	75 be                	jne    8612 &amp;lt;mainGtU_2+0x42&amp;gt;
    8654:	8d 4f 04             	lea    0x4(%rdi),%ecx
    8657:	48 63 c9             	movslq %ecx,%rcx
    865a:	44 0f b6 14 08       	movzbl (%rax,%rcx,1),%r10d
    865f:	8d 4e 04             	lea    0x4(%rsi),%ecx
    8662:	48 63 c9             	movslq %ecx,%rcx
    8665:	0f b6 0c 08          	movzbl (%rax,%rcx,1),%ecx
    8669:	41 38 ca             	cmp    %cl,%r10b
    866c:	75 a4                	jne    8612 &amp;lt;mainGtU_2+0x42&amp;gt;
    866e:	8d 4f 05             	lea    0x5(%rdi),%ecx
    8671:	48 63 c9             	movslq %ecx,%rcx
    8674:	44 0f b6 14 08       	movzbl (%rax,%rcx,1),%r10d
    8679:	8d 4e 05             	lea    0x5(%rsi),%ecx
    867c:	48 63 c9             	movslq %ecx,%rcx
    867f:	0f b6 0c 08          	movzbl (%rax,%rcx,1),%ecx
    8683:	41 38 ca             	cmp    %cl,%r10b
    8686:	75 8a                	jne    8612 &amp;lt;mainGtU_2+0x42&amp;gt;
    8688:	8d 4f 06             	lea    0x6(%rdi),%ecx
    868b:	48 63 c9             	movslq %ecx,%rcx
    868e:	44 0f b6 14 08       	movzbl (%rax,%rcx,1),%r10d
    8693:	8d 4e 06             	lea    0x6(%rsi),%ecx
    8696:	48 63 c9             	movslq %ecx,%rcx
    8699:	0f b6 0c 08          	movzbl (%rax,%rcx,1),%ecx
    869d:	41 38 ca             	cmp    %cl,%r10b
    86a0:	0f 85 6c ff ff ff    	jne    8612 &amp;lt;mainGtU_2+0x42&amp;gt;
    86a6:	8d 4f 07             	lea    0x7(%rdi),%ecx
    86a9:	48 63 c9             	movslq %ecx,%rcx
    86ac:	44 0f b6 14 08       	movzbl (%rax,%rcx,1),%r10d
    86b1:	8d 4e 07             	lea    0x7(%rsi),%ecx
    86b4:	48 63 c9             	movslq %ecx,%rcx
    86b7:	0f b6 0c 08          	movzbl (%rax,%rcx,1),%ecx
    86bb:	41 38 ca             	cmp    %cl,%r10b
    86be:	0f 85 4e ff ff ff    	jne    8612 &amp;lt;mainGtU_2+0x42&amp;gt;
    86c4:	8d 4f 08             	lea    0x8(%rdi),%ecx
    86c7:	48 63 c9             	movslq %ecx,%rcx
    86ca:	44 0f b6 14 08       	movzbl (%rax,%rcx,1),%r10d
    86cf:	8d 4e 08             	lea    0x8(%rsi),%ecx
    86d2:	48 63 c9             	movslq %ecx,%rcx
    86d5:	0f b6 0c 08          	movzbl (%rax,%rcx,1),%ecx
    86d9:	41 38 ca             	cmp    %cl,%r10b
    86dc:	0f 85 30 ff ff ff    	jne    8612 &amp;lt;mainGtU_2+0x42&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The setup code of the two functions is a little different, but once we get going it looks something like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-objdump&quot;&gt;              UInt32                          Int32
====================================================================
                                  | jne    8612 &amp;lt;mainGtU_2+0x42&amp;gt;
jne    6d0a &amp;lt;mainGtU.part.0+0x2a&amp;gt; | lea    0x8(%rdi),%ecx   
lea    0x8(%rdi),%edx             | movslq %ecx,%rcx   
lea    0x8(%rsi),%ecx             | movzbl (%rax,%rcx,1),%r10d   
movzbl (%rax,%rdx,1),%edx         | lea    0x8(%rsi),%ecx   
movzbl (%rax,%rcx,1),%ecx         | movslq %ecx,%rcx   
cmp    %cl,%dl                    | movzbl (%rax,%rcx,1),%ecx   
jne    6d0a &amp;lt;mainGtU.part.0+0x2a&amp;gt; | cmp    %cl,%r10b   
                                  | jne    8612 &amp;lt;mainGtU_2+0x42&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I&apos;m not sure whether the &lt;code&gt;signed&lt;/code&gt; code is better or worse, although in terms of instruction count it is definitely longer.
Weird.&lt;/p&gt;
&lt;p&gt;I also tried setting &lt;code&gt;CFLAGS=-march=native&lt;/code&gt; before building, thinking that maybe there&apos;s some platform specific code that we wanted the compiler to generate, but the code for the two functions seems to be identical with and without this flag.&lt;/p&gt;
&lt;/details&gt;
&lt;p&gt;There are cases where we are explicitly made aware of a hidden tradeoff and given tools for dealing with it.
A good example of this is the C and C++&lt;sup&gt;&lt;a href=&quot;#user-content-fn-ccpp&quot; id=&quot;user-content-fnref-ccpp&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;5&lt;/a&gt;&lt;/sup&gt; flag &lt;code&gt;-ffast-math&lt;/code&gt;,
which enables a collection of other flags that relax some of the requirements of the &lt;a href=&quot;https://en.wikipedia.org/wiki/IEEE_754&quot;&gt;IEEE-754&lt;/a&gt; floating-point number standard.
One of the flags it sets, &lt;code&gt;-fassociative-math&lt;/code&gt;, allows the compiler to reorder the operation &lt;code&gt;(a + b) + c&lt;/code&gt; to &lt;code&gt;a + (b + c)&lt;/code&gt;&lt;sup&gt;&lt;a href=&quot;#user-content-fn-wtf&quot; id=&quot;user-content-fnref-wtf&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;6&lt;/a&gt;&lt;/sup&gt;.
Another is &lt;code&gt;-freciprocal-math&lt;/code&gt;, which allows the compiler to consider &lt;code&gt;a / b&lt;/code&gt; the same as &lt;code&gt;a * (1 / b)&lt;/code&gt;.
In and of themselves these transformations are not so valuable, but in combination with common subexpression elimination or loop hoisting they can yield good speedups.
By specifying &lt;code&gt;-ffast-math&lt;/code&gt; we can allow the compiler to change the semantics of our programs (in a limited sense) such that it can make the output code faster.
However, we still need to know that this is a flag we can set.
If we don&apos;t know about &lt;code&gt;-ffast-math&lt;/code&gt; and we don&apos;t mind these transformations&lt;sup&gt;&lt;a href=&quot;#user-content-fn-fastmath&quot; id=&quot;user-content-fnref-fastmath&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;7&lt;/a&gt;&lt;/sup&gt; we are inhibiting the compiler&apos;s ability to generate good code without gaining any benefit.&lt;/p&gt;
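&lt;p&gt;To see why reassociation is a semantic change and not just a rewrite, here is a small stand-alone C example of my own showing that IEEE-754 addition is not associative:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;#include &amp;lt;assert.h&amp;gt;
#include &amp;lt;stdio.h&amp;gt;

int main(void) {
    double a = 0.1, b = 0.2, c = 0.3;
    /* Without -ffast-math the compiler must evaluate these
     * exactly as written; reordering changes the result. */
    double left  = (a + b) + c;   /* 0.6000000000000001 */
    double right = a + (b + c);   /* 0.6 */
    assert(left != right);
    printf(&quot;%.17g vs %.17g\n&quot;, left, right);
    return 0;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With &lt;code&gt;-fassociative-math&lt;/code&gt; enabled the compiler is free to treat the two expressions as equal, so the assertion is no longer guaranteed to hold.&lt;/p&gt;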
&lt;p&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/John_Carmack&quot;&gt;John Carmack&apos;s&lt;/a&gt; conversation on Lex Fridman&apos;s &lt;a href=&quot;https://lexfridman.com/podcast/&quot;&gt;podcast&lt;/a&gt; contains a similar example of the idea that some of these trade-offs are very skewed.
The part &lt;a href=&quot;https://www.youtube.com/watch?v=I845O57ZSy4&amp;amp;t=2h59m34s&quot;&gt;can be heard here&lt;/a&gt;.
In talking about the innovations required to make Quake, and specifically about optimization, John says this:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The most leverage comes from making the decisions that are a little bit higher up, where you figure out how to change your large scale problem so that these lower level problems are easier to do, or it makes it possible to do them in a uniquely fast way.
&lt;br/&gt; --- John Carmack&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The changes John is talking about are slightly different from Chandler&apos;s, because in John&apos;s case there is, say, a design decision with some wiggle room which can be used to yield huge benefits in terms of speedup.
Chandler&apos;s signedness case is an example of a decision that was accidentally made&lt;sup&gt;&lt;a href=&quot;#user-content-fn-accident&quot; id=&quot;user-content-fnref-accident&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;8&lt;/a&gt;&lt;/sup&gt;, maybe without knowing so and probably without knowing its impact&lt;sup&gt;&lt;a href=&quot;#user-content-fn-impact&quot; id=&quot;user-content-fnref-impact&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;9&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;/post/merge/&quot;&gt;In an earlier post&lt;/a&gt; I explored the codegen of a &lt;code&gt;merge&lt;/code&gt; function&lt;sup&gt;&lt;a href=&quot;#user-content-fn-merge&quot; id=&quot;user-content-fnref-merge&quot; data-footnote-ref=&quot;&quot; aria-describedby=&quot;footnote-label&quot;&gt;10&lt;/a&gt;&lt;/sup&gt; and tried to have the compiler output branchless code with the conditional instruction &lt;code&gt;cmove&lt;/code&gt;.
By writing the function in slightly different ways, and eventually writing it straight in &lt;code&gt;x86&lt;/code&gt;, I got a total of seven variants of what I considered the same function.
The time spent on a micro benchmark ranged from 31ms for the slowest (the initial straightforward way) to 19ms for the fastest (the asm, but two C-variants were also down there).
At the time I was happy with having beaten the compiler on optimizing such a small and simple function.
Now I&apos;m not so sure this is what happened, and I suspect that there are inputs which would make my trivial-and-slow implementation behave differently than any of the fast ones.
This would mean that the optimizer wasn&apos;t too stupid to get it right, but that I accidentally encoded behavior in the implementation that was too constraining for it to work around.
Behavior that I potentially didn&apos;t care about and that I would gladly give up for a 38% reduction in execution time.&lt;/p&gt;
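&lt;p&gt;For context, the straightforward version of such a &lt;code&gt;merge&lt;/code&gt; looks roughly like the sketch below (a reconstruction, not the actual code from that post). The branch in the main loop is the one I tried to turn into a &lt;code&gt;cmove&lt;/code&gt;, and its corner cases, like how ties are broken and what happens when one side runs out, are exactly the behavior the compiler must preserve.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;#include &amp;lt;assert.h&amp;gt;
#include &amp;lt;stddef.h&amp;gt;

/* Merge two sorted arrays a and b into out.
 * Ties are broken in favor of a, making the merge stable. */
void merge(const int *a, size_t na, const int *b, size_t nb, int *out) {
    size_t i = 0, j = 0, k = 0;
    while (i &amp;lt; na &amp;amp;&amp;amp; j &amp;lt; nb)
        out[k++] = (a[i] &amp;lt;= b[j]) ? a[i++] : b[j++];
    while (i &amp;lt; na) out[k++] = a[i++];  /* drain the leftovers */
    while (j &amp;lt; nb) out[k++] = b[j++];
}

int main(void) {
    int a[] = {1, 3, 5}, b[] = {2, 4, 6}, out[6];
    merge(a, 3, b, 3, out);
    for (int k = 0; k &amp;lt; 6; k++) assert(out[k] == k + 1);
    return 0;
}
&lt;/code&gt;&lt;/pre&gt;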
&lt;p&gt;If we want our programs to be fast we clearly need to understand what our computers can do,
but we also need to understand what our programs are actually instructing the computer to do, and which constraints we are setting for an optimizing compiler.
We cannot simply outsource the job of generating fast machine code to the compiler, because we need to use the wiggle room of our design space to our advantage, and the compiler cannot do this.
Without working from both ends we may often find ourselves with terrible machine code that a simple and insignificant change to our code would have fixed.
By choosing to ignore this, we pay the price.&lt;/p&gt;
&lt;p&gt;Suggestions, comments, tips, and the signed bit of your integers can be sent to my &lt;a href=&quot;https://lists.sr.ht/~mht/public-inbox&quot;&gt;public inbox&lt;/a&gt; (plain text email only).&lt;/p&gt;
&lt;p&gt;Thanks for reading.&lt;/p&gt;
&lt;section data-footnotes=&quot;&quot; class=&quot;footnotes&quot;&gt;&lt;h2 id=&quot;footnote-label&quot; class=&quot;sr-only&quot;&gt;Footnotes&lt;/h2&gt;
&lt;ol&gt;
&lt;li id=&quot;user-content-fn-ub&quot;&gt;
&lt;p&gt;This is also where &lt;em&gt;undefined behavior&lt;/em&gt; comes in; the compiler is allowed to assume that UB does not happen, because a program execution in which it does happen is nonsensical.
If it can show that a variable having a certain value would cause UB, it is allowed to assume that this variable will not have that value.
Depending on how offended you would be by being called a &amp;quot;language lawyer&amp;quot; you might find this argument to be nonsensical, but this is the status quo. &lt;a href=&quot;#user-content-fnref-ub&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-sema&quot;&gt;
&lt;p&gt;The numbers in this case were used for offsets from a base pointer. These should never be negative, and so by using &lt;code&gt;unsigned&lt;/code&gt; integers we can enforce this trait. &lt;a href=&quot;#user-content-fnref-sema&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-ubt&quot;&gt;
&lt;p&gt;I don&apos;t think Chandler is being very charitable in his guesswork here, but it &lt;em&gt;is&lt;/em&gt; a talk trying to disarm UB-fear, so as a storytelling device I guess it&apos;s ... fine? &lt;a href=&quot;#user-content-fnref-ubt&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-no&quot;&gt;
&lt;p&gt;Conversely, if we accidentally have huge positive offsets it is very likely that we &lt;code&gt;segfault&lt;/code&gt; at once, as the address of &lt;code&gt;block[2147483648]&lt;/code&gt; and higher is very likely not mapped. &lt;a href=&quot;#user-content-fnref-no&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-ccpp&quot;&gt;
&lt;p&gt;I assume many more languages have either the same flag or a similar flag with a different name. &lt;a href=&quot;#user-content-fnref-ccpp&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-wtf&quot;&gt;
&lt;p&gt;I wrote a blog post about some surprises of floating point numbers &lt;a href=&quot;https://mht.wtf/post/floating-precision/&quot;&gt;here&lt;/a&gt;, where I also give an explicit example of non-associativity. &lt;a href=&quot;#user-content-fnref-wtf&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-fastmath&quot;&gt;
&lt;p&gt;There are good reasons for not using &lt;code&gt;-ffast-math&lt;/code&gt;; &lt;em&gt;very&lt;/em&gt; carefully written numerical code will often depend on the exact order of operations in order to avoid losing precision, dealing correctly with &lt;code&gt;NaN&lt;/code&gt;s, and so on. &lt;code&gt;-ffast-math&lt;/code&gt; throws all of this out the window. &lt;a href=&quot;#user-content-fnref-fastmath&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-accident&quot;&gt;
&lt;p&gt;Again, I&apos;m guessing here. &lt;a href=&quot;#user-content-fnref-accident&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-impact&quot;&gt;
&lt;p&gt;Now that we did the work and found that the assembly looked, if not worse, not better, it is worth questioning whether this decision really had any impact or not. Maybe this is the real lesson here? &lt;a href=&quot;#user-content-fnref-impact&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;user-content-fn-merge&quot;&gt;
&lt;p&gt;&lt;code&gt;merge&lt;/code&gt; takes two sorted lists and merges them into one sorted list. Usually this is done by walking along the fronts of the two lists and popping the smaller of the two elements. &lt;a href=&quot;#user-content-fnref-merge&quot; data-footnote-backref=&quot;&quot; aria-label=&quot;Back to content&quot; class=&quot;data-footnote-backref&quot;&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content></entry><entry><title>The Mid-sphere Cousin of the Medial Axis Transform</title><id>https://mht.wtf/post/medial-ax/</id><updated>2025-11-02T22:36:35+02:00</updated><author><name>Martin Hafskjold Thoresen</name><email>m@mht.wtf</email></author><link href="https://mht.wtf/post/medial-ax/" rel=""/><link href="https://mht.wtf/post/medial-ax/index.html" rel="alternate"/><published>2025-11-02T22:36:35+02:00</published><content type="text/html">&lt;p&gt;&lt;a href=&quot;https://pub.ista.ac.at/~edels/&quot;&gt;Herbert Edelsbrunner&lt;/a&gt;, &lt;a href=&quot;https://elizabethrstephenson.com&quot;&gt;Elizabeth Stephenson&lt;/a&gt;, and I
had &lt;a href=&quot;https://doi.org/10.1007/978-3-032-09544-2_10&quot;&gt;our paper&lt;/a&gt; published at &lt;a href=&quot;https://www.cs.rug.nl/svcg/DGMM2025&quot;&gt;DGMM 2025&lt;/a&gt;!
Elizabeth is going to Groningen to present it this week, which is very exciting.
A slightly old version of the paper is available on &lt;a href=&quot;https://arxiv.org/abs/2504.14743&quot;&gt;arXiv&lt;/a&gt;.
We&apos;ve also built an interactive editor for the project, which is &lt;a href=&quot;https://medial-ax.github.io/medial-ax/&quot;&gt;hosted here&lt;/a&gt; and is &lt;a href=&quot;https://github.com/medial-ax/medial-ax&quot;&gt;open source&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The paper is my first publication, which is funny since I left academia over three years ago.
Being a published author also means I get an &lt;a href=&quot;https://en.wikipedia.org/wiki/Erd%C5%91s_number&quot;&gt;Erdős number&lt;/a&gt;, and mine is &lt;strong&gt;3&lt;/strong&gt; going from me, to Herbert, to &lt;a href=&quot;https://users.renyi.hu/~pach/&quot;&gt;János Pach&lt;/a&gt;, to Paul Erdős.&lt;/p&gt;
&lt;p&gt;Yay!&lt;/p&gt;
</content></entry><entry><title>rss</title><id>https://mht.wtf/post/rss/</id><updated>2025-04-19T16:19:00+02:00</updated><author><name>Martin Hafskjold Thoresen</name><email>m@mht.wtf</email></author><link href="https://mht.wtf/post/rss/" rel=""/><link href="https://mht.wtf/post/rss/index.html" rel="alternate"/><published>2025-04-19T16:19:00+02:00</published><content type="text/html">&lt;p&gt;&lt;code&gt;rss&lt;/code&gt; is the latest installment in my series of small bespoke services
I have written for myself.  It looks like this:&lt;/p&gt;
&lt;figure style=&quot;display: flex; justify-content: center&quot;&gt;
  &lt;div style=&quot;max-width: 400px&quot;&gt;
    &lt;img src=&quot;./rss.png&quot; style=&quot;flex: 1; width: 100%&quot;&gt;
    &lt;figcaption&gt;&lt;code&gt;rss&lt;/code&gt; is a list of recent feed items.&lt;/figcaption&gt;
  &lt;/div&gt;
&lt;/figure&gt;
&lt;p&gt;The front page shows the 20 most recent items from ~70 feeds.
Items are marked as read or unread, indicated by the blue dot to the left.
The dot is clickable, and toggles between the two states.&lt;/p&gt;
&lt;p&gt;The feed title is also clickable, and takes you to the feed detail view.
This lists recent items for that feed, and it shows some data for the feed, like the URL.
I can also set an alias for the feed, which is then used instead of the feed title;
for instance, the title of Chris Wellons&apos;s feed is &lt;code&gt;&amp;quot;null program&amp;quot;&lt;/code&gt;, and I aliased it to &lt;code&gt;&amp;quot;Chris Wellons&amp;quot;&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The stack is the same as always: &lt;a href=&quot;https://htmx.org/&quot;&gt;&lt;code&gt;htmx&lt;/code&gt;&lt;/a&gt; for interactivity, &lt;a href=&quot;https://maud.lambda.xyz/&quot;&gt;&lt;code&gt;maud&lt;/code&gt;&lt;/a&gt; for templating, and
&lt;a href=&quot;https://sqlite.org/index.html&quot;&gt;&lt;code&gt;sqlite&lt;/code&gt;&lt;/a&gt; for storage.  Built and deployed in &lt;a href=&quot;https://www.docker.com/&quot;&gt;&lt;code&gt;docker&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I started fetching real feeds early on, and almost immediately I got 429&apos;d by &lt;a href=&quot;https://rachelbythebay.com/&quot;&gt;rachelbythebay&lt;/a&gt;&apos;s feed.
I was kinda aware of her writing on shitty RSS feed bot behavior, and all of a sudden I was part of the problem.
Sorry, Rachel! (and others, whose bandwidth I wasted)
Now I send both &lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/ETag&quot;&gt;&lt;code&gt;ETag&lt;/code&gt;&lt;/a&gt;s and &lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/If-Modified-Since&quot;&gt;&lt;code&gt;If-Modified-Since&lt;/code&gt;&lt;/a&gt;s,
and poll once a day at a random time between 01:00 and 07:00. Well, that was
my intention anyways, but it seems I got tricked by timezones: the polling
times are in UTC, whereas I probably wanted them in local time, since
I&apos;m usually asleep then, but probably awake before 09:00.&lt;/p&gt;
&lt;p&gt;I have not changed the &lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/User-Agent&quot;&gt;&lt;code&gt;User-Agent&lt;/code&gt;&lt;/a&gt;, so whatever &lt;a href=&quot;https://docs.rs/ureq/latest/ureq/&quot;&gt;&lt;code&gt;ureq&lt;/code&gt;&lt;/a&gt; does by default is what I&apos;m sending.
I assume I&apos;m supposed to set some kind of unique name for the reader and some contact info, but I&apos;m not sure exactly what the format is.
MDN contains this example:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-user-agent&quot;&gt;Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I assume there&apos;s some common format for this, but haven&apos;t looked more into it yet.&lt;/p&gt;
&lt;p&gt;Some things that need improvement:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Datetime parsing: Not all feeds follow what seems to be the spec-mandated datetime format. One feed uses &lt;code&gt;&amp;quot;Wed, 09 Apr 2025 04:48:38 UTC&amp;quot;&lt;/code&gt;, and these entries are simply ignored for now.  Should be possible to add in some exceptions to the datetime format as these crop up.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;I use &lt;code&gt;tokio::spawn&lt;/code&gt; for tasks that await &lt;code&gt;sleep_until&lt;/code&gt; and then fetch the feed again, basically recursing. This works, but I&apos;m flying blind, since there&apos;s no apparent way of confirming that all tasks are still active.  If a task panics (for instance by having the request time out), it will not be re-run.  I put some logging statements around this part of the code so that I can check what&apos;s going on from the logs, but with around 70 feeds this is a fair amount of spam.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Some missing nice-to-have functionality, like deleting feed items or removing a feed.  Deleting items is useful for feeds that change the entry ids all of a sudden.  &lt;a href=&quot;https://blog.rust-lang.org/feed.xml&quot;&gt;blog.rust-lang.org&lt;/a&gt; changed its format from &lt;br /&gt;
&lt;code&gt;https://blog.rust-lang.org/2025/04/08/Project-Goals-2025-March-Update.html&lt;/code&gt; to&lt;br /&gt;
&lt;code&gt;https://blog.rust-lang.org/2025/04/08/Project-Goals-2025-March-Update/&lt;/code&gt;, causing double entries in my feed.  Would be nice to delete the double entry, but items are quickly bumped off of the front page anyways.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;No error checking on the htmx side, so nothing happens if an endpoint returns a 429 or 500.  Having something here would make life slightly easier, for instance all those times where I accidentally paste in the URL to a blog instead of to its feed.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Like with &lt;a href=&quot;/post/ppl/&quot;&gt;&lt;code&gt;ppl&lt;/code&gt;&lt;/a&gt; I&apos;m happy with the result, and unlike &lt;code&gt;ppl&lt;/code&gt;, I use &lt;code&gt;rss&lt;/code&gt; basically every day.
It&apos;s fun using something you&apos;ve made yourself, for yourself!&lt;/p&gt;
</content></entry></feed>