mht.wtf

Computer science, programming, and whatnot.

Ideas for a Programming Language

It seems that many of the widely used languages today are designed in a top-down mentality, in which we figure out what we want to express in the language, and then we figure out how to lower it to machine instructions, whatever they might be.

I've been wondering if this is because most of the C-like languages are very similar, or if the primitives one ends up with starting from the bottom look a lot like what we already have.

The End

We can start at the bottom, or the end of the lifetime of a program, namely its execution. This is platform- and architecture-dependent, so let's fixate on x86_64 linux for now. The program is an ELF binary. Anders Sundmans blog post A Simple ELF shows how much stuff is in a ELF binary resulting from a hello-world C program. One of the things Anders did to reduce the amount of stuff in the binary was to send a bunch (eight, -o doesn't count) flags to gcc to turn off things like PIE and stack protection. These things aren't contained in the C source, because they're outside C's "domain".

It's weird that very basic things about the final program, like PIE, is not controllable in the program text. I think this is because C at this point is heavily invested in the abstract-machine model of a programming language, in which the language targets a hypothetical machine that runs the program, and implementors (i.e. compiler authors and adjacent programmers) have to write stuff that allows this hypothetical to become real. I.e. an ELF binary with machine instructions.

Including functionality found in build systems (like optimization levels) in the source text allows for fine control of them.

Procedures

Procedures are useful, but it is curious how languages always use the call-return pattern. At the call site we jump to the function, and the function returns to us. Few languages have a natural way of expressing jumps in the same way, which I suppose would be like a function call, except that it's not supposed to return. The computer has no problem with this, of course, this is a jump (but with arguments).

Coroutines are related. They are like a function, except that they yield control elsewhere, for some other code to continue at a later point. Some languages have adopted coroutines, or something very similar: Python's yield is an example, and async programming is implemented in terms of coroutines in Rust.

Control Flow

The language could support multple stacks, and continuing executing a stack from its address by jumping to it and reading the PC from it. Maybe you could choose between going up your stack or jumping back to the stack that jumped to you. Popping the last frame of the stack means that you're done. This seems to support four different control flow modes for procedures:

Aren't even other forms of control flow, if and for and friends, examples of this? Jumps while reusing the stack?

Slots

The primitive data types should correspond to what can go in a CPU register. b32 b64 b128 b256 b512.

For less-widely used features, like AVX-512, the compiler will have to do what's necessary to emulate using e.g. AVX2.

Scratch

quicksort
    buffer is address
    len is u32

    if len = 0:
        return

    pivot = random len // [0, len)
    pv  = buffer[4 * pivot]
    buffer[4 * pivot] = buffer[4 * (len - 1)]
    buffer[4 * (len - 1)] = pv
    
    front = buffer
    back = buffer + 4 * (len - 2) // pivot is at len - 1 

    while front < back:
        x = front[0]
        if pv < x:
            back[0] = x
            back -= 4
        else
            front += 4

    a = (front - buffer) / 4
    b = (len - a) / 4
    quicksort buffer a
    quicksort back b

A Small Program

Let's try to write something pretty straight forward and see what the assembly can look like. This is Project Euler problem #2.

;; Setup
mov    sum, 0
mov    prev, 1
mov    curr, 1
;; Loop
loop:
mov    next, curr
add    next, prev
;; Check if we're done
cmp    next, 4_000_001
jg     .end
;; Conditional add
mov    tmp, 0
test   next, 1
cmove  tmp, next
add    sum, tmp
;; Shuffle things around
mov    prev, curr
mov    curr, next
jump   .loop
end:

Which roughly corresponds to something you could get out by compiling this C function:

int fib() {
    int sum = 0;
    int prev = 1;
    int curr = 1;

    do {
        int next = curr + prev;
        if (next >= 4000001) break;

        if ((next & 1) == 0) sum += next;
        prev = curr;
        curr = next;
    } while (true);

    return sum;
}

Footnotes

  1. I've seen in, out, and inout in some experimental C++ presentation at some point. Presumably this is basically r, w and rw.