Ideas for a Programming Language
It seems that many of the widely used languages today are designed in a top-down mentality, in which we figure out what we want to express in the language, and then we figure out how to lower it to machine instructions, whatever they might be.
I've been wondering if this is because most of the C-like languages are very similar, or if the primitives one ends up with starting from the bottom look a lot like what we already have.
The End
We can start at the bottom, or the end of the lifetime of a program, namely its execution.
This is platform- and architecture-dependent, so let's fixate on x86_64 linux
for now. The program is an ELF binary. Anders Sundmans blog post A Simple
ELF shows how much stuff is in a ELF binary resulting from a
hello-world
C program. One of the things Anders did to reduce the amount of
stuff in the binary was to send a bunch (eight, -o
doesn't count) flags to
gcc
to turn off things like PIE and stack protection. These things aren't
contained in the C source, because they're outside C's "domain".
It's weird that very basic things about the final program, like PIE, is not controllable in the program text. I think this is because C at this point is heavily invested in the abstract-machine model of a programming language, in which the language targets a hypothetical machine that runs the program, and implementors (i.e. compiler authors and adjacent programmers) have to write stuff that allows this hypothetical to become real. I.e. an ELF binary with machine instructions.
-
We have
const
andmut
, which is really only a write bitw
. What about a read bitr
? This would be very natural for output parameters, for instance.1
Including functionality found in build systems (like optimization levels) in the source text allows for fine control of them.
Procedures
Procedures are useful, but it is curious how languages always use the call-return pattern.
At the call site we jump to the function, and the function returns to us.
Few languages have a natural way of expressing jumps in the same way, which I suppose would be
like a function call, except that it's not supposed to return.
The computer has no problem with this, of course, this is a jump
(but with arguments).
Coroutines are related.
They are like a function, except that they yield control elsewhere, for some other code to continue at a later point.
Some languages have adopted coroutines, or something very similar:
Python's yield
is an example,
and async
programming is implemented in terms of coroutines in Rust.
Control Flow
The language could support multple stacks, and continuing executing a stack from its address by jumping to it and reading the PC from it. Maybe you could choose between going up your stack or jumping back to the stack that jumped to you. Popping the last frame of the stack means that you're done. This seems to support four different control flow modes for procedures:
- traditional procedure: push frame on stack.
- jump and return: jump to new stack, store previous location, return to that when done.
- jump and stay("latch"-function?): jump to new stack, and ignore the source address.
- create stack and jump: effectively call-and-
exit
Aren't even other forms of control flow, if
and for
and friends, examples of this?
Jumps while reusing the stack?
Slots
The primitive data types should correspond to what can go in a CPU register. b32 b64 b128 b256 b512.
For less-widely used features, like AVX-512, the compiler will have to do what's necessary to emulate using e.g. AVX2.
Scratch
quicksort
buffer is address
len is u32
if len = 0:
return
pivot = random len // [0, len)
pv = buffer[4 * pivot]
buffer[4 * pivot] = buffer[4 * (len - 1)]
buffer[4 * (len - 1)] = pv
front = buffer
back = buffer + 4 * (len - 2) // pivot is at len - 1
while front < back:
x = front[0]
if pv < x:
back[0] = x
back -= 4
else
front += 4
a = (front - buffer) / 4
b = (len - a) / 4
quicksort buffer a
quicksort back b
A Small Program
Let's try to write something pretty straight forward and see what the assembly can look like. This is Project Euler problem #2.
;; Setup
mov sum, 0
mov prev, 1
mov curr, 1
;; Loop
loop:
mov next, curr
add next, prev
;; Check if we're done
cmp next, 4_000_001
jg .end
;; Conditional add
mov tmp, 0
test next, 1
cmove tmp, next
add sum, tmp
;; Shuffle things around
mov prev, curr
mov curr, next
jump .loop
end:
Which roughly corresponds to something you could get out by compiling this C function:
int fib() {
int sum = 0;
int prev = 1;
int curr = 1;
do {
int next = curr + prev;
if (next >= 4000001) break;
if ((next & 1) == 0) sum += next;
prev = curr;
curr = next;
} while (true);
return sum;
}
Footnotes
-
I've seen
in
,out
, andinout
in some experimental C++ presentation at some point. Presumably this is basicallyr
,w
andrw
. ↩