Stable Timings
It was Friday afternoon and I was working on a proof-of-concept garbage collector for Rust. The general idea of the collector is to have threads
register their roots --- pointers to memory that the collector cares about ---
and every once in a while a thread collects these roots, forks off a new process,
finds memory that is no longer reachable from any root, and returns these pointers
back to the parent process using mmap.
The initial implementation was obviously not well tuned for performance.
In order to get a rough picture of where my system spent its cycles, I
inserted calls to time::precise_time_ns before and after certain blocks, and
wrote out the difference at the end. The reported timings looked like this:
consolidate wait for signals 0ms
collect roots 0ms
fork 0ms
wait for child 67.108864ms
read ptrs 0ms
call free 201.3266ms
consolidate wait for signals 0ms
collect roots 0ms
fork 0ms
wait for child 67.108864ms
read ptrs 0ms
call free 134.21773ms
consolidate wait for signals 0ms
collect roots 0ms
fork 0ms
wait for child 67.108864ms
read ptrs 0ms
call free 134.21773ms
Strange. Don't mind the fact that several of the timings are 0 --- zero --- ms,
but wait for child is the exact same number across the three iterations
shown. And call free is exactly the same for the last two iterations. And what
about the fact that 67.1 + 134.2 = 201.3?? Something is obviously wrong here.
Maybe there is some weird OS synchronization going on with fork and/or
mmap[1]? Or maybe the numbers are just very stable since I'm using
a test program that does the exact same thing a billion times? (This could be
the case, as the number of pointers returned was in fact the exact same
on all iterations.)
It was Friday afternoon, so I committed and went home.
The day after, I found myself programming on a side project, inserting the very same timing calls. Again, I got back very strange numbers. It was then I realized what was happening.
My code looked roughly like this:
let t0 = time::precise_time_ns();
do_some_work();
let t1 = time::precise_time_ns();
// ...
println!("do_some_work: {}ms", (t1 as f32 - t0 as f32) / 1_000_000.0);
time::precise_time_ns returns a u64. As I'm writing this, it returns
659_950_597_875_582. That's a large number. That number is way larger than the
largest integer an f32 can represent exactly (2^24 = 16_777_216). By just
casting back and forth, we clearly see this:
extern crate time;
fn main() {
let t = time::precise_time_ns();
println!("{}", t); // prints 660_149_119_845_010
println!("{}", t as f32 as u64); // prints 660_149_089_861_632
}
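The size of that loss is predictable: an f32 has a 24-bit significand, so for timestamps between 2^49 and 2^50 nanoseconds, adjacent representable values are 2^26 ns = 67.108864 ms apart --- exactly the spacing in the timings above. A minimal sketch (using the hypothetical timestamp from the snippet) that measures this spacing directly:

```rust
fn main() {
    // A u64 nanosecond timestamp like the one printed above.
    let t: u64 = 660_149_119_845_010;
    let f = t as f32;
    // Step to the next representable f32 value and measure the gap.
    let next = f32::from_bits(f.to_bits() + 1);
    println!("spacing: {} ns", next - f); // prints 67108864 (= 2^26 ns ~ 67.1 ms)
}
```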
So since I cast to f32 before doing the subtraction, time differences on
the order of 29_983_378 ns[2], 29 ms, could simply disappear, causing the
reported difference to be zero. And the larger differences would be rounded to
a multiple of ~67 ms (the spacing between adjacent f32 values at this magnitude
is 2^26 ns = 67.108864 ms). This explains it all!
Now, why did I cast before subtracting? I don't know. Why did I use f32
instead of f64 (which in this case would have been sufficient)? I don't know. It was
just one of those times when you write quick and dirty code without really
thinking about the details of what you're doing. Luckily, this time the bug was
fairly easy to spot.
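For the record, the fix is a one-liner: subtract while still in u64, where the arithmetic is exact, and only convert the (small) difference to floating point for display. A sketch using std::time::Instant from the standard library (a monotonic clock; the original code used the time crate's precise_time_ns instead, and the workload here is a stand-in):

```rust
use std::time::Instant;

fn do_some_work() {
    // Stand-in workload; the real code did GC bookkeeping.
    let mut acc: u64 = 0;
    for i in 0..1_000_000u64 {
        acc = acc.wrapping_add(i);
    }
    std::hint::black_box(acc); // keep the loop from being optimized away
}

fn main() {
    let t0 = Instant::now();
    do_some_work();
    // Duration subtracts internally with full integer precision; only the
    // final, small difference is converted to f64.
    let elapsed_ms = t0.elapsed().as_secs_f64() * 1_000.0;
    println!("do_some_work: {}ms", elapsed_ms);
}
```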
Footnotes
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License