Stable Timings
It was Friday afternoon and I was working on a proof-of-concept garbage collector for Rust. The general idea of the collector is to have threads
register their roots --- pointers to memory that the collector cares about ---
and every once in a while a thread collects these roots, forks off a new process,
finds memory that is no longer reachable from any root, and returns these pointers
back to the parent process using mmap.
The initial implementation was obviously not well tuned for performance.
In order to get a rough picture of where my system spent its cycles, I
inserted calls to time::precise_time_ns before and after certain blocks, and
wrote out the difference at the end. The reported timings looked like this:
consolidate wait for signals 0ms
collect roots 0ms
fork 0ms
wait for child 67.108864ms
read ptrs 0ms
call free 201.3266ms
consolidate wait for signals 0ms
collect roots 0ms
fork 0ms
wait for child 67.108864ms
read ptrs 0ms
call free 134.21773ms
consolidate wait for signals 0ms
collect roots 0ms
fork 0ms
wait for child 67.108864ms
read ptrs 0ms
call free 134.21773ms
Strange. Don't mind the fact that several of the timings are 0 --- zero --- ms,
but wait for child is the exact same number across the three iterations
shown. And call free is exactly the same for the last two iterations. And what
about the fact that 67.1 + 134.2 = 201.3?? Something is obviously wrong here.
Maybe there is some weird OS synchronization going on with fork and/or
mmap[1]? Or maybe the numbers are just very stable since I'm using
a test program that does the exact same thing a billion times? (This could be
the case, as the number of pointers returned was in fact the exact same
on all iterations.)
It was Friday afternoon, so I committed and went home.
The day after, I found myself programming on a side project, inserting the very same timing calls. Again, I got back very strange numbers. It was then I realized what was happening.
My code looked roughly like this:
let t0 = time::precise_time_ns();
do_some_work();
let t1 = time::precise_time_ns();
// ...
println!("do_some_work: {}ms", (t1 as f32 - t0 as f32) / 1_000_000.0);
time::precise_time_ns returns a u64. As I'm writing this, it returns
659_950_597_875_582. That's a large number. That number is way larger than the
largest integer an f32 can represent exactly (2^24 = 16_777_216). By just
casting back and forth, we clearly see this:
extern crate time;
fn main() {
let t = time::precise_time_ns();
println!("{}", t); // prints 660_149_119_845_010
println!("{}", t as f32 as u64); // prints 660_149_089_861_632
}
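The size of that loss is predictable: an f32 has a 24-bit significand, so for timestamps between 2^49 and 2^50 nanoseconds, adjacent representable values are 2^26 ns = 67.108864 ms apart --- exactly the spacing in the timings above. A minimal sketch (using the hypothetical timestamp from the snippet) that measures this spacing directly:

```rust
fn main() {
    // A u64 nanosecond timestamp like the one printed above.
    let t: u64 = 660_149_119_845_010;
    let f = t as f32;
    // Step to the next representable f32 value and measure the gap.
    let next = f32::from_bits(f.to_bits() + 1);
    println!("spacing: {} ns", next - f); // prints 67108864 (= 2^26 ns ~ 67.1 ms)
}
```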
So since I cast to f32 before doing the subtraction, time differences on
the order of 29_983_378 ns[2], 29 ms, could simply disappear, causing the
reported difference to be zero. And the larger differences would be rounded to
a multiple of ~67 ms (the spacing between adjacent f32 values at this magnitude
is 2^26 ns = 67.108864 ms). This explains it all!
Now, why did I cast before subtracting? I don't know. Why did I use f32
instead of f64 (which in this case would have been sufficient)? I don't know. It was
just one of those times when you write quick and dirty code without really
thinking about the details of what you're doing. Luckily, this time the bug was
fairly easy to spot.
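For the record, the fix is a one-liner: subtract while still in u64, where the arithmetic is exact, and only convert the (small) difference to floating point for display. A sketch using std::time::Instant from the standard library (a monotonic clock; the original code used the time crate's precise_time_ns instead, and the workload here is a stand-in):

```rust
use std::time::Instant;

fn do_some_work() {
    // Stand-in workload; the real code did GC bookkeeping.
    let mut acc: u64 = 0;
    for i in 0..1_000_000u64 {
        acc = acc.wrapping_add(i);
    }
    std::hint::black_box(acc); // keep the loop from being optimized away
}

fn main() {
    let t0 = Instant::now();
    do_some_work();
    // Duration subtracts internally with full integer precision; only the
    // final, small difference is converted to f64.
    let elapsed_ms = t0.elapsed().as_secs_f64() * 1_000.0;
    println!("do_some_work: {}ms", elapsed_ms);
}
```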
Footnotes
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License