In this post, I walk you through how Valgrind actually works under the hood, the strange low-level hacks I uncovered along the way, as well as my experience porting some of that code into Rust.
Recently, Felix and I have been exploring the idea of a new tool1 to detect undefined behavior in Rust code. This new tool is meant to co-exist alongside Miri, in case where Miri is not possible to use, such as when linking to arbitrary C libraries.
Note: This entire project is still in the exploration phase; it's possible we might discard this approach and try something different depending on how it performs, or on feedback from others working on the compiler and Rust's operational semantics.
The first step in our journey was to come up with a basic architecture and proof-of-concept, and we settled on a solution involving Valgrind. We would write a new Valgrind tool to track certain information about a Rust program as it's running, such as what data is being borrowed, what data is being dereferenced, etc. In case you're not familiar with Valgrind, it's a framework for dynamic analysis tools; it provides:
Felix created a C demo program which is able to talk to a custom Valgrind tool (
From there, I created an equivalent Rust demo program, which involved having to make sense of Valgrind's source and translate some of that as well (which is when things get hairy).
For our demo, we decided to keep in mind Stacked Borrows, an operational semantics that is integrated into the Rust compiler, and see if we could expose to our Valgrind tool the sort of information that SB might need.
In Felix's demo, we have this code:
char val = 1; char *x = ; // x = &mut val;
The idea is to model what it's like to get an exclusive reference in Rust; the
KC_BORROW_MUT macro takes the value, and returns a reference to it.
When the binary is run under Valgrind using our new tool and the second line is executed, the tool receives a "client request".
Otherwise, when the binary is run standalone, the Valgrind-related logic is a no-op.
Here's the output originating from our Valgrind tool:
kc_handle_client_request, handle BORROW_MUT 1ffeffffee 91 92 93 94
KC_BORROW_MUT macros is a utility that Felix created to call into the Valgrind-provided macros.
Let's take a look at the definition:
// encodes &mut PLACE, supplying pointee type first to placate cc
VG_USERREQ__BORROW_MUT is just a numeric constant (equal to
0x1ffeffffee), so I've omitted it for brevity.
Let's try to understand what is happening here.
This macro accepts a type and value and calls the
VALGRIND_DO_CLIENT_REQUEST_EXPR macro, passing in:
The remaining parameters won't be used for our demo, but I changed them to unique numbers for debugging purposes. We also cast the result to a pointer of the type that was passed in.
Now let's take a look at the definition of the
This is where things get really interesting.
Note: I'm focusing on x86-64, aka AMD64.
/* The following defines the magic code sequences which the JITter spots and handles magically. Don't look too closely at them as they will rot your brain. */
The first thing to note is the comment 😂 which insipired the title of this post.
VALGRIND_DO_CLIENT_REQUEST_EXPR macro creates an array with 6 slots, and a variable to hold the result to return at the end.
We then take all the parameters that are passed into the macro, and stuff them into the array.
Lastly, we run some inline assembly.
The first instructions that execute are the ones in the previous macro,
rolq $3, %rdi rolq $13, %rdi rolq $61, %rdi rolq $51, %rdi
This is using AT&T syntax, and the easiest way to tell2 is because constants are prefixed with
$, and registers with
rolq instructions means rotate left. So, these 4 instructions collectively rotate the
rdi register 128 bits, to the left, meaning the entire number is fully rotated twice (since we are dealing with a 64-bit architecture). The only purpose of these instructions is to signal to Valgrind that we are about to include a client request in the next assembly instruction.
And... this would be a good time to share a diagram. What happens when you run a binary under Valgrind? Valgrind has its own JIT compiler named VEX, which the binary is run under. Each assembly instruction is converted to VEX IR before being executed. This is so differences between all architectures supported by Valgrind can be abstracted away.
sequenceDiagram participant jit as VEX JITter participant tool as Custom Valgrind Tool participant term as Terminal loop each assembly instruction jit->>jit: dissasemble into VEX IR alt is magic code sequence for VALGRIND_DO_CLIENT_REQUEST jit->>tool: handle client request tool->>tool: update internal state opt tool->>term: Print to stdout end else is regular code sequence jit->>jit: execute IR end end
Each time the
__SPECIAL_INSTRUCTION_PREAMBLE is encountered by the JITter, it has to check the following instruction, because there are 3 different actions that can be taken, one of which is to make the Client Request. For the Client Request, the instruction is
xchgq %rbx, %rbx
This exchanges the contents of the operands, and normally, two different registers would be listed, but this way, it's just a no-op when running not under Valgrind.
After the assembly come the operands, first the outputs, then the inputs. This is how you conect the variables in C the registers in assembly.
"=d" (_zzq_result)is an input operand that says to overwrite (
=) the variable
_zzq_resultwith the contents of register
d) after the assembly is executed
"a" (&_zzq_args)is an output operand that says to place the contents of
&_zzq_args(the address of the start of the array) in register
a) before the assembly is executed
"0" (_zzq_default)means to place the
_zzq_defaultvariable into the output operand at index 0 (
0) of the list of input operands, which is
So, before the inline assembly is executed, we'll have this
&_zzq_args -> rax _zzq_default -> rdx
...and after the inline assembly is executed:
rdx -> _zzq_result
You can see how this would allow the code to work when not run under Valgrind; all the assembly instructions would be no-ops, and the default would simple go into and out of
For the last line, we have two "clobbers"3:
ccindicates that the flags register was modified
memoryindicates that the assembly reads or writes to other items other than the ones listed in the operands section. I'm not quite sure why this is needed here, since as far as I can tell only
Now that we understand everything that happens in the inline assembly, we can take a look at some of the code in the demo Valgrind tool to see the whole thing come together:
This is the function that is called when the JITter comes upon a Client Request. The relevant parameters are:
arg: points to the start of the parameter array (
_zzq_args) that we populated before the inline assembly
ret: points to where we need to set the return value (
After checking that we indeed are receiving a request code for something that falls under the purview of this specific Valgrind tool, we
switch on that request code to see which of the various types of requests we want to make.
For our demo purposes, Felix just made the code print out the parameters and set the first item in the array as the return value (which if you look all the way back at the
KC_BORROW_MUT macro, is the address of the
At this point, we have a good grasp on both the C and the inline assembly, and it's time for some Rust!
First, translating the
KC_BORROW_MUT macro is straightforward.
Why did I make this a macro and not a regular function? Because I want to directly get the address of the variable; I don't want the address of the function argument, if it were to be passed into a function.
Next, we handle the
I created the
Data struct so that I could set
arg1 to be a raw pointer instead having to have the entire array be
u64, like it was in the C code.
Now, let's talk through the Rust inline assembly. The five instructions should be familiar, though they look a bit different.
rol rdi, 3 rol rdi, 13 rol rdi, 61 rol rdi, 51 xchg rbx, rbx
Here I opted to use Intel syntax, which these days seems to be the syntax of choice for many, especially for its improved readability. In fact, I came across this HN comment, regarding the choice of having Intel syntax be default for inline asembly in Rust:
Yes, thank you for the Intel syntax. It is by far the most frequent syntax you'll find anywhere somewhat recent (1990+) and it is by far the most straightforward to learn by being clean. I did learn both in the 90s, but I've always struggled with AT&T symbols and different offerings which to me was hard to switch to when learning from different material sources.
In Intel syntax, the operands are flipped. Also, you don't have to escape the constants and registers.
Next, we come to the operands section.
According to The Rust Reference, the
inout syntax looks like
inout(<reg>) <in expr> => <out expr>
You should be able to include an expression that is put into register
reg at the start, with the contents of the register then being put into
out expr at the end.
Unfortunately, I wasn't able to get this to work (probably PEBCAK).
inout("di") zzq_default => zzq_result just did not seem to populate the register with
zzq_default at the start, so what I ended up doing (which I think might actually be cleaner) was to set
zzq_result to have the default value from the beginning, that way if the binary is run standalone, we already have
zzq_result set with the correct value to return.
The "in" operand was pretty straightforward, looking very similar to the C inline assembly.
Here's the Rust code where we use the macro:
let mut val: u8 = 1; let x = kc_borrow_mut!; // x = &mut val;
All in all, what took me the longest, was figuring out what
"0" (_zzq_default) meant in the C inline assembly.
I also struggled with the question of why
rax are used, as opposed to some other registers.
As far I understand, this is chosen mostly because of conventions, but there is nothing stopping you from using other ones.
The biggest difference between C inline assembly and Rust inline assembly was understanding the operands section.
cc clobber didn't seem to be needed, as according to The Rust Reference, if the flags register is not modified, then you have to explicitly set the
Lastly, I didn't quite figure out how to express the equivalent of the
The Rust Reference does mention the
clobber keyword in multiple spots, but it seems that you have to specify which registers are being clobbered, contrary to what
I hope this doesn't come back to bite us later, and in fact, I probably have to get to the bottom of
memory once we proceed further with our project.
If we can communicate with Valgrind by using explicit macros in the code, we should be able to integrate this approach into the compiler, and insert the assembly during compilation (based on a compiler flag), unlocking the possibility for exciting new dynamic analysis tools for Rust that are tightly integrated with the rustc itself.
There's more work that would need to be done to support this, mostly in the MIR layer of the compiler, but given that our Rust demo is a success, we're off to a good start. As Felix and I continue our exploration of this dynamic analysis space, there'll be more interesting topics to uncover, which I'll document on this blog :)
If you made it this far, you are probably itching to see the full code:
Thanks to Nicholas Nethercote for his help in deciphering some of the Valgrind code and pointing me in the right direction, Felix for writing the initial C code which formed the basis for this entire journey, as well as Jack and Dillon for reading drafts of this post.
See Niko's mention of dynamic verification
The faker's guide to reading (x86) assembly language served as a good refresher on assembly
See the GNU Extended Asm docs for a full explanation of clobbers