In this post, I walk you through how Valgrind actually works under the hood, the strange low-level hacks I uncovered along the way, as well as my experience porting some of that code into Rust.
Recently, Felix and I have been exploring the idea of a new tool1 to detect undefined behavior in Rust code. This new tool is meant to co-exist alongside Miri, in case where Miri is not possible to use, such as when linking to arbitrary C libraries.
Note: This entire project is still in the exploration phase; it's possible we might discard this approach and try something different depending on how it performs, or on feedback from others working on the compiler and Rust's operational semantics.
The first step in our journey was to come up with a basic architecture and proof-of-concept, and we settled on a solution involving Valgrind. We would write a new Valgrind tool to track certain information about a Rust program as it's running, such as what data is being borrowed, what data is being dereferenced, etc. In case you're not familiar with Valgrind, it's a framework for dynamic analysis tools; it provides:
Felix created a C demo program which is able to talk to a custom Valgrind tool (krabcake
).
From there, I created an equivalent Rust demo program, which involved having to make sense of Valgrind's source and translate some of that as well (which is when things get hairy).
For our demo, we decided to keep in mind Stacked Borrows, an operational semantics that is integrated into the Rust compiler, and see if we could expose to our Valgrind tool the sort of information that SB might need.
In Felix's demo, we have this code:
char val = 1;
char *x = ; // x = &mut val;
The idea is to model what it's like to get an exclusive reference in Rust; the KC_BORROW_MUT
macro takes the value, and returns a reference to it.
When the binary is run under Valgrind using our new tool and the second line is executed, the tool receives a "client request".
Otherwise, when the binary is run standalone, the Valgrind-related logic is a no-op.
Here's the output originating from our Valgrind tool:
kc_handle_client_request, handle BORROW_MUT 1ffeffffee 91 92 93 94
The KC_BORROW_MUT
macros is a utility that Felix created to call into the Valgrind-provided macros.
Let's take a look at the definition:
// encodes &mut PLACE, supplying pointee type first to placate cc
VG_USERREQ__BORROW_MUT
is just a numeric constant (equal to 0x1ffeffffee
), so I've omitted it for brevity.
Let's try to understand what is happening here.
This macro accepts a type and value and calls the VALGRIND_DO_CLIENT_REQUEST_EXPR
macro, passing in:
The remaining parameters won't be used for our demo, but I changed them to unique numbers for debugging purposes. We also cast the result to a pointer of the type that was passed in.
Now let's take a look at the definition of the VALGRIND_DO_CLIENT_REQUEST_EXPR
macro.
This is where things get really interesting.
Note: I'm focusing on x86-64, aka AMD64.
/* The following defines the magic code sequences which the JITter
spots and handles magically. Don't look too closely at them as
they will rot your brain. */
The first thing to note is the comment 😂 which insipired the title of this post.
The VALGRIND_DO_CLIENT_REQUEST_EXPR
macro creates an array with 6 slots, and a variable to hold the result to return at the end.
We then take all the parameters that are passed into the macro, and stuff them into the array.
Lastly, we run some inline assembly.
The first instructions that execute are the ones in the previous macro, __SPECIAL_INSTRUCTION_PREAMBLE
.
rolq $3, %rdi
rolq $13, %rdi
rolq $61, %rdi
rolq $51, %rdi
This is using AT&T syntax, and the easiest way to tell2 is because constants are prefixed with $
, and registers with %
.
The rolq
instructions means rotate left. So, these 4 instructions collectively rotate the rdi
register 128 bits, to the left, meaning the entire number is fully rotated twice (since we are dealing with a 64-bit architecture). The only purpose of these instructions is to signal to Valgrind that we are about to include a client request in the next assembly instruction.
And... this would be a good time to share a diagram. What happens when you run a binary under Valgrind? Valgrind has its own JIT compiler named VEX, which the binary is run under. Each assembly instruction is converted to VEX IR before being executed. This is so differences between all architectures supported by Valgrind can be abstracted away.
sequenceDiagram participant jit as VEX JITter participant tool as Custom Valgrind Tool participant term as Terminal loop each assembly instruction jit->>jit: dissasemble into VEX IR alt is magic code sequence for VALGRIND_DO_CLIENT_REQUEST jit->>tool: handle client request tool->>tool: update internal state opt tool->>term: Print to stdout end else is regular code sequence jit->>jit: execute IR end end
Each time the __SPECIAL_INSTRUCTION_PREAMBLE
is encountered by the JITter, it has to check the following instruction, because there are 3 different actions that can be taken, one of which is to make the Client Request. For the Client Request, the instruction is
xchgq %rbx, %rbx
This exchanges the contents of the operands, and normally, two different registers would be listed, but this way, it's just a no-op when running not under Valgrind.
After the assembly come the operands, first the outputs, then the inputs. This is how you conect the variables in C the registers in assembly.
"=d" (_zzq_result)
is an input operand that says to overwrite (=
) the variable _zzq_result
with the contents of register rdx
(d
) after the assembly is executed"a" (&_zzq_args[0])
is an output operand that says to place the contents of &_zzq_args[0]
(the address of the start of the array) in register rax
(a
) before the assembly is executed"0" (_zzq_default)
means to place the _zzq_default
variable into the output operand at index 0 (0
) of the list of input operands, which is rdx
So, before the inline assembly is executed, we'll have this
&_zzq_args[0] -> rax
_zzq_default -> rdx
...and after the inline assembly is executed:
rdx -> _zzq_result
You can see how this would allow the code to work when not run under Valgrind; all the assembly instructions would be no-ops, and the default would simple go into and out of rdx
.
For the last line, we have two "clobbers"3:
cc
indicates that the flags register was modifiedmemory
indicates that the assembly reads or writes to other items other than the ones listed in the operands section. I'm not quite sure why this is needed here, since as far as I can tell only rdx
and rax
are used.Now that we understand everything that happens in the inline assembly, we can take a look at some of the code in the demo Valgrind tool to see the whole thing come together:
static Bool
This is the function that is called when the JITter comes upon a Client Request. The relevant parameters are:
arg
: points to the start of the parameter array (_zzq_args
) that we populated before the inline assemblyret
: points to where we need to set the return value (rdx
)After checking that we indeed are receiving a request code for something that falls under the purview of this specific Valgrind tool, we switch
on that request code to see which of the various types of requests we want to make.
For our demo purposes, Felix just made the code print out the parameters and set the first item in the array as the return value (which if you look all the way back at the KC_BORROW_MUT
macro, is the address of the _place
variable).
At this point, we have a good grasp on both the C and the inline assembly, and it's time for some Rust!
First, translating the KC_BORROW_MUT
macro is straightforward.
Why did I make this a macro and not a regular function? Because I want to directly get the address of the variable; I don't want the address of the function argument, if it were to be passed into a function.
Next, we handle the VALGRIND_DO_CLIENT_REQUEST_EXPR
macro:
I created the Data
struct so that I could set arg1
to be a raw pointer instead having to have the entire array be u64
, like it was in the C code.
Now, let's talk through the Rust inline assembly. The five instructions should be familiar, though they look a bit different.
rol rdi, 3
rol rdi, 13
rol rdi, 61
rol rdi, 51
xchg rbx, rbx
Here I opted to use Intel syntax, which these days seems to be the syntax of choice for many, especially for its improved readability. In fact, I came across this HN comment, regarding the choice of having Intel syntax be default for inline asembly in Rust:
Yes, thank you for the Intel syntax. It is by far the most frequent syntax you'll find anywhere somewhat recent (1990+) and it is by far the most straightforward to learn by being clean. I did learn both in the 90s, but I've always struggled with AT&T symbols and different offerings which to me was hard to switch to when learning from different material sources.
In Intel syntax, the operands are flipped. Also, you don't have to escape the constants and registers.
Next, we come to the operands section.
According to The Rust Reference, the inout
syntax looks like
inout(<reg>) <in expr> => <out expr>
You should be able to include an expression that is put into register reg
at the start, with the contents of the register then being put into out expr
at the end.
Unfortunately, I wasn't able to get this to work (probably PEBCAK).
inout("di") zzq_default => zzq_result
just did not seem to populate the register with zzq_default
at the start, so what I ended up doing (which I think might actually be cleaner) was to set zzq_result
to have the default value from the beginning, that way if the binary is run standalone, we already have zzq_result
set with the correct value to return.
The "in" operand was pretty straightforward, looking very similar to the C inline assembly.
Here's the Rust code where we use the macro:
let mut val: u8 = 1;
let x = kc_borrow_mut!; // x = &mut val;
All in all, what took me the longest, was figuring out what "0" (_zzq_default)
meant in the C inline assembly.
I also struggled with the question of why rdx
and rax
are used, as opposed to some other registers.
As far I understand, this is chosen mostly because of conventions, but there is nothing stopping you from using other ones.
The biggest difference between C inline assembly and Rust inline assembly was understanding the operands section.
Also, the cc
clobber didn't seem to be needed, as according to The Rust Reference, if the flags register is not modified, then you have to explicitly set the preserves_flags
option.
Lastly, I didn't quite figure out how to express the equivalent of the memory
clobber.
The Rust Reference does mention the clobber
keyword in multiple spots, but it seems that you have to specify which registers are being clobbered, contrary to what memory
expresses.
I hope this doesn't come back to bite us later, and in fact, I probably have to get to the bottom of memory
once we proceed further with our project.
If we can communicate with Valgrind by using explicit macros in the code, we should be able to integrate this approach into the compiler, and insert the assembly during compilation (based on a compiler flag), unlocking the possibility for exciting new dynamic analysis tools for Rust that are tightly integrated with the rustc itself.
There's more work that would need to be done to support this, mostly in the MIR layer of the compiler, but given that our Rust demo is a success, we're off to a good start. As Felix and I continue our exploration of this dynamic analysis space, there'll be more interesting topics to uncover, which I'll document on this blog :)
If you made it this far, you are probably itching to see the full code:
Thanks to Nicholas Nethercote for his help in deciphering some of the Valgrind code and pointing me in the right direction, Felix for writing the initial C code which formed the basis for this entire journey, as well as Jack and Dillon for reading drafts of this post.
See Niko's mention of dynamic verification
The faker's guide to reading (x86) assembly language served as a good refresher on assembly
See the GNU Extended Asm docs for a full explanation of clobbers