We recently landed support for Windows in Krustlet. Although the final PR was relatively small, there was a lot of learning behind it. While adding this support, we came across an oddity that forced us to learn a whole bunch about stack vs heap allocation in Rust and figured it would be good to share with the world. As with most learning opportunities, this one starts with a story.
A stack overflow
We started off excited to finally get Krustlet building for Windows. We had built it before with
major modifications back with Krustlet 0.1, but now had all the building blocks in place to build it
normally (without an OpenSSL dependency to boot!). After adding some Windows specific build scripts
and files, a simple cargo build worked without problems. Success!!!
Or maybe not… Then we tried to actually run the thing:
$ .\target\debug\krustlet-wascc.exe <a bunch of config flags>
<stack trace>
thread 'main' has overflowed its stack
Um…that isn’t supposed to happen. We haven’t even run anything yet. Definitely nothing that should cause an overflow. Can we at least get the help text?
$ .\target\debug\krustlet-wascc.exe -h
<stack trace>
thread 'main' has overflowed its stack
Ok, well something definitely seems Wrong™. What about the other binary we compile?
$ .\target\debug\krustlet-wasi.exe -h
<stack trace>
thread 'main' has overflowed its stack
We were completely perplexed. Why would we not get any output from the program before overflowing?
We tried compiling with the --release flag and that got us a little more output before exiting
with the same error. We also started looking for some big structs, but didn’t see any in our code.
Getting somewhere
Someone else at the company suggested we try bumping the stack size. Turns out it really isn’t that
difficult to do (please note that if you have a large project, setting RUSTFLAGS will make
everything recompile):
$ export RUSTFLAGS="-C link-args=-Wl,-zstack-size=<size in bytes>"
$ cargo build
In our case, this didn’t really help. We had to get to an eye-watering stack size of 4GB to get things working, so that wasn’t going to work. However, this could be useful to you in your project.
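If only one part of a program needs the extra room, a different option from the linker flag above is to run that work on a thread with an explicitly sized stack. The following is a minimal sketch using the standard library's std::thread::Builder; it is not what we did in Krustlet, and the names run_with_stack and depth_sum are hypothetical helpers made up for illustration.

```rust
use std::thread;

/// Runs `work` on a new thread whose stack is `stack_bytes` large and
/// returns its result. Only the spawned thread gets the larger stack;
/// the main thread's stack is unchanged.
fn run_with_stack<T, F>(stack_bytes: usize, work: F) -> std::io::Result<T>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    let handle = thread::Builder::new()
        .name("big-stack".into())
        .stack_size(stack_bytes)
        .spawn(work)?;
    // Re-raise a panic from the worker instead of swallowing it.
    Ok(handle
        .join()
        .unwrap_or_else(|payload| std::panic::resume_unwind(payload)))
}

/// Hypothetical stack-hungry work: every frame holds a 1 KiB buffer.
fn depth_sum(depth: u32) -> u64 {
    let frame = [1u8; 1024];
    if depth == 0 {
        return frame[0] as u64;
    }
    frame[(depth as usize) % 1024] as u64 + depth_sum(depth - 1)
}
```

Calling run_with_stack(32 * 1024 * 1024, || depth_sum(1000)) from main would run the recursion on a 32 MiB stack. Like the linker flag, this only buys headroom; it does not explain why the stack is being used up in the first place.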
Someone else also suggested printing out the type sizes, which also turns out to be fairly simple.
This does require that you have the nightly toolchain installed using rustup:
$ export RUSTFLAGS="-Zprint-type-sizes"
$ cargo +nightly build
<truncated>
print-type-size type: `std::result::Result<(), std::fmt::Error>`: 1 bytes, alignment: 1 bytes
print-type-size discriminant: 1 bytes
print-type-size variant `Ok`: 0 bytes
print-type-size field `.0`: 0 bytes
print-type-size variant `Err`: 0 bytes
print-type-size field `.0`: 0 bytes
print-type-size type: `std::sys::unix::process::process_common::ExitCode`: 1 bytes, alignment: 1 bytes
print-type-size field `.0`: 1 bytes
print-type-size type: `std::fmt::Error`: 0 bytes, alignment: 1 bytes
print-type-size type: `unwind::libunwind::_Unwind_Context`: 0 bytes, alignment: 1 bytes
Finished dev [unoptimized + debuginfo] target(s) in 1.62s
This output is pretty verbose, but useful. If you are looking for large structs, we’d recommend
filtering out any single byte fields using grep or any other tool. After we filtered our output,
most of the structs we’d defined were pretty small until we found one anomaly. It turned out that
our config struct that loaded data from command line flags and environment variables was quite large
due to all of the different things it was checking. So we tried to put it in a Box so it would be
on the heap instead of the stack. This partially helped, but one of our binaries was still not working.
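If grep gets unwieldy, the filtering can also be scripted. Here is a minimal sketch that picks the "type:" lines out of the output and keeps only the big ones. It assumes the line format shown in the output above, which comes from a nightly-only flag and may change; parse_type_line and large_types are hypothetical names made up for this example.

```rust
/// Parses a line such as
/// "print-type-size type: `std::fmt::Error`: 0 bytes, alignment: 1 bytes"
/// into (type name, size in bytes). Field, variant, and discriminant lines
/// yield None. Assumes the format shown in the output above.
fn parse_type_line(line: &str) -> Option<(&str, usize)> {
    let rest = line.trim().strip_prefix("print-type-size type: `")?;
    // Split at the closing backtick that ends the type name.
    let (name, tail) = rest.rsplit_once("`: ")?;
    let size = tail.split_whitespace().next()?.parse::<usize>().ok()?;
    Some((name, size))
}

/// Returns every type of at least `min_bytes`, largest first.
fn large_types(output: &str, min_bytes: usize) -> Vec<(&str, usize)> {
    let mut found: Vec<(&str, usize)> = output
        .lines()
        .filter_map(parse_type_line)
        .filter(|&(_, size)| size >= min_bytes)
        .collect();
    found.sort_by(|a, b| b.1.cmp(&a.1));
    found
}
```

Feeding the captured build output to large_types with a threshold of a few kilobytes should surface the kind of anomaly we found in our config struct.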
Although printing type sizes was not helpful for us in the end, it is another tool in your toolbox
for identifying large data structures that cause an overflow. These large structures can then be
pared down, or you can put them in a Box so that they live on the heap. An important detail to
remember here is that when you first create a struct, it will be on the stack (see the Rust docs
for a more detailed overview), as Box::new allocates space on the heap and then moves the struct
there. So if you create a bunch of large data structures before boxing them, all of those will be
on the stack first. A Vec is the exception, as it points to data on the heap.
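To make the stack versus heap difference concrete, here is a minimal sketch. BigConfig is a hypothetical stand-in for a large struct, not our actual config type, and the 64 KiB size is arbitrary.

```rust
use std::mem::size_of;

const BUF_LEN: usize = 64 * 1024;

/// Hypothetical stand-in for a large struct.
struct BigConfig {
    scratch: [u8; BUF_LEN],
}

/// The value is an ordinary stack value before `Box::new` moves it into its
/// heap allocation. The optimizer may elide that copy, but nothing
/// guarantees it, and debug builds usually do not.
fn boxed_config() -> Box<BigConfig> {
    Box::new(BigConfig { scratch: [0; BUF_LEN] })
}

/// A `Vec` keeps its elements on the heap; only a small handle lives on the stack.
fn heap_scratch() -> Vec<u8> {
    vec![0u8; BUF_LEN]
}

/// Bytes needed to hold the struct inline versus behind a `Box`.
fn inline_vs_boxed() -> (usize, usize) {
    (size_of::<BigConfig>(), size_of::<Box<BigConfig>>())
}
```

On a 64-bit target, inline_vs_boxed() reports 65536 bytes for the struct itself and 8 bytes for the Box that points to it, which is why boxing helps once the value has made it to the heap.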
The solution
After that whole rigmarole, we still hadn’t gotten anywhere. So after a weekend break, we went back and started looking through our code one more time. Krustlet uses the tokio runtime to run a whole bunch of control loops. Each of these control loops would spawn other tasks as needed to perform business logic.
Upon investigation, we noticed that we had used the futures crate select! macro. This allows you to
wait on multiple futures for the first one to return a response. But according to the docs:
If a similar async function is called outside of select to produce a Future, the Future must be pinned in order to be able to pass it to select.
We had to set up our futures outside of the select and this caused us to “pin” the futures using
pin_mut!. What does that mean? Well, we went back to the docs, and sure enough, we found the
problem:
Pins a value on the stack
Oh… well that explains a lot. Tokio futures generate a lot of other code and scaffolding (out of
necessity) around your actual work function. So the things that each future was doing would also
end up bloating stack use. To be clear, we could have also used Box::pin, but it turns out that
tokio also has a select! macro that doesn’t require us to pin (although it does do a little bit of
pinning magic under the hood). So once we got rid of the stack pinning, everything started working
with no stack overflows!
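To illustrate why stack pinning can hurt, here is a minimal sketch using only the standard library. It does not use select! or tokio, and control_loop is a hypothetical future that is not taken from Krustlet; it exists only to show that a future carries its state inline, and that Box::pin leaves just a pointer behind.

```rust
use std::future::Future;
use std::mem::size_of_val;
use std::pin::Pin;

const SCRATCH_LEN: usize = 16 * 1024;

/// Hypothetical control loop with a large local that is still in use
/// after an await point, so it becomes part of the future's state.
async fn control_loop(seed: u8) -> usize {
    let scratch = [seed; SCRATCH_LEN];
    std::future::ready(()).await;
    scratch.iter().map(|&b| b as usize).sum()
}

/// Size of the future when held inline. This is roughly what a stack pin
/// (such as pin_mut!) keeps in the caller's frame.
fn inline_future_size() -> usize {
    size_of_val(&control_loop(1))
}

/// Box::pin moves the future to the heap and pins it there.
fn heap_pinned() -> Pin<Box<impl Future<Output = usize>>> {
    Box::pin(control_loop(1))
}

/// Size of what stays with the caller after heap pinning: a single pointer.
fn heap_pinned_handle_size() -> usize {
    size_of_val(&heap_pinned())
}
```

Here inline_future_size() is at least 16 KiB, while heap_pinned_handle_size() is the size of one pointer. With several large futures pinned in the same frame, the inline sizes add up quickly.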
So what did we learn here?
Ok, that was a lot of information, so what did we actually learn here?
If you are having problems with a stack overflow, remember to check these things:
- Do you need to increase the stack size?
- Do you have any large structs? Try printing the type sizes to check.
- Are you pinning or allocating things on the stack you didn’t mean to?
Another lesson we learned from this experience is “with great power comes great responsibility” around zero cost abstractions. Rust focuses on having zero cost abstractions across its whole API surface. This is quite useful, as you can feel free to use any of the abstractions without worrying about adding overhead or introducing other behavior you didn’t expect. However, it also means you need to be careful and deliberate in your use of every abstraction, especially as a new Rust programmer.
In our case, the compiler complained and said “Hey! I need this future to be pinned,” so we pinned it. We didn’t take the time to understand what the abstraction was doing for us, which in this case was pinning to the stack. Rust gives us the ability to easily pin things to the stack or allocate them on the heap (i.e. we don’t have to go manually allocate the memory ourselves) instead of magically managing the choice for us. But in this case, we weren’t aware of all the repercussions of our choices and it led us down a rabbit hole. So the lesson here is that “zero cost” doesn’t mean “zero responsibility.”
Hopefully this is helpful to you and helps you avoid possible pitfalls in your own applications.