Progress porting wasmtime-runtime to Theseus

Feb. 3, 2022  ·  by Kevin Boos

Porting wasmtime-runtime, the key wasmtime crate

A previous post from late 2021 chronicled our ongoing journey to port wasmtime to Theseus. While our bottom-up approach got off to a strong start, we quickly encountered our first major challenge when examining the wasmtime-runtime crate, as it contains many dependencies on platform-specific and legacy system interfaces:

  • Unix-like memory mapping and protection
  • Signal/trap handling
  • Thread-local storage
  • Stack introspection and backtracing
  • File and I/O abstractions
  • Exception (panic) handling and unwinding resumption

This post describes our progress over a few weeks of working to add these features to Theseus in order to support wasmtime-runtime's many complex dependencies.

Porting & reorganizing third-party libraries

We first re-organized Theseus's repository to include two folders for third-party crates:

  • libs/: contains standalone crates that don't depend on Theseus, but can be used by Theseus and others.
  • ports/: contains crates that have been ported to depend directly on Theseus-specific crates (e.g., those in kernel/) and are thus not standalone.

The main features ported over the past couple of months (early winter 2021-2022) are shown in the table below:

Crate / Feature | Summary | Reason Needed for wasmtime-runtime
libc | Rust wrapper around the actual platform-specific libc implementation | Used to establish memory mappings and register signal handlers
region | Cross-platform APIs for virtual memory functions | Used to allocate large chunks of memory and remap/protect memory regions as exec/read/write as needed
TLS | Thread-Local Storage areas | Used for thread_local!() macro, which is needed to handle traps and stack unwinding upon exceptions that occur while executing native code that was JIT-compiled from a WASM binary
object | Helper crate for reading/writing object files, e.g., ELF | Used to read object files generated by cranelift's backend and to manage unwind info

Porting libc to Theseus

Support for a minimal subset of libc functionality has been an ongoing but low-priority effort, mostly for two reasons:

  1. Running C code directly on Theseus is inherently unsafe, as C is not a safe language and can thus violate Theseus's language safety-based isolation and resource usage guarantees.
  2. No crates that Theseus depends on have needed an underlying libc, thus Theseus as a platform did not need to offer one... until now.

Theseus's libc implementation is called tlibc, which is described in this book chapter. So far it has been loosely based on the Redox project's relibc.

Our efforts of late were focused on supporting mmap for POSIX-style memory mappings, which Theseus has traditionally eschewed because they are unsafe and poorly designed from a state management perspective [1]. In the future, we may also support POSIX-style signal handlers, but for now we have chosen to re-implement wasmtime's signal handling directly in safe Rust atop Theseus rather than going through an unsafe libc FFI boundary for no good reason.

The bulk of the mmap implementation for tlibc was added in commit fffda85. The key aspects of this are:

  • tlibc exposes a POSIX-style mmap() function that calls Theseus APIs to instantiate new MappedPages objects, and then saves them in a private list so that they aren't dropped until the corresponding munmap() call is invoked.
    • This is required because Theseus's abstraction of a virtual memory mapping, MappedPages, is auto-unmapped upon drop to guarantee safety.
  • Currently, mlock and munlock are dummy functions because Theseus doesn't perform any swapping or paging to disk.
  • Memory protection (mprotect) is offered, but is currently limited because Theseus forces all current memory mappings to be marked as "present" in the page table.
    • Thus, stripping read permissions from a mapping technically works, but it violates the guarantees of the MappedPages type, i.e., the mapping is present and valid for the entire lifetime of a MappedPages object.
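The bookkeeping described above can be sketched in a few lines. This is a minimal illustration that runs on a conventional host, not tlibc's actual code: `Mapping` is a hypothetical stand-in for MappedPages (a heap buffer that is released on drop), and `MappingTable`, `map`, and `unmap` are names invented for this example.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for Theseus's `MappedPages`: owning the value keeps
/// the "mapping" alive, and dropping it releases the memory.
pub struct Mapping {
    buf: Vec<u8>,
}

impl Mapping {
    pub fn new(len: usize) -> Mapping {
        Mapping { buf: vec![0u8; len] }
    }

    pub fn start(&self) -> usize {
        self.buf.as_ptr() as usize
    }

    pub fn len(&self) -> usize {
        self.buf.len()
    }
}

/// Private table of live mappings, keyed by start address, so that a mapping
/// is not dropped (and thus unmapped) until the matching `unmap` call.
pub struct MappingTable {
    live: BTreeMap<usize, Mapping>,
}

impl MappingTable {
    pub fn new() -> MappingTable {
        MappingTable { live: BTreeMap::new() }
    }

    /// mmap-like: create a mapping, stash it, and return its start address.
    pub fn map(&mut self, len: usize) -> Option<usize> {
        if len == 0 {
            return None;
        }
        let mapping = Mapping::new(len);
        let addr = mapping.start();
        self.live.insert(addr, mapping);
        Some(addr)
    }

    /// munmap-like: removing the entry drops the `Mapping`, which is what
    /// actually releases the memory.
    pub fn unmap(&mut self, addr: usize, len: usize) -> Result<(), &'static str> {
        let stored_len = self.live.get(&addr).map(|m| m.len());
        match stored_len {
            Some(l) if l == len => {
                self.live.remove(&addr);
                Ok(())
            }
            Some(_) => Err("length does not match the original mapping"),
            None => Err("no mapping starts at this address"),
        }
    }

    pub fn live_count(&self) -> usize {
        self.live.len()
    }
}
```

The point of the design is that the C caller only ever sees a raw address, while ownership of the mapping object stays inside the table; this sketch rejects partial unmaps for simplicity, which a real munmap() would need to handle.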

While we were at it, we went even further with additional improvements to theseus_cargo, libc, and tlibc to facilitate integration of Rust and C code atop Theseus.

In summary, with the outstanding issues in tlibc, libc, and theseus_cargo fixed, Rust and C code (both in-tree and out-of-tree components) can now all be compiled and loaded/linked together in Theseus.

Porting region to Theseus

With tlibc now supporting basic libc memory mapping functions, porting the region crate was fairly straightforward.

Importantly, however, we chose not to force region on Theseus to depend on tlibc, mainly because it would introduce another layer of unsafety. The primary implementations of alloc and free in our port are similar to the mmap() implementation in tlibc. We also had to express the region::Protection type in terms of Theseus's page table EntryFlags, which was generally straightforward.
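As a rough illustration of that translation, the sketch below maps a read/write/execute protection value onto x86_64-style page table bits. Everything here is an assumption for the example: `Protection` is a simplified stand-in for region::Protection, the flag constants are plain x86_64 bit positions rather than Theseus's EntryFlags type, and the choice to always set the present bit is borrowed from the mprotect discussion above.

```rust
/// Simplified stand-in for `region::Protection` (bit values chosen for this sketch).
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct Protection(pub u8);

impl Protection {
    pub const NONE: Protection = Protection(0);
    pub const READ: Protection = Protection(1 << 0);
    pub const WRITE: Protection = Protection(1 << 1);
    pub const EXECUTE: Protection = Protection(1 << 2);

    pub const fn union(self, other: Protection) -> Protection {
        Protection(self.0 | other.0)
    }

    pub const fn contains(self, other: Protection) -> bool {
        self.0 & other.0 == other.0
    }
}

/// x86_64 page table entry bits (hypothetical names, not Theseus's `EntryFlags`).
pub const PRESENT: u64 = 1 << 0;
pub const WRITABLE: u64 = 1 << 1;
pub const NO_EXECUTE: u64 = 1 << 63;

/// Translate a protection value into page table flags.
/// Assumption: every mapping stays present, so read access cannot be removed.
pub fn protection_to_flags(prot: Protection) -> u64 {
    let mut flags = PRESENT;
    if prot.contains(Protection::WRITE) {
        flags |= WRITABLE;
    }
    if !prot.contains(Protection::EXECUTE) {
        flags |= NO_EXECUTE;
    }
    flags
}
```

Note that x86_64 expresses executability as a negative bit, so the absence of EXECUTE is what sets a flag.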

The one tricky part of region that we disliked is QueryIter, which allows the caller to query all virtual memory areas across the entire current virtual address space to find ones that span or overlap with a certain range of addresses.
This is problematic for a few reasons:

  • Theseus's state management philosophy dictates that it does not maintain a centralized list of all memory mappings, so there's nothing to iterate over by default.
  • Theseus provides a very safe and clear API for interacting with memory mappings, which region::query_range() completely ignores because it assumes a POSIX-style virtual memory API.
  • Theseus strives to prevent TOCTTOU attacks by avoiding the concept of handles that point to a resource indirectly. By design (and by necessity atop conventional OSes), QueryIter separates the "time of check" from the "time of use", leading to potentially confusing behavior and errors in which a memory region returned from a query no longer exists by the time one attempts to use it.

In the end, our solution was to allow QueryIter to expose and return only references to the memory areas already created by the region crate itself. This strives to mitigate safety issues that could arise by exposing all memory regions maintained by Theseus to higher-level Rust code that may use them unsafely through the region APIs. Hopefully this feature restriction doesn't pose a problem in the future.
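The restricted query can be pictured as a filter over a crate-local registry. The sketch below is only an illustration under assumed names (`Region`, `RegionRegistry`, `record`, and this `query_range` signature are all hypothetical): the registry holds just the regions this crate allocated, and a query returns references to those that overlap the requested address range.

```rust
use std::ops::Range;

/// A region created through this crate's own allocation path (hypothetical type).
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Region {
    pub range: Range<usize>,
}

/// Crate-local registry: the only regions a query can ever observe.
pub struct RegionRegistry {
    regions: Vec<Region>,
}

impl RegionRegistry {
    pub fn new() -> RegionRegistry {
        RegionRegistry { regions: Vec::new() }
    }

    /// Record a region at allocation time.
    pub fn record(&mut self, start: usize, len: usize) {
        self.regions.push(Region { range: start..start + len });
    }

    /// Iterate over recorded regions that overlap `start..start + len`.
    /// Two half-open ranges overlap when each one starts before the other ends.
    pub fn query_range<'a>(
        &'a self,
        start: usize,
        len: usize,
    ) -> impl Iterator<Item = &'a Region> + 'a {
        let end = start + len;
        self.regions
            .iter()
            .filter(move |r| r.range.start < end && start < r.range.end)
    }
}
```

Because the iterator borrows the registry, the regions it yields cannot be removed while the query results are still in use, which narrows the check-versus-use gap described above.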

Supporting Thread-Local Storage on Theseus

Thread-Local Storage (TLS) allows one to declare a variable that will be instantiated on a per-thread basis, with each thread having its own local copy that other threads cannot access. This is useful for many reasons, e.g., programming convenience, performant access to thread-specific data without locking, etc. Our motivation for finally supporting it in its ultimate flexible form -- the ELF standard TLS areas -- stemmed from wasmtime-runtime, which uses it in myriad ways.

Note: previously, Theseus offered a cheap imitation of TLS using the GS register to store limited, targeted data about each task, but it wasn't usable by any applications, libraries, or even other non-task kernel crates.

We implemented TLS support across several commits. This was a surprisingly complex and tricky implementation that required a lot of trial-and-error experimentation to determine how to correctly lay out each TLS object in the per-task TLS area.

Another complicating factor is that Theseus loads and links all crates at runtime, which means that our implementation must support both statically-linked TLS areas from the base kernel image as well as newcomers found in dynamically-loaded crates. There are a lot of tradeoffs herein as it relates to reserving and allocating offset ranges in the TLS space for TLS data sections, tracking TLS data sections per namespace, per crate, etc. -- but these are best saved for a separate post about TLS.

We went a step further by implementing Rust's thread_local!() macro for any Theseus crate, which offers lazy dynamic initialization and cleanup of TLS areas. This overcomes the limitations of standard ELF TLS sections, which behave like static globals in Rust: they are const-initialized and never dropped.
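To make the lazy-initialization and cleanup behavior concrete, here is a small example that uses the standard library's thread_local!() macro on a conventional host; it shows the semantics the macro provides in general and is not Theseus's implementation. The `Scratch` type, the counters, and the helper functions are invented for the illustration.

```rust
use std::cell::Cell;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Global counters so the example can observe initialization and cleanup.
pub static INITS: AtomicUsize = AtomicUsize::new(0);
pub static DROPS: AtomicUsize = AtomicUsize::new(0);

/// Per-thread value with a non-const constructor and a destructor.
pub struct Scratch {
    pub hits: Cell<u32>,
}

impl Scratch {
    fn new() -> Scratch {
        INITS.fetch_add(1, Ordering::SeqCst);
        Scratch { hits: Cell::new(0) }
    }
}

impl Drop for Scratch {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

thread_local! {
    // Initialized on the first access in each thread, not at thread creation.
    static SCRATCH: Scratch = Scratch::new();
}

/// Increment the calling thread's private counter and return its new value.
pub fn bump() -> u32 {
    SCRATCH.with(|s| {
        s.hits.set(s.hits.get() + 1);
        s.hits.get()
    })
}

/// Spawn a thread that bumps its own counter `n` times and reports the result.
pub fn run_worker(n: u32) -> u32 {
    thread::spawn(move || {
        let mut last = 0;
        for _ in 0..n {
            last = bump();
        }
        last
    })
    .join()
    .unwrap()
}
```

Each worker thread sees a counter that starts from zero, a thread that never touches `SCRATCH` never runs its initializer, and a thread that did initialize it runs the destructor when it exits.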

Porting the object crate to Theseus

The object crate is standalone and doesn't need to be ported to Theseus specifically, thus we can simply port it to no_std and place it in Theseus's libs/ directory. The only real difficulty here is that while object does support no_std, no previous users of object needed to write to an object file in a no_std environment. Thinking about it, we do agree that's kind of weird, but Theseus is just like that sometimes. 😊

Once we convinced the maintainers of object that this feature was necessary, the changes required to do so weren't very involved. It boiled down to a rearrangement of object's Cargo features and configuration blocks: check out the PR we submitted (that was accepted) for more details.

Miscellaneous Improvements

  • We improved the page allocator to allow it to lazily merge contiguous freed chunks of pages.
    • This happens lazily after an allocation request first fails; it is possible to also do it proactively in AllocatedPages::drop(), but that makes deallocation more expensive.
    • Needed for loading C object files or static libraries with entry points at a fixed address, e.g., the default entry point of 0x400000.
    • Future work: support building and loading position-independent executables and code (PIE and PIC). This is required to simultaneously load multiple C executables at the same fixed address, because Theseus only offers a single virtual address space.
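The lazy merging strategy described above can be sketched as follows. This is a toy first-fit allocator over page-number ranges, written for illustration only: `PageAllocator` and its methods are hypothetical names, and Theseus's real page allocator uses different data structures.

```rust
use std::ops::Range;

/// Toy page allocator: free chunks are kept as unmerged page-number ranges.
pub struct PageAllocator {
    free_chunks: Vec<Range<usize>>,
}

impl PageAllocator {
    pub fn new(free_chunks: Vec<Range<usize>>) -> PageAllocator {
        PageAllocator { free_chunks }
    }

    /// Deallocation is cheap: the chunk is pushed back without any merging.
    pub fn deallocate(&mut self, pages: Range<usize>) {
        self.free_chunks.push(pages);
    }

    /// First-fit allocation from a single free chunk.
    fn try_alloc(&mut self, count: usize) -> Option<Range<usize>> {
        let idx = self
            .free_chunks
            .iter()
            .position(|c| c.end - c.start >= count)?;
        let chunk = self.free_chunks.swap_remove(idx);
        let taken = chunk.start..chunk.start + count;
        if taken.end < chunk.end {
            // Return the unused tail of the chunk to the free list.
            self.free_chunks.push(taken.end..chunk.end);
        }
        Some(taken)
    }

    /// Coalesce free chunks whose page ranges are directly adjacent.
    fn merge_adjacent(&mut self) {
        let mut chunks = std::mem::take(&mut self.free_chunks);
        chunks.sort_by_key(|c| c.start);
        let mut merged: Vec<Range<usize>> = Vec::new();
        for chunk in chunks {
            if let Some(prev) = merged.last_mut() {
                if prev.end == chunk.start {
                    prev.end = chunk.end;
                    continue;
                }
            }
            merged.push(chunk);
        }
        self.free_chunks = merged;
    }

    /// Merge only after a first attempt fails, then retry once.
    pub fn allocate(&mut self, count: usize) -> Option<Range<usize>> {
        if count == 0 {
            return None;
        }
        if let Some(pages) = self.try_alloc(count) {
            return Some(pages);
        }
        self.merge_adjacent();
        self.try_alloc(count)
    }

    pub fn free_chunk_count(&self) -> usize {
        self.free_chunks.len()
    }
}
```

The tradeoff matches the one noted above: deallocation stays constant-time, and the sorting and merging cost is paid only by the rare allocation that would otherwise fail.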
[1] See our OSDI 2020 paper for an in-depth discussion of this.


