More progress porting wasmtime to Theseus

Apr. 12, 2022  ·  by Kevin Boos

The porting quest continues

This post covers the highlights of our ongoing work to port wasmtime to Theseus; see the previous post(s) for more information. Interested folks can follow along with the ported wasmtime code here.

📢 Good news: wasmtime-runtime now builds on Theseus! 📢

Completed initial port of wasmtime-runtime!

Last time we left off having finished several key features to support the needs of the wasmtime-runtime crate:

  • Thread-Local Storage (TLS)
  • Creation and management of Unix-like memory-mapped areas via the region and libc crates
    • Plus additions to tlibc, the Theseus-specific implementation of basic libc functions
  • Reading/writing of object files via the object crate
  • Improvements to Theseus's page_allocator and theseus_cargo build tool to handle more complexity

The full list of required, target-agnostic dependencies is as follows:

[package]
name = "wasmtime-runtime"
...

[dependencies]
wasmtime-environ = { path = "../environ", version = "0.30.0" }  ## Ported to `no_std` previously
libc = { version = "0.2.82", default-features = false }         ## Ported to Theseus in prior post
region = "2.1.0"         ## Ported to Theseus in prior post
log = "0.4.8"            ## Supports `no_std`
memoffset = "0.6.0"      ## Supports `no_std`
indexmap = "1.0.2"       ## Supports `no_std`
thiserror = "1.0.4"      ## Use `thiserror_core2` instead
more-asserts = "0.2.1"   ## Ported to `no_std` in prior post
cfg-if = "1.0"           ## Supports `no_std`
backtrace = "0.3.61"     ## Ported to Theseus in this post
lazy_static = "1.3.0"    ## Supports `no_std`
rand = "0.8.3"           ## Offers `no_std`-compatible `SmallRng`
anyhow = "1.0.38"        ## Supports `no_std`, with code changes

Over the past several weeks, we have completed our modifications to wasmtime-runtime such that it now builds properly on Theseus! The key missing parts were:

| Crate / Feature | Summary | Reason Needed for wasmtime-runtime |
|---|---|---|
| backtrace | Cross-platform crate for capturing stack traces | For capturing and analyzing stack traces to see if any WASM module functions exist on the call stack |
| std::path | Module for manipulating and parsing file paths | To refer to WASM module files, and for (de)serialization |
| resume_unwind() | Continues a panic action (e.g., unwinding), but skips the registered panic hook | To continue propagating a panic across a native code-WASM boundary |
| thiserror/anyhow | Helper crates for convenient error handling | To derive the Error trait and easily return error types |
| Signal handling | Registering signal handlers, e.g., for SIGSEGV, SIGILL | For catching OS-level exceptions that occur while executing native code compiled from WASM modules |

Back(trace) to the Future

The wasmtime-runtime crate uses backtrace to capture a stack trace when a trap occurs, such as a fault during WASM execution or another systems-level problem like Out Of Memory (OOM). This trace is used to both:

  1. Traverse the call stack to see if any stack frames from WASM code exist, and
  2. Provide the user or caller of wasmtime with more context about a runtime failure.

Porting backtrace was relatively simple; the primary changes required were to publicly expose more details about Theseus's custom unwinder. Feel free to check out the full changeset here, summarized below:

  • Theseus's unwinder calculates the register values for each stack frame. We simply use those values to provide backtrace with:
    • The stack frame's instruction pointer (i.e., call site address)
    • The current stack pointer at that execution point
    • The starting address of its containing function
  • Theseus supports symbolication: resolving an address into a symbol
    • This uses Theseus's crate management metadata, a CrateNamespace, which contains a map of all public symbols in that namespace
    • The key function is get_section_containing_address()
      • This is similar to the addr2line tool, but it works with Theseus's dynamically loaded and linked code structure

The one remaining feature that our port of backtrace lacks is connecting a resolved symbol to its location: file path, line number, and column number. This is conceptually easy to do but requires debug information to be parsed from an object file. Although Theseus does support parsing DWARF debug info, it isn't always available because debug info is typically stripped from object files to keep their size down. Fortunately, the backtrace crate treats this information as optional, and thus wasmtime doesn't require it to be available, so we can simply return None when asked for symbol location details.
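To make the symbolication step concrete, here is a minimal sketch of resolving an address against a map of symbols while leaving the source location empty. The `SymbolMap`, `Section`, and `Location` types and the `section_containing()` and `resolve()` methods are hypothetical stand-ins invented for illustration; they are not Theseus's actual CrateNamespace API or the backtrace crate's types.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for one symbol (section) known to a namespace.
#[derive(Debug, Clone, PartialEq)]
pub struct Section {
    pub name: String,
    pub start: usize,
    pub size: usize,
}

/// Hypothetical source location; unavailable without debug info.
#[derive(Debug, Clone, PartialEq)]
pub struct Location {
    pub file: String,
    pub line: u32,
    pub column: u32,
}

/// Hypothetical, simplified symbol map keyed by each section's start address.
pub struct SymbolMap {
    sections: BTreeMap<usize, Section>,
}

impl SymbolMap {
    pub fn new(sections: impl IntoIterator<Item = Section>) -> Self {
        Self {
            sections: sections.into_iter().map(|s| (s.start, s)).collect(),
        }
    }

    /// Finds the section whose address range contains `addr`, if any.
    pub fn section_containing(&self, addr: usize) -> Option<&Section> {
        self.sections
            .range(..=addr) // all sections starting at or before `addr`
            .next_back() // the closest one
            .map(|(_, sec)| sec)
            .filter(|sec| addr - sec.start < sec.size)
    }

    /// Resolves an instruction pointer into (symbol name, function start, location).
    /// The location is always `None`, mirroring a port without debug info.
    pub fn resolve(&self, ip: usize) -> Option<(&str, usize, Option<Location>)> {
        self.section_containing(ip)
            .map(|sec| (sec.name.as_str(), sec.start, None))
    }
}
```

In this sketch a caller still receives the symbol name and the function's starting address for each frame, which is the information the trap-handling logic relies on; only the optional file, line, and column details are missing.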

The Path Forwards

Many wasmtime crates use std::path::{Path, PathBuf} to refer to WASM module files that are JIT-compiled and loaded into a wasmtime engine. Thus, we must implement path types that are API-compatible with Rust's std::path types in order to minimize the number of changes to wasmtime itself. You can find the code for that here, which is primarily a quick & dirty copy of the code from std::path.

Theseus already offers its own Path type, which is similar but not identical to the types in Rust's std::path. All we need is a simple glue code layer between std::path::PathBuf and Theseus's path::Path.

One notable difference between std::path and theseus_path is that Theseus uses Rust's String and str types natively, so there is no need for distinct OsString and OsStr types; they become simple type aliases in Theseus:

pub type OsString = String;
pub type OsStr = str;
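As a rough illustration of what such a glue layer could look like, the sketch below builds a `PathBuf` look-alike on top of the aliased string types and converts it to and from a native path type. `NativePath` and this simplified `PathBuf` are hypothetical stand-ins written for this example; they are not the actual Theseus or theseus_path code, and they ignore details such as `..` components.

```rust
/// With native Rust strings, the OS string types collapse into aliases.
pub type OsString = String;
pub type OsStr = str;

/// Hypothetical stand-in for a Theseus-native path type wrapping a `String`.
#[derive(Debug, Clone, PartialEq)]
pub struct NativePath(String);

impl NativePath {
    pub fn new(s: String) -> Self {
        Self(s)
    }
    pub fn as_str(&self) -> &str {
        &self.0
    }
}

/// Hypothetical `std::path::PathBuf` look-alike backed by an `OsString`.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct PathBuf {
    inner: OsString,
}

impl PathBuf {
    pub fn new() -> Self {
        Self::default()
    }

    /// Appends a component; an absolute component replaces the whole path,
    /// matching the behavior of `std::path::PathBuf::push`.
    pub fn push(&mut self, component: &OsStr) {
        if component.starts_with('/') {
            self.inner = component.to_string();
        } else {
            if !self.inner.is_empty() && !self.inner.ends_with('/') {
                self.inner.push('/');
            }
            self.inner.push_str(component);
        }
    }

    pub fn as_os_str(&self) -> &OsStr {
        &self.inner
    }

    /// Returns the final component, if there is one (simplified).
    pub fn file_name(&self) -> Option<&OsStr> {
        self.inner
            .trim_end_matches('/')
            .rsplit('/')
            .next()
            .filter(|s| !s.is_empty())
    }
}

// The glue: conversions in both directions just move the inner string.
impl From<PathBuf> for NativePath {
    fn from(p: PathBuf) -> Self {
        NativePath::new(p.inner)
    }
}

impl From<NativePath> for PathBuf {
    fn from(p: NativePath) -> Self {
        PathBuf { inner: p.0 }
    }
}
```

Because both sides are backed by a plain `String`, the conversions are moves rather than re-encodings, which is the main simplification the aliases buy.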

Relax and unwind

The resume_unwind() function is used in wasmtime to carry on with the unwinding procedure after it has been caught. This is currently only used in the runtime's trap handling logic, which essentially continues unwinding after a trap that stemmed from a Rust-level panic, i.e., one deemed irrelevant to handling in-WASM traps.

Here is Theseus's implementation of resume_unwind() with code that tests it. Although this is conceptually tricky, the implementation is quite straightforward — simply start unwinding from the current point without starting from the regular panic handler. You can test this in Theseus by invoking the test program unwind_test -c.

To actually use our new resume_unwind() function in wasmtime-runtime, we add the following very simple code block.

#[cfg(feature = "std")]
std::panic::resume_unwind(panic);
#[cfg(target_os = "theseus")]
theseus_catch_unwind::resume_unwind(
    KillReason::Panic(PanicInfoOwned::from_payload(panic))
);

The notable difference between Theseus's resume_unwind and Rust's std::panic::resume_unwind is that we allow unwinding to occur both after a language-level panic and, beneath the language level, after a CPU-level "machine" exception. Thus, Theseus's panic payloads expect a KillReason rather than a type-erased Box<dyn Any + Send>, so we must handle that minor difference.
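For readers unfamiliar with the pattern, the following sketch shows the general catch-then-resume idea using only Rust's standard library: a caught panic is inspected, and any payload that is not a trap is handed back to std::panic::resume_unwind so that unwinding continues. The `WasmTrap` type and the `catch_traps()` function are hypothetical names made up for this example; this is not wasmtime's trap-handling code nor Theseus's implementation.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical marker payload standing in for a trap raised by WASM code.
pub struct WasmTrap(pub u32);

/// Runs `f`; converts a `WasmTrap` panic into an `Err` holding the trap code,
/// and re-raises any other panic so it keeps propagating to the caller.
pub fn catch_traps<R>(f: impl FnOnce() -> R) -> Result<R, u32> {
    match panic::catch_unwind(AssertUnwindSafe(f)) {
        Ok(value) => Ok(value),
        Err(payload) => match payload.downcast::<WasmTrap>() {
            // The panic carried a trap: report it as a normal error value.
            Ok(trap) => Err(trap.0),
            // An ordinary Rust panic: continue unwinding with the original
            // payload, without invoking the panic hook a second time.
            Err(other) => panic::resume_unwind(other),
        },
    }
}
```

Calling `catch_traps` with a closure that panics with an ordinary message leaves the panic visible to an outer catch_unwind, while a `WasmTrap` payload is turned into an `Err`.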

Oh, an Error Occurred? Anyway Anyhow...

In yet another tribute to D. Tolnay, wasmtime uses his superb anyhow and thiserror crates for convenient error handling across nearly every source file and function. While anyhow technically supports no_std environments like Theseus, it cannot offer the same API there that it does with std.

Thus, we must change every. single. usage. of anyhow in the whole wasmtime code base.

The majority of the changes simply require us to add this snippet to any Result type before returning or unwrapping it:

.map_err(anyhow::Error::msg)?  // convert the `Err` into `anyhow::Error`

because in no_std environments, anyhow::Error cannot offer a blanket From impl for types that implement std::error::Error, so the ? operator cannot perform the implicit conversion; we must do it explicitly. This rough edge is currently unavoidable, as evidenced by anyhow's documentation.

As you can see, most of these changes are functionally unnecessary. The real reason for such tedium is that the Error trait is defined in Rust's std library and is thus unavailable for use in no_std environments that only use core or alloc.
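The effect can be reproduced without anyhow at all. In the sketch below, `Error` is a hypothetical stand-in that imitates how anyhow::Error behaves without std: it can be built from any displayable message, but it has no blanket From impl, so the ? operator alone cannot convert a foreign error type. The `parse_pages()` function is likewise invented for illustration.

```rust
use core::fmt::{self, Display};

/// Hypothetical stand-in for `anyhow::Error` in a `no_std` build:
/// constructible from any `Display` message, but with no blanket
/// `From<E>` impl for other error types.
#[derive(Debug)]
pub struct Error {
    msg: String,
}

impl Error {
    /// Mirrors the general shape of `anyhow::Error::msg`.
    pub fn msg<M: Display>(message: M) -> Self {
        Self {
            msg: message.to_string(),
        }
    }
}

impl Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.msg)
    }
}

/// Parses a page count from text, converting the parse error explicitly.
pub fn parse_pages(text: &str) -> core::result::Result<u32, Error> {
    // `text.trim().parse::<u32>()?` on its own would not compile here,
    // because `Error` does not implement `From<ParseIntError>`.
    let pages = text.trim().parse::<u32>().map_err(Error::msg)?;
    Ok(pages)
}
```

The explicit `map_err` keeps the original error's message but discards its concrete type, which is part of why these edits feel like busywork rather than functional changes.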

Thankfully, supporting thiserror is a bit easier, thanks to the thiserror_core2 crate that ports it to emit a derivation of the core2::error::Error trait. We can simply use this as a drop-in replacement for thiserror because we already use core2::error::Error as a substitute for std::error::Error.

💤💭💤 We dream of the day when @Jane Lusby's excellent work on moving the Error trait into core is completed! Once that lands, we won't need to bother with all this pablum.

The Last Jedi Dependency: Signal Handling

The last remaining feature needed to finish porting wasmtime-runtime is signal handling. This is needed for wasmtime to be able to catch traps that occur when executing WASM code that was JIT-compiled into native code, among other purposes.

Theseus doesn't offer POSIX-like signals because they're unsafe and unnecessary in a safe-language OS, but it does implement handlers for CPU exceptions, e.g., page faults, general protection faults, etc. However, we previously did not allow third-party crates to register handlers (callbacks).

Registering "signal" (CPU exception) handlers is now supported in Theseus as of commit . The initial implementation is limited to the following four categories of "signals", which were loosely based on the set of signals that wasmtime-runtime cares about.

/// The possible "signals" that may occur due to CPU exceptions.
pub enum Exception {
    /// (SIGSEGV) Bad virtual address, unexpected page fault.
    InvalidAddress     = 0,
    /// (SIGILL) Invalid opcode, malformed instruction, etc.
    IllegalInstruction = 1,
    /// (SIGBUS) Bad memory alignment, non-existent physical address.
    BusError           = 2,
    /// (SIGFPE) Bad arithmetic operation, e.g., divide by zero.
    ArithmeticError    = 3,
}

Each task can register one handler function for each category of exceptions:

pub trait ExceptionHandler = FnOnce(&ExceptionContext) -> Result<(), ()>;

pub fn register_handler(
    exception: Exception,
    handler: Box<dyn ExceptionHandler>,
) -> Result<(), ()> {
    HANDLERS.with(|handlers| {
        let handler_slot = &handlers[exception as usize];
        if handler_slot.borrow().is_some() {
            return Err(());
        }
        *handler_slot.borrow_mut() = Some(handler);
        Ok(())
    })
}

Thread-Local Storage (TLS) is used to efficiently access the handlers registered for each task:

thread_local!{
    /// The "signal" handlers registered for each task.
    static HANDLERS: [RefCell<Option<Box<dyn ExceptionHandler>>>; 4] = Default::default();
}

When an exception occurs, the CPU jumps synchronously to the kernel function specified in the interrupt descriptor table (IDT). We simply add a condition to each IDT exception function to check for and obtain the relevant registered exception handler (if one exists). The registered handler is then invoked with the following contextual information about said exception; the key information is the instruction_pointer.

/// Information that is passed to a registered [`ExceptionHandler`]
/// about an exception that occurred during execution.
pub struct ExceptionContext {
    pub instruction_pointer: VirtualAddress,
    pub stack_pointer: VirtualAddress,
    pub exception: Exception,
    pub error_code: Option<ErrorCode>,
}

This feature is used in the trap handling component of our port of wasmtime-runtime, which needs the above context to determine whether the exception occurred in a section of code that came from a compiled WASM module. Of course, the handlers must first be registered as part of the platform-specific init procedure shown here.
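Putting the pieces together, the sketch below shows one way registration and dispatch could fit, written in stable Rust so it can run outside Theseus. It makes several assumptions that are not in the original code: addresses are plain `usize` values instead of VirtualAddress, the context is trimmed to two fields, the handler type is a `Handler` type alias rather than a trait alias, and the `dispatch()` and `wasm_trap_handler()` functions are hypothetical helpers invented for this example.

```rust
use std::cell::RefCell;
use std::ops::Range;

/// The four "signal" categories, as in the enum above.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Exception {
    InvalidAddress = 0,
    IllegalInstruction = 1,
    BusError = 2,
    ArithmeticError = 3,
}

/// Simplified context: plain `usize` addresses (an assumption of this sketch).
pub struct ExceptionContext {
    pub instruction_pointer: usize,
    pub exception: Exception,
}

/// Stable-Rust spelling of the handler type.
pub type Handler = Box<dyn FnOnce(&ExceptionContext) -> Result<(), ()>>;

thread_local! {
    /// One handler slot per exception category, per thread (task).
    static HANDLERS: [RefCell<Option<Handler>>; 4] = Default::default();
}

/// Registers `handler` unless one already exists for that category.
pub fn register_handler(exception: Exception, handler: Handler) -> Result<(), ()> {
    HANDLERS.with(|handlers| {
        let mut slot = handlers[exception as usize].borrow_mut();
        if slot.is_some() {
            return Err(());
        }
        *slot = Some(handler);
        Ok(())
    })
}

/// Hypothetical dispatch step: take the registered handler for this
/// exception category (if any) and invoke it with the context.
pub fn dispatch(ctx: &ExceptionContext) -> Option<Result<(), ()>> {
    let handler =
        HANDLERS.with(|handlers| handlers[ctx.exception as usize].borrow_mut().take());
    handler.map(|h| h(ctx))
}

/// Hypothetical handler: succeeds only if the faulting instruction lies
/// within the address range occupied by compiled WASM code.
pub fn wasm_trap_handler(wasm_code: Range<usize>) -> Handler {
    Box::new(move |ctx: &ExceptionContext| {
        if wasm_code.contains(&ctx.instruction_pointer) {
            Ok(())
        } else {
            Err(())
        }
    })
}
```

Note that because the handler is an FnOnce, this sketch consumes it on first use, so a caller would need to register it again before the next exception; whether the real implementation behaves the same way is not something this example claims.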

Onwards and Upwards

With that, our port of wasmtime-runtime is complete! We're nearly done with the "minimum viable port" of wasmtime functionality, as the only remaining crates are wasmtime-jit and the top-level wasmtime crate itself. Once those are complete, we will publish a longer-form post about the journey to get wasmtime running on Theseus.

Be on the lookout for such good news soon!

Miscellaneous Contributions


