Porting Rust to WebAssembly
I recently spent some effort trying to make reproto
run in a browser.
Here I want to outline the problems I encountered and how I worked around them.
I will also provide a number of suggestions for how things might be improved for future porters.
A big chunk of the wasm
story on Rust currently relies on stdweb
.
Needless to say, this project is incredible.
stdweb
makes it smooth to build Rust applications that integrates with a browser.
There’s a ton of things that can be said about that project, but first I want to focus on the porting efforts of reproto
.
Getting up to speed and shaking the tree
The best way to port a library is to compile it.
For Rust it’s as simple as installing cargo-web
and setting up a project with all your dependencies.
All it really needs is a Cargo.toml
declaring your dependencies, and a src/main.rs
with extern
declarations for everything you want compiled.
For reproto
, that process looks like this:
# Cargo.toml
[package]
# skipped
[dependencies]
reproto-core = {path = "../lib/core", version = "0.3"}
reproto-compile = {path = "../lib/compile", version = "0.3"}
reproto-derive = {path = "../lib/derive", version = "0.3"}
reproto-manifest = {path = "../lib/manifest", version = "0.3"}
reproto-parser = {path = "../lib/parser", version = "0.3"}
reproto-backend-java = {path = "../lib/backend-java", version = "0.3"}
reproto-backend-js = {path = "../lib/backend-js", version = "0.3"}
reproto-backend-json = {path = "../lib/backend-json", version = "0.3"}
reproto-backend-python = {path = "../lib/backend-python", version = "0.3"}
reproto-backend-rust = {path = "../lib/backend-rust", version = "0.3"}
reproto-backend-reproto = {path = "../lib/backend-reproto", version = "0.3"}
stdweb = "0.3"
serde = "1"
serde_json = "1"
serde_derive = "1"
//! src/main.rs
extern crate serde;
#[macro_use]
extern crate serde_derive;
extern crate serde_json;
#[macro_use]
extern crate stdweb;
extern crate reproto_backend_java as java;
extern crate reproto_backend_js as js;
extern crate reproto_backend_json as json;
extern crate reproto_backend_python as python;
extern crate reproto_backend_reproto as reproto;
extern crate reproto_backend_rust as rust;
extern crate reproto_compile as compile;
extern crate reproto_core as core;
extern crate reproto_derive as derive;
extern crate reproto_manifest as manifest;
extern crate reproto_parser as parser;
fn main() {
stdweb::initialize();
}
Finally we want to add a Web.toml
, which will allow us to specify the default target so we won’t
have to type it out all the time:
# Web.toml
default-target = "wasm32-unknown-unknown"
Now the project should build by running cargo web build
.
For tracing down where dependencies come from, I relied heavily on cargo-tree
.
When you do encounter a problem cargo tree
can quickly determine how a given package was pulled
in:
cargo tree
[dependencies]
├── reproto-backend-java v0.3.13 (file://<home>/repo/reproto/lib/backend-java)
│ [dependencies]
│ ├── genco v0.2.6 (*)
│ ├── log v0.3.9
│ │ [dependencies]
│ │ └── log v0.4.1
│ │ [dependencies]
│ │ └── cfg-if v0.1.2
... snip
Big numbers
My project is structured into many different modules, each loosely responsible for one aspect of the solution.
The first module where I encountered problems was core
.
The num
crate by default pulls in rustc-serialize
, which fails like this:
|
853 | fn encode<S: Encoder>(&self, s: &mut S) -> Result<(), S::Error>;
| ---------------------------------------------------------------- `encode` from trait
...
1358 | impl Encodable for path::Path {
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ missing `encode` in implementation
error[E0046]: not all trait items implemented, missing: `decode`
--> <registry>/src/github.com-1ecc6299db9ec823/rustc-serialize-0.3.24/src/serialize.rs:1382:1
|
904 | fn decode<D: Decoder>(d: &mut D) -> Result<Self, D::Error>;
| ----------------------------------------------------------- `decode` from trait
...
1382 | impl Decodable for path::PathBuf {
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ missing `decode` in implementation
There appears to be a trait implementation missing.
Opening up the specified line (1382
) reveals that rust-serialize has platform-specific
serialization for PathBuf
:
impl Decodable for path::PathBuf {
#[cfg(target_os = "redox")]
fn decode<D: Decoder>(d: &mut D) -> Result<path::PathBuf, D::Error> {
// ...
}
#[cfg(unix)]
fn decode<D: Decoder>(d: &mut D) -> Result<path::PathBuf, D::Error> {
// ...
}
#[cfg(windows)]
fn decode<D: Decoder>(d: &mut D) -> Result<path::PathBuf, D::Error> {
// ...
}
}
Interestingly enough, this is something I’ve touched on in a previous post as a portability concern.
In rustc-serialize
, paths are not portable because not all platforms have serializers defined for them!
impl
s being missing for a specific platform wouldn’t be a big deal if platform-specific bits were better tucked away.
As it stands here, we end up with an incomplete impl Decodable
which is a compiler error.
If the entire impl
block was conditional it would simply be overlooked as another missing
implementation. In practice this would mean that you wouldn’t be able to serialize PathBuf
easily, but this can be worked around be simply not using it.
Due to the current state of affairs, the easiest way to deal with it was simply to disable the default features for the num-*
crates.
A bit tedious to add everywhere, but not a big deal:
[package]
# ...
[dependencies]
num-bigint = {version = "0.1", default_features = false}
num-traits = {version = "0.1", default_features = false}
num-integer = {version = "0.1", default_features = false}
There is no filesystem
I lie, there kind of is. Or at least there can be.
But for our current target wasm32-unknown-unknown
there isn’t.
This means that all of your sweet code using std::fs
simply won’t work.
//! src/main.rs
#[macro_use]
extern crate stdweb;
use std::fs;
fn test() -> String {
fs::File::create("hello.txt").expect("bad file");
"Hello".to_string()
}
fn main() {
stdweb::initialize();
js! {
Module.exports.test = @{test};
}
}
Taking this for a spin, would result in an unfortunate runtime error:
To work around this I introduced a layer of indirection.
My very own fs.rs
.
I then ported all code to use this so that I can swap out the implementation at runtime.
This wasn’t particularly hard, seeing as I already pass around a Context
to collect errors.
Now it just needed to learn a new trick.
Finally I ported all code that used Path
to use relative-path
instead.
This guarantees that I won’t be tempted to hit any of those platform-specific APIs like canonicalize
, which requires access to a filesystem.
With this in place I can now capture the files written to my filesystem abstraction directly into memory!
Ruining Christmas with native libraries
Anything involving native libraries will ruin your day in one way or another.
My repository
component uses ring
for calculating Sha256 checksums.
The first realization is that repositories won’t work the same - if at all - on the web.
We don’t have a filesystem!
At some point it might be possible to plug in a solution that communicates with a service to fetch dependencies.
But that is currently not the goal.
This realization made the solution obvious: web users don’t need a repository.
I moved the necessary trait (Resolver
) from repository
to core
, and provided a no-op implementation for it.
The result is that I no longer depend on the repository
crate to have a working system, sidestepping the native dependency entirely in the browser.
Neat!
Revenge of the Path
I thought I had seen the last of Path
. But url
decided to blow up in my face like this:
error[E0425]: cannot find function `path_to_file_url_segments` in this scope
--> <registry>/src/github.com-1ecc6299db9ec823/url-1.6.0/src/lib.rs:1934:32
|
1934 | let (host_end, host) = path_to_file_url_segments(path.as_ref(), &mut serialization)?;
| ^^^^^^^^^^^^^^^^^^^^^^^^^ did you mean `path_to_file_url_segments_windows`?
error[E0425]: cannot find function `file_url_segments_to_pathbuf` in this scope
--> <registry>/src/github.com-1ecc6299db9ec823/url-1.6.0/src/lib.rs:2057:20
|
2057 | return file_url_segments_to_pathbuf(host, segments);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ did you mean `file_url_segments_to_pathbuf_windows`?
error: aborting due to 2 previous errors
error: Could not compile `url`.
Yet again. Paths aren’t portable.
The url
crate wants to translate file segments into real paths for you.
It does this by hiding implementations of file_url_segments_to_pathbuf
behind platform-specific gates.
Obviously there there are no wasm
implementations for this.
An alternative here is to use something like hyper::Uri
, but that would currently mean pulling in all of hyper and its problematic dependencies.
I settled for just adding more indirection and isolating the components that needed HTTP and URLs into their own modules.
"All problems in computer science can be solved by another level of indirection" — David Wheeler
What’s the time again?
chrono
is another amazing library, I used it in my derive
component to detect when a string
looks like a datetime
.
Unfortunate for me, chrono
depends on time
.
Another platform-specific dependency!
error[E0432]: unresolved import `self::inner`
--> <registry>/src/github.com-1ecc6299db9ec823/time-0.1.39/src/sys.rs:3:15
|
3 | pub use self::inner::*;
| ^^^^^ Could not find `inner` in `self`
error[E0433]: failed to resolve. Could not find `SteadyTime` in `sys`
--> <registry>/src/github.com-1ecc6299db9ec823/time-0.1.39/src/lib.rs:247:25
|
/// snip...
error: aborting due to 10 previous errors
error: Could not compile `time`.
Because the derive feature was such a central component in what I wanted to port I started looking for alternatives instead of isolating it.
My first attempt was the iso8601
crate, a project using nom
to parse ISO 8601 timestamps.
Perfect!
error[E0432]: unresolved import `libc::c_void`
--> <registry>/src/github.com-1ecc6299db9ec823/memchr-1.0.2/src/lib.rs:19:5
|
19 | use libc::c_void;
| ^^^^^^^^^^^^ no `c_void` in the root
error[E0432]: unresolved import `libc::c_int`
--> <registry>/src/github.com-1ecc6299db9ec823/memchr-1.0.2/src/lib.rs:21:12
|
/// ...
error: build failed
On no…
Ok, it’s time to pull out cargo-tree
.
$ cargo tree
# ...
├── iso8601 v0.2.0
│ [dependencies]
│ └── nom v3.2.1
│ [dependencies]
│ └── memchr v1.0.2
│ [dependencies]
│ └── libc v0.2.36
# ...
So nom
depends on memchr
, an interface to the memchr
libc function.
That makes sense. nom
wants to scan for characters as quickly as possible.
Unfortunately that makes nom
and everything depending on it unusable in wasm
right now.
The easiest route ended up being to write my own function here.
Make things better
In the following sections I try to summarize how we can improve the experience for future porters.
Make platform detection a first class feature of Rust
If you look over the error messages encountered above, you can see that they have one thing in common: The are all unique.
This is unfortunate, since they all relate to the same problem: there is a component X
that is hidden behind a platform gate.
When that component is no longer provided, the project fails to compile.
Wouldn’t it be better if the compiler error looked like this:
error[EXXXX]: section unavailable for platform (target_arch = "wasm32", target_os = "unknown"):
--> <registry>/src/github.com-1ecc6299db9ec823/rustc-serialize-X.X.X/src/serialize.rs:148:1
|
148 | #[platform(target_arch, target_os)] {
| ^^^^^^^^^^^^^^^^^^^^^^ - platform defined here
| impl Decodable for path::PathBuf {
| #[cfg(target_os = "redox")]
| fn decode<D: Decoder>(d: &mut D) -> Result<path::PathBuf, D::Error> { .. }
|
| #[cfg(target_os = "unix")]
| fn decode<D: Decoder>(d: &mut D) -> Result<path::PathBuf, D::Error> { .. }
|
| #[cfg(target_os = "windows")]
| fn decode<D: Decoder>(d: &mut D) -> Result<path::PathBuf, D::Error> { .. }
| }
| }
}
It works by detecting when your platform has a configuration which does not match any existing gates, providing you with contextual information of why it failed. This means that the matching would either have to be exhaustive (e.g. provide a default fallback), or fail where the matching actually occurs.
This is much better than the arbitrary number of compiler errors caused by missing elements.
Transitive feature flags
This is the first exercise I’ve had in explicitly disabling default features. Sometimes it can be hard. Dependency topologies are complex, and mostly out of your hand.
This suggestion is heavily influenced by use
flags in Gentoo, and would be a controversial change.
The gist here is that I can enable a feature for a crate and all it’s dependencies. Not just directly on that crate. This way it doesn’t matter that a middle layer forgot to forward a feature flag, you can still disable it without too much hassle.
Different crates might use the same feature flag for different things. But it begs the question: are we more interested in patching all crates to forward feature flags correctly, than we are patching crates which use the wrong flags?
Conclusion
This was a lot of work. But much less than I initially expected.
The wasm
ecosystem in Rust is really on fire right now, and things are improving rapidly!
I’m actually really excited over how well it works. Apart from a small number of expected, and even smaller number of unexpected dependencies. Things just work.
In summary:
- Avoid native dependencies.
- When something breaks, it’s probably because of a gate.
- Abstract the filesystem.
- Avoid using
Path
/PathBuf
and APIs which have platform-specific behaviors. Considerrelative-path
as an alternative.
So to leave you, feel free to try out reproto
in your browser using the new eval
app:
UPDATE #1:
Here is the relevant PR in chrono
to make time
an optional dependency.
Please make your voice heard if this is something you want!
For nom
, memchr support for wasm
was merged in December, unfortunately that leaves existing libraries behind until they are upgraded.