Portability concerns with Path
I’ve been spending most of my spare time working on ReProto, and I’m at a point where I need to support specifying a per-project build manifest.
In this manifest I want to give the user the ability to specify build paths. The problem I faced is: How do you have a path specification that is portable?
The build manifest will be checked into git repositories.
It will be shared verbatim across platforms, and users would expect it to work without having to
convert any paths specified in it to their native representation.
This is very similar to how a build configuration is provided to cargo through Cargo.toml.
It would really suck if you’d have to convert all back-slashes to forward-slashes, just because
the original author of a library is working on Windows.
Rust has excellent serialization support in the form of serde.
The following is an example of how you can use serde to deserialize TOML whose structure is determined by a struct.
extern crate toml;
#[macro_use]
extern crate serde_derive;

use std::path::PathBuf;

#[derive(Debug, Deserialize)]
pub struct Manifest {
    paths: Vec<PathBuf>,
}

const FILE: &'static str = "paths = ['extra', 'src/main/reproto']";

pub fn main() {
    let manifest: Manifest = toml::from_str(FILE).unwrap();
    println!("{:?}", manifest);
}
We’ve deserialized a list of paths, so our work seems like it’s mostly done.
In the next section I will describe some details around platform-specific behaviors in Rust, and how they come back to bite us in this case.
Platform behaviors
Representing filesystem paths in a platform-neutral way is an interesting problem.
Rust has defined a platform-agnostic Path type which has system-specific behaviors implemented in libstd.
For example, on Windows it deals with a prefix consisting of the drive letter (e.g. c:).
The effect for our manifest is that using PathBuf would permit our application to accept and operate over paths specified in different ways.
Exactly which ways depends on the platform your application is built for.
This is no good for configuration files that you’d expect people to share across platforms. One representation might be valid on one platform, but not on others.
The following snippet exemplifies the problem:
extern crate toml;
#[macro_use]
extern crate serde_derive;

use std::path::{PathBuf, Path};

#[derive(Debug, Deserialize)]
pub struct Manifest {
    paths: Vec<PathBuf>,
}

const FILE: &'static str = "paths = ['foo\\bar']";

pub fn main() {
    let manifest: Manifest = toml::from_str(FILE).unwrap();

    if let Some(path) = manifest.paths.iter().next() {
        let p = Path::new(".").join(path).join("baz");
        println!("path = {:?}", p);
        println!("components = {:?}", p.components().collect::<Vec<_>>());
    }
}
On Windows, it would give this output:
path = ".\\foo\\bar\\baz"
components = [CurDir, Normal("foo"), Normal("bar"), Normal("baz")]
While on Linux, it would behave differently with:
path = "./foo\\bar/baz"
components = [CurDir, Normal("foo\\bar"), Normal("baz")]
foo\\bar is treated as a single path component, because backslash (\) is not a directory separator on Linux.
The implementation of Path on Linux reflects this.
This means that path functions in Rust will treat it as one component when determining things like what the parent directory of a given path is:
use std::path::Path;

pub fn main() {
    let path = Path::new("root").join("foo\\bar");
    let parent = path.parent();
    println!("parent = {:?}", parent);
}
On Windows:
parent = Some("root\\foo")
On Linux:
parent = Some("root")
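To check how the platform you are compiling for splits a given string, a small std-only sketch like the following can help. The helper name component_count is made up for this illustration; is_separator and Path::components are part of libstd.

```rust
use std::path::{is_separator, Path};

/// Illustrative helper: count how many components the current platform
/// sees in `raw`.
fn component_count(raw: &str) -> usize {
    Path::new(raw).components().count()
}

fn main() {
    // `is_separator` reports whether a character separates components on
    // the platform this program was compiled for.
    println!("'/' is a separator: {}", is_separator('/'));
    println!("'\\' is a separator: {}", is_separator('\\'));

    // A single component on Linux, two components on Windows.
    println!("foo\\bar has {} component(s)", component_count("foo\\bar"));
    // Two components on both platforms.
    println!("foo/bar has {} component(s)", component_count("foo/bar"));
}
```

The forward slash is accepted as a separator on both platforms, which is part of why it makes a reasonable choice for a portable format.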
Portable paths
Path by itself provides a portable API.
PathBuf::push and Path::join are ways to manipulate a path on a per-component basis.
The components themselves might have restrictions on which character sets may be used, but at least the path separator can be abstracted away.
Another major difference is how filesystem roots are designated.
Windows, interestingly enough, has multiple roots: one for each drive.
Linux only has one: /.
With this in mind we can write portable code that only manipulates relative paths. This works independently of which platform it is running on:
use std::env;

fn main() {
    // The native base supplies the platform-specific root.
    let base = env::current_dir().unwrap();
    // Only relative components are appended, one at a time.
    let target = base.join("foo").join("bar");
    println!("target = {:?}", target);
}
On Windows this gives:
target = "C:\\Users\\udoprog\\foo\\bar"
And on Linux:
target = "/home/udoprog/foo/bar"
Notice that the relative foo/bar traversal is maintained.
The realization I had is that you can have a portable description if you can describe a path only in terms of its components, without filesystem roots.
Neither c:\foo\bar\baz nor /foo/bar/baz is a portable description, but foo/bar/baz is.
It simply states: please traverse foo, then bar, then baz, relative to some directory.
Combining this relative path with a native path allows it to be translated into a platform-specific path. This path can then be used for filesystem operations.
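As a rough, std-only sketch of that translation step: the helper to_native below is hypothetical and is not how relative-path is implemented. It assumes the portable description uses / as its separator and does not validate the characters inside each component.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: resolve a `/`-separated relative description
/// against a native base path, one component at a time.
fn to_native(base: &Path, relative: &str) -> PathBuf {
    let mut out = base.to_path_buf();

    // Empty components are skipped, so a leading or doubled `/` is ignored.
    for component in relative.split('/').filter(|c| !c.is_empty()) {
        out.push(component);
    }

    out
}

fn main() {
    let target = to_native(Path::new("base"), "foo/bar/baz");
    // The separators in the result are whatever the current platform uses.
    println!("target = {:?}", target);
}
```

The portable string never reaches the filesystem directly; only the PathBuf built from it does.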
This is the premise behind a new crate I created named relative-path, which I will be covering briefly next.
Relative paths and the real world
In the relative-path crate I’ve introduced two types: RelativePath and RelativePathBuf.
These are analogous to the libstd types Path and PathBuf.
A fairly significant chunk of code could be reimplemented based on these types.
The differences from their libstd siblings are small, but significant:
- The path separator is set to a fixed character (/), regardless of platform.
- Relative paths cannot represent an absolute path in the filesystem, without first specifying what they are relative to through to_path.
The second rule is important because only the base path handed to to_path can settle what a path is actually relative to, or which filesystem root or drive it belongs to.
This permits using RelativePathBuf in cases where the lack of a portable representation would otherwise cause problems across platforms, like with build manifests checked into a git repository:
extern crate toml;
#[macro_use]
extern crate serde_derive;
// Deserializing RelativePathBuf relies on the crate's `serde` feature.
extern crate relative_path;

use relative_path::RelativePathBuf;
use std::path::Path;

#[derive(Debug, Deserialize)]
pub struct Manifest {
    paths: Vec<RelativePathBuf>,
}

const FILE: &'static str = "paths = ['foo/bar']";

pub fn main() {
    let manifest: Manifest = toml::from_str(FILE).unwrap();

    if let Some(path) = manifest.paths.iter().next() {
        // Anchor the portable path at a native base before using it.
        let p = path.to_path(Path::new(".")).join("baz");
        println!("path = {:?}", p);
        println!("components = {:?}", p.components().collect::<Vec<_>>());
    }
}
My hope is that from now on folks won’t be relegated to storing stringly typed fields and forced to figure out the portability puzzle for themselves.
Final notes
Character restrictions are still a problem.
At some point we might want to incorporate replacement procedures, or APIs that return Result to flag non-portable characters.
Using a well-defined path separator gets us pretty far regardless.
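To make the idea of a Result-returning check a bit more concrete, here is a minimal std-only sketch. Both the function check_component and the constant WINDOWS_RESERVED are hypothetical and are not part of relative-path. The sketch assumes that "non-portable" means control characters plus the characters Windows reserves in file names, which is only one possible definition.

```rust
/// Characters that Windows reserves and does not allow in file names.
const WINDOWS_RESERVED: &[char] = &['<', '>', ':', '"', '/', '\\', '|', '?', '*'];

/// Hypothetical check: return the first non-portable character as an
/// error, or `Ok(())` if the component looks safe on every platform.
fn check_component(component: &str) -> Result<(), char> {
    match component
        .chars()
        .find(|c| c.is_control() || WINDOWS_RESERVED.contains(c))
    {
        Some(bad) => Err(bad),
        None => Ok(()),
    }
}

fn main() {
    println!("{:?}", check_component("reproto"));
    println!("{:?}", check_component("foo\\bar"));
    println!("{:?}", check_component("what?"));
}
```

A real API would probably want a richer error type than a bare char, and would also have to decide what to do about reserved names and trailing dots or spaces, which this sketch ignores.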
Thank you for reading this. And please give me feedback on relative-path if you have the time.