The most popular form by far is the JSON-based HTTP API (although GraphQL is giving it a run for
its money). Sometimes these are referred to as restful - because we collectively have an aversion
towards taking REST seriously.
This post isn’t about REST.
It’s about a project I’ve been working on for the last year to handle the lifecycle of JSON-based
APIs:
reproto is a number of things, but most importantly it’s an interface description language (IDL) in
which you can write specifications that describe the structure of JSON objects.
This IDL aims to be compact and descriptive.
A simple .reproto specification looks like this:
# File: src/cats.reproto
type Cat {
  name: string;
}
This describes an object which has a single field name, like: {"name": "Charlie"}.
Using reproto, we can now generate bindings for this in various languages.
reproto tries to integrate with the target language using the best frameworks available.
Dependencies
A system is something greater than the sum of its parts.
Say you want to write a service that communicates with many other services. It’s typically
painful and error-prone to copy things around by yourself.
To solve this, reproto is not only a language specification, but also a package manager.
Provide reproto with a build manifest in reproto.toml like this:
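Something like the following - a sketch, where the exact keys are illustrative rather than the authoritative format:

# File: reproto.toml
language = "java"
paths = ["src"]
output = "target/generated"

[packages]
"io.reproto.toystore" = "^1"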
After a build, reproto will have downloaded and built io.reproto.toystore from the central repository.
Importing another specification from inside of a specification will automatically use the
repository:
use io.reproto.toystore "^1" as toystore;
type Shelf {
  toys: [toystore::Toy];
}
Dealing with many different versions of a package is handled through clever namespacing.
This makes it possible to import and use multiple different versions of a specification at once:
use io.reproto.toystore "^1" as toystore1;
use io.reproto.toystore "^2" as toystore2;
type Shelf {
  toys: [toystore1::Toy];
  toys_v2: [toystore2::Toy];
}
Documentation
Good documentation is key to effectively using an API.
reproto comes with a built-in documentation tool in reproto doc, which will generate
documentation for you by reading Rust-style documentation comments.
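For example, a sketch of a documented specification, assuming /// doc comments as in Rust:

/// A toy in the toy store.
type Toy {
  /// The name of the toy.
  name: string;
}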
With package management comes the problems associated with breaking changes.
reproto insists on using semantic versioning, and will actively check that any version you try to
publish doesn’t violate it:
$ reproto publish
src/io/reproto/toystore.reproto:12:3-22:
12: category: Category;
^^^^^^^^^^^^^^^^^^^ - minor change violation: field changed to be required
io.reproto.toystore-1.0.0:12:3-23:
12: category?: Category;
^^^^^^^^^^^^^^^^^^^^ - from here
This is all based on a module named semck that operates on the AST-level.
Not everything is covered yet, but it’s rapidly getting there.
Finally
In contrast to something like a pure API specification language, reproto aims to be a complete
system to hold your hand during the entire lifecycle of service development.
My litmus test will be when I’ve produced a mostly generated client for Heroic, which is well on
its way.
It’s also written in Rust, a language from which a lot of these ideas have been shamelessly
stolen.
There is still a lot of work to be done!
If you are interested in the problem domain and have spare cycles, please join me on Gitter.
While writing my last post I needed to compile and run some code under Windows.
Being a Linux fanboy, I found this situation less than optimal. Enter Wine.
Wine is a fantastic system.
With an initial release 24 years ago, it’s grown to encompass incredible things like a full
implementation of DirectX 9, providing very compelling gaming performance for Windows-only games on
Linux.
It also behaves like Windows when you run Rust-based applications on it.
This post is a quick tip for how you can set up a flexible environment for compiling and testing
small Rust applications on Linux that behave like they would on Windows.
Installation
Install Wine with whatever your preferred method is.
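For example, on a Debian-like system - a sketch, where package names vary, hello is a placeholder crate name, and the MinGW toolchain is needed for linking:

$ sudo apt install wine gcc-mingw-w64-x86-64
$ rustup target add x86_64-pc-windows-gnu
$ cargo build --target x86_64-pc-windows-gnu
$ wine target/x86_64-pc-windows-gnu/debug/hello.exe

You may also need to point cargo at the MinGW linker in .cargo/config for the cross-compiled target.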
I’ve been spending most of my spare time working on ReProto, and I’m at a point where I need to
support specifying a per-project build manifest.
In this manifest I want to give the user the ability to specify build paths.
The problem I faced is: How do you have a path specification that is portable?
The build manifest will be checked into git repositories.
It will be shared verbatim across platforms, and users would expect it to work without having to
convert any paths specified in it to their native representation.
This is very similar to how a build configuration is provided to cargo through Cargo.toml.
It would really suck if you had to convert all backslashes to forward slashes, just because
the original author of a library is working on Windows.
Rust has excellent serialization support in the form of serde.
The following is an example of how you can use serde to deserialize TOML whose structure is
determined by a struct.
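A minimal sketch, assuming the serde, serde_derive, and toml crates; the Manifest struct and the paths key are illustrative:

#[macro_use]
extern crate serde_derive;
extern crate toml;

use std::path::PathBuf;

/// The struct that determines the structure of the TOML we accept.
#[derive(Debug, Deserialize)]
struct Manifest {
    paths: Vec<PathBuf>,
}

fn main() {
    // Paths as they might appear in a checked-in manifest.
    let manifest: Manifest = toml::from_str(r#"paths = ["src", "lib"]"#).unwrap();
    println!("{:?}", manifest.paths);
}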
We’ve deserialized a list of paths, so our work seems mostly done.
In the next section I will describe some details around platform-specific behaviors in Rust, and
how they come back to bite us in this case.
Platform behaviors
Representing filesystem paths in a platform-neutral way is an interesting problem.
Rust has defined a platform-agnostic Path type which has system-specific behaviors implemented
in libstd.
For example, in Windows it deals with a prefix consisting of the drive letter (e.g. c:).
The effect for our manifest is that using PathBuf would permit our application to accept and
operate over paths specified in different ways, where the exact set of accepted representations
depends on which platform the application is built for.
This is no good for configuration files that you’d expect people to share across platforms.
One representation might be valid on one platform, but not on others.
foo\bar is treated as a single path component, because backslash (\) is not a directory separator
on Linux.
The implementation of Path on Linux reflects this.
This means that mutator functions in Rust will treat this as a component when determining things
like what the parent directory of a given path is:
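A sketch demonstrating this on Linux:

use std::path::Path;

fn main() {
    // On Linux the backslash is not a separator, so "foo\bar" is a
    // single component and its parent is the empty (relative) path.
    assert_eq!(Path::new("foo\\bar").parent(), Some(Path::new("")));

    // With the native separator, foo is the parent of bar.
    assert_eq!(Path::new("foo/bar").parent(), Some(Path::new("foo")));
}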
Path by itself provides a portable API.
PathBuf::push and Path::join are ways to manipulate a path on a per-component basis.
The components themselves might have restrictions on which
character sets may be used, but at least the path separator can be abstracted away.
Another major difference is how filesystem roots are designated.
Windows, interestingly enough, has multiple roots - one for each drive.
Linux only has one: /.
With this in mind we can write portable code that only manipulates relative paths.
This works independently of which platform it is running on:
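For example - a sketch, where the component names are arbitrary:

use std::path::PathBuf;

fn main() {
    // Build the path one component at a time; the platform-specific
    // separator is inserted for us.
    let mut path = PathBuf::from(".");
    path.push("foo");
    path.push("bar");

    // Prints "./foo/bar" on Linux and ".\foo\bar" on Windows.
    println!("{}", path.display());
}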
Notice that the relative foo/bar traversal is maintained.
The realization I had is that you can have a portable description if you can describe a path only
in terms of its components, without filesystem roots.
Neither c:\foo\bar\baz nor /foo/bar/baz are portable descriptions, but foo/bar/baz is.
It simply states: please traverse foo, then bar, then baz, relative to some directory.
Combining this relative path with a native path allows it to be translated into a
platform-specific path.
This path can then be used for filesystem operations.
This is the premise behind a new crate I created named relative-path, which I will be covering
briefly next.
Relative paths and the real world
In the relative-path crate I’ve introduced two types: RelativePath and RelativePathBuf.
These are analogous to the libstd types Path and PathBuf.
A fairly significant chunk of code could be reimplemented based on these classes.
The differences from their libstd siblings are small, but significant:
The path separator is set to a fixed character (/), regardless of platform.
Relative paths cannot represent an absolute path in the filesystem without first specifying
what they are relative to through to_path.
The second rule is important because only the caller can determine the actual relativeness of a
path - which filesystem root or drive it belongs to.
This permits using RelativePathBuf in cases where having a portable representation would
otherwise cause problems across platforms.
Like with build manifests checked into a git repository:
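A sketch of the crate in use, resolving a manifest-provided path against the directory containing the manifest:

extern crate relative_path;

use relative_path::RelativePath;
use std::path::Path;

fn main() {
    // A portable path as stored in the manifest, always using '/'.
    let portable = RelativePath::new("src/cats.reproto");

    // Specify what it is relative to, yielding a native PathBuf that
    // can be used for filesystem operations.
    let native = portable.to_path(Path::new("."));
    println!("{}", native.display());
}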
My hope is that from now on folks won’t be relegated to storing stringly typed fields and
forced to figure out the portability puzzle for themselves.
Final notes
Character restrictions are still a problem.
At some point we might want to incorporate replacement procedures, or APIs that return Result
to flag for non-portable characters.
Using a well-defined path separator gets us pretty far regardless.
Thank you for reading this, and please give me feedback on relative-path if you have the time.
The patch intends to mitigate the unexpected death of threads and the impact this has on your
application.
To help illustrate this, here is an example project with a very nasty thread eating up all
memory:
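The original project isn’t reproduced here; the following is a minimal sketch along the lines described, where BadThread eats and holds all memory while a coordinator task allocates a message per iteration:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

public class Main {
    public static void main(final String[] args) throws Exception {
        final BlockingQueue<String> messages = new LinkedBlockingQueue<>(1);
        final ExecutorService coordinator = Executors.newSingleThreadExecutor();

        // A very nasty thread: allocate until the heap is exhausted,
        // keep every reference alive, and refuse to die.
        final Thread badThread = new Thread(() -> {
            final List<byte[]> hog = new ArrayList<>();

            while (true) {
                try {
                    hog.add(new byte[1024 * 1024]);
                } catch (final OutOfMemoryError e) {
                    // swallow the error and keep the memory locked up
                }
            }
        }, "BadThread");

        badThread.start();

        // The coordinator allocates memory for every message, which
        // makes it a candidate for OutOfMemoryError.
        coordinator.submit((Callable<Void>) () -> {
            while (true) {
                messages.put(new String(new char[1024]));
            }
        });

        while (true) {
            messages.take();
            System.out.println("main: OK");
        }
    }
}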
Compile and run this application with -Xmx16m.
You should see something like the following:
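With the sketch above, the output might look something like this before the application locks up:

main: OK
main: OK
main: OK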
The application is stuck, we are no longer seeing any main: OK messages.
No stack traces, nothing.
The reason is that our coordinator thread allocates memory for its message, which means it can
become the target of an OutOfMemoryError when the allocation fails because BadThread has locked
up all available memory and is refusing to die.
This is where it gets interesting. ThreadPoolExecutor will, as per its documentation, happily
catch and swallow any exception thrown in one of its tasks.
It is explicitly left to the developer to handle this.
This leaves us with a dead coordinator thread at the other end of the Queue, and main
is left to its own devices forever. :(.
This patch overrides the afterExecute method, a hook designed to allow for custom behavior after
the completion of tasks.
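The actual patch isn’t shown here; a sketch of the idea, modeled on the afterExecute example in the ThreadPoolExecutor documentation, might look like this:

import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class CrashingThreadPoolExecutor extends ThreadPoolExecutor {
    public CrashingThreadPoolExecutor(final int threads) {
        super(threads, threads, 0L, TimeUnit.MILLISECONDS,
            new LinkedBlockingQueue<Runnable>());
    }

    @Override
    protected void afterExecute(final Runnable r, Throwable t) {
        super.afterExecute(r, t);

        // Submitted tasks are wrapped in a Future which captures any
        // Throwable, so it has to be unwrapped here.
        if (t == null && r instanceof Future<?> && ((Future<?>) r).isDone()) {
            try {
                ((Future<?>) r).get();
            } catch (final CancellationException e) {
                t = e;
            } catch (final ExecutionException e) {
                t = e.getCause();
            } catch (final InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        // Print the failure and give up instead of silently swallowing it.
        if (t != null) {
            t.printStackTrace();
            System.exit(1);
        }
    }
}

Swapping this in for Executors.newSingleThreadExecutor() in the sketch above causes the death of the coordinator to take the whole process with it, instead of leaving main hanging.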
Run the project again, and you should see the following:
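Illustrative again, assuming the sketches above:

main: OK
main: OK
java.lang.OutOfMemoryError: Java heap space
        at ...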
Errors
I want to emphasise that OutOfMemoryError is generally not an error that you
can safely recover from. There are no guarantees that the thread responsible
for eating up your memory is the target for this error. Even if that is the
case, this thread might become important at a later stage in its life.
In my opinion, the most reasonable thing to do is to give up.
An Error is a subclass of Throwable that indicates serious problems that a
reasonable application should not try to catch. Most such errors are abnormal
conditions.
At this stage you might be tempted to attempt a clean shutdown of your
application on errors.
This might work. But we might also be in a state where a thread critical
towards the clean shutdown of your application is no longer alive.
There might not be any memory left to support a complex shutdown. Attempting it
could lead to your cleanup attempt crashing, leading us back to where we started.
If you want to cover manually created threads, you can make use of
Thread#setDefaultUncaughtExceptionHandler.
Just remember, this still does not cover thread pools.
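A sketch of wiring that up:

public class Main {
    public static void main(final String[] args) {
        // Crash the process when any manually created thread dies from
        // an uncaught Throwable, instead of limping along without it.
        Thread.setDefaultUncaughtExceptionHandler((thread, throwable) -> {
            throwable.printStackTrace();
            System.exit(1);
        });

        new Thread(() -> {
            throw new RuntimeException("oops");
        }).start();
    }
}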
On a final note, if you are a library developer: Please don’t hide your thread
pools from us.
In this post I will talk about semantic versioning,
and how I believe it can be efficiently applied for the benefit of long-term
interoperability of Java libraries.
Let us introduce the basic premise of semantic versioning (borrowed from their
page), namely version numbers and the connection they have to the continued
development of your software.
MAJOR version when you make incompatible API changes,
MINOR version when you add functionality in a backwards-compatible manner,
and
PATCH version when you make backwards-compatible bug fixes.
Hello Java
Java has a lot of things which could qualify as members of your public API.
The most distinct feature in the language is the interface, a fully abstract
class definition that forces you to describe all possible interactions that are
allowed with a given implementation.
So let’s build an API using that.
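The post’s original interface isn’t shown here, but a sketch might look like this (the method is illustrative; the package matches the eu.toolchain.mylib artifact used below):

package eu.toolchain.mylib;

/**
 * My library abstraction.
 *
 * @since 1.0
 */
public interface MyLibrary {
    /**
     * Do the thing.
     *
     * @since 1.0
     */
    void doTheThing();
}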
Consider @since: here it doesn’t contain the patch version. It could, but it
wouldn’t make a difference.
A patch must never modify the API; that privilege is left to the major and
the minor version.
Maven plays an important role here as well.
The Java ecosystem relies on it to distribute libraries and resolve
dependencies.
The way you would expose your library is by putting the above in an API
artifact named eu.toolchain.mylib:mylib-api.
You might also feel compelled to provide an implementation; this could be
eu.toolchain.mylib:mylib-core.
The separation is not critical, but it helps in being explicit in what your
public API is.
Both for you and your users.
My intent is to have your users primarily interact with your library through
interfaces, abstract classes, and
value objects.
A Minor Change
Let us introduce a minor change to the library.
In library terms, we are exposing another symbol.
For Java, this is just another method with a given signature added to the
already existing MyLibrary interface.
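Continuing the sketch from before, version 1.1 adds a method:

public interface MyLibrary {
    /**
     * Do the thing.
     *
     * @since 1.0
     */
    void doTheThing();

    /**
     * Do the other thing.
     *
     * @since 1.1
     */
    void doTheOtherThing();
}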
This only constitutes a minor change because consumers of the API which
happen to use 1.0 will happily continue to operate in a runtime containing
1.1.
Anything linked against 1.0 will be oblivious to the fact that there is
added functionality in 1.1.
This is due to the indirection introduced by Java: method calls use a
very flexible symbolic reference to indicate the target of the invocation.
Removing a method and not fixing all callers of it would eventually cause
NoSuchMethodError.
Eventually, because it would not be triggered until a caller attempts the
invocation at runtime. Ouch.
What qualifies as a minor change
Identifying what qualifies as a minor change, and what does not, is one of the
harder aspects we need to deal with.
It requires a bit of knowledge in how binary compatibility works.
I’ll touch on a few things that are compatible, and why.
Increasing visibility
Increasing the visibility of a method is a minor change.
Visibility goes with the following modifiers, from least to most visible:
private
package-private (no modifier)
protected
public
From the perspective of the user, a thing is not part of your public API if it
is not visible.
Adding a method
This works because method invocations only consult the signature of the
method being called; the virtual machine is responsible for indirectly looking
up the actual method at runtime.
So this is good unless the client implements the given API.
If you are exposing an API that the client should implement, a very popular
compromise is to provide an abstract class that the client must use as
a base to maintain compatibility.
You as a library maintainer must maintain this class to make sure that between
each minor release it does not force clients to implement methods they
previously were not required to.
Modifying behavior
This one is tricky, but probably the most important to understand.
If you have a documented behavior in your API, you are not allowed to remove
or modify it.
In practice, it means that once your javadoc asserts something, that assertion
must be versioned as well.
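As a sketch, reusing the hypothetical method from before:

/**
 * Do the thing.
 *
 * This will consume the current galaxy.
 *
 * @since 1.0
 */
void doTheThing();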
You may extend it in a manner which does not violate the existing assertions.
You may not however, change the behavior from current Galaxy to Milky Way.
Your users will have operated under the assumption that the current galaxy
will be consumed.
Imagine their surprise when they run the newly upgraded application in the
Andromeda Galaxy and they inadvertently expedite their own extinction because
they didn’t expect a breaking change in behavior for a minor version :/.
A Major Change
Ok, so it’s time to rethink your library’s existence.
The world changed, you’ve grown and realized the error of your ways.
It’s time to fix all the design errors you made in the previous version.
In order to introduce a new major version, it is important to consider the
following:
Do I need to publish a new package?
Do I need to publish a new Maven artifact?
Should I introduce the changes using @Deprecated?
This sounds rough, but there are a few points to all this.
Publishing a new package
To maintain binary compatibility with the previous Major version.
There are no easy take-backs once an API has been published.
You may communicate to your clients that something is deprecated, and it is
time to upgrade.
You cannot force an atomic upgrade.
If you introduce a Major change that cannot co-exist in a single classpath,
your users are in for a world of pain.
Publishing a new Maven artifact
To allow your users to co-depend on the various major versions of your
library.
Maven will only allow one version of a <groupId>:<artifactId> combination to
exist within a given build solution.
For our example, we could go from eu.toolchain.mylib:mylib-api to
eu.toolchain.mylib:mylib2-api.
If you don’t change the artifact, Maven will not allow a user to install all
your major versions.
More importantly, any transitive dependencies requiring another major version
will find themselves lacking.
Using @Deprecated to your advantage
@Deprecated is a standard annotation discouraging the use of the element that is annotated.
This has wide support among IDEs, and will typically show up as a warning when
used.
You can use this to your advantage when releasing a new Major version.
Assume that you are renaming the method #badName() into #goodName().
You can go back and release a new minor version of your 1.x branch
containing the newly named method with a @Deprecated annotation.
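A sketch of what the 1.x interface might look like after such a release:

public interface MyLibrary {
    /**
     * Do the thing.
     *
     * @deprecated use {@link #goodName()} instead, to be removed in 2.0.
     * @since 1.0
     */
    @Deprecated
    void badName();

    /**
     * Do the thing, under a better name.
     *
     * @since 1.1
     */
    void goodName();
}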
This is an excellent way of communicating what changes your users can expect,
and can be applied to many situations.
As an example of doing things the hard way, we have Lucene Core and their
take on compatibility.
Parts of their library use versioned packages in order to allow different
implementations to co-exist.
Most compatibility issues are handled by rarely breaking the public API, and
by doing version detection at runtime to determine which behavior to implement.
Guava maintains compatibility for a long time, and communicates expectations
through their @Beta annotation.
Unfortunately, there are many things using @Beta at the moment, making this a
real consideration when using the library.
Project Jigsaw is an initiative that could improve things in the near future by
implementing a module system where dependencies and versions are more explicit.