Guillaume Desmottes
April 28, 2020
A common complaint heard about Rust is the size of the binaries it produces. There are various reasons why Rust binaries are generally bigger than the ones produced with lower level languages such as C. The main one is that Cargo, Rust's package manager and build tool, produces static binaries by default. While larger binaries are generally not much of an issue for desktop or server applications, they may become more of a problem on embedded systems where storage and/or memory may be very limited.
GStreamer is used extensively at Collabora to help our clients to build embedded multimedia solutions. With Rust gaining traction among the GStreamer community as an alternative to C to write GStreamer applications and plugins, we began wondering if the size of such Rust plugins would be a problem for embedded systems, and what could be done to reduce sizes as much as possible.
Inspired by this Tiny Rocket analysis and the Minimizing Rust Binary Size repository, here are the different strategies we tried to reduce the size of a minimal Rust GStreamer plugin.
All the builds have been done on Fedora 32 using the latest stable Rust with the stable-x86_64-unknown-linux-gnu toolchain.
$ rustc --version
rustc 1.42.0 (b8cedc004 2020-03-09)
We built the gst-plugin-tutorial plugin from gst-plugins-rs for this experiment. In order to make this plugin as minimal as possible we removed all the elements except rsidentity, a simpler version of the identity element. We also removed the existing profile settings to build with the default cargo settings.
Let's start by looking at the size of the plugin after a normal build:
$ cargo build
$ ls -l target/debug/libgstrstutorial.so
-rwxrwxr-x. 2 cassidy cassidy 32248640 1 avril 12:09 target/debug/libgstrstutorial.so
So 31M for a trivial plugin not doing much, that's quite a lot indeed! But this is a dev build which is not meant to be used in production. Let's retry using a release build:
$ cargo build --release
$ ls -l target/release/libgstrstutorial.so
-rwxrwxr-x. 2 cassidy cassidy 2740472 1 avril 12:14 target/release/libgstrstutorial.so
Switching from a dev build to a release one reduced the size by a factor of almost 12!
Let's keep those metrics as reference:
build | modifications | size (bytes) | size (human) | % change |
---|---|---|---|---|
dev | none | 32248640 | 31M | 0% |
release | none | 2740472 | 2.7M | 0% |
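As a quick sanity check of these figures, the size ratio and the relative change between two builds can be computed from the byte counts reported by ls. The following Rust sketch is only an illustration; the function names are ours and not part of any tool used in this post. The tables below report the change relative to the unmodified build of the same profile.

```rust
/// Relative size change from `base` to `new`, in percent.
/// A negative value means `new` is smaller than `base`.
fn percent_change(base: u64, new: u64) -> f64 {
    (new as f64 - base as f64) / base as f64 * 100.0
}

/// How many times bigger `base` is compared to `new`.
fn size_ratio(base: u64, new: u64) -> f64 {
    base as f64 / new as f64
}

fn main() {
    // Byte counts taken from the `ls -l` output above.
    let dev: u64 = 32_248_640; // target/debug/libgstrstutorial.so
    let release: u64 = 2_740_472; // target/release/libgstrstutorial.so

    println!("dev -> release: {:.1}%", percent_change(dev, release));
    println!("dev is {:.1}x bigger than release", size_ratio(dev, release));
}
```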
Cargo does not strip binaries and there is currently no setting to do it. We could use cargo-strip but it's easier to just strip manually in such a simple example:
$ strip target/debug/libgstrstutorial.so
$ ls -l target/debug/libgstrstutorial.so
-rwxrwxr-x. 2 cassidy cassidy 604512 1 avril 12:19 target/debug/libgstrstutorial.so
$ strip target/release/libgstrstutorial.so
$ ls -l target/release/libgstrstutorial.so
-rwxrwxr-x. 2 cassidy cassidy 305504 1 avril 12:19 target/release/libgstrstutorial.so
As the plugin is statically built, the symbol information of all the crates (dependencies) used by the plugin ended up in our binary. Stripping removed it and so saved us a lot of space.
build | modifications | size (bytes) | size (human) | % change |
---|---|---|---|---|
dev | none | 32248640 | 31M | 0% |
dev | stripped | 604512 | 591K | -98% |
release | none | 2740472 | 2.7M | 0% |
release | stripped | 305504 | 299K | -88% |
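The manual strip-and-compare step can also be scripted. Here is a minimal sketch in Rust, assuming the strip binary from binutils is available in PATH; the helper names and the .stripped.so output path are hypothetical. It strips a copy so that the original build artifact stays untouched.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::process::Command;

/// Size of a file in bytes.
fn file_size(path: &Path) -> io::Result<u64> {
    Ok(fs::metadata(path)?.len())
}

/// Copy `input` to `output`, run `strip` on the copy and return
/// (original size, stripped size) in bytes.
fn strip_copy(input: &Path, output: &Path) -> io::Result<(u64, u64)> {
    fs::copy(input, output)?;
    // Assumption: a `strip` executable is reachable through PATH.
    let status = Command::new("strip").arg(output).status()?;
    if !status.success() {
        return Err(io::Error::new(io::ErrorKind::Other, "strip failed"));
    }
    Ok((file_size(input)?, file_size(output)?))
}

fn main() {
    let input = Path::new("target/release/libgstrstutorial.so");
    // Hypothetical output path, chosen only for this example.
    let output = Path::new("target/release/libgstrstutorial.stripped.so");

    match strip_copy(input, output) {
        Ok((before, after)) => println!("{} -> {} bytes", before, after),
        Err(err) => eprintln!("could not strip {}: {}", input.display(), err),
    }
}
```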
These numbers look much better; we already have something that should be usable on most systems. But we can still save some space by tweaking Cargo's build flags. All these settings are set using Cargo's profile sections.
From this point on we'll consider only the size of release builds, as that's what actually matters when distributing software in production. So we'll set our build flags in the profile.release section of our Cargo manifest.
By enabling link-time optimization (LTO) and reducing the number of compilation units we can ask the compiler to generate smaller binaries at the cost of a longer compile time. This is done by editing our Cargo.toml and setting the lto and codegen-units settings in the release profile:
[profile.release]
lto = true
codegen-units = 1
These changes reduced the plugin size quite a lot, but once stripped we notice that we actually gained only 44K.
build | modifications | size (bytes) | size (human) | % change |
---|---|---|---|---|
release | none | 2740472 | 2.7M | 0% |
release | lto | 888560 | 868K | -67.6% |
release | lto + stripped | 260368 | 255K | -90.5% |
Cargo offers different optimization levels. Some are better suited to debugging while others are meant to achieve better performance. There is also the 'z' level, which optimizes for size:
opt-level = "z"
We gained a few extra kilobytes:
build | modifications | size (bytes) | size (human) | % change |
---|---|---|---|---|
release | none | 2740472 | 2.7M | 0% |
release | lto | 888560 | 868K | -67.6% |
release | lto + opt-level | 855096 | 836K | -68.8% |
release | lto + opt-level + stripped | 231696 | 227K | -91.5% |
By default, Rust unwinds the stack when panicking and can provide a nice backtrace. This can be quite handy when debugging, but the unwinding code takes up space that may not be worth it in production builds. Aborting on panic! instead of unwinding saves us the size of that code in our plugin.
panic = 'abort'
Disabling this feature saved us some extra bytes as well:
build | modifications | size (bytes) | size (human) | % change |
---|---|---|---|---|
release | none | 2740472 | 2.7M | 0% |
release | lto | 888560 | 868K | -67.6% |
release | lto + opt-level | 855096 | 836K | -68.8% |
release | lto + opt-level + panic abort | 792136 | 774K | -71.1% |
release | lto + opt-level + panic abort + stripped | 207024 | 203K | -92.4% |
It's important to note that this change will not only remove the panic stacktrace but also affect the behavior of GStreamer Rust plugins.
gstreamer-rs provides a macro converting panics to proper GStreamer error messages that can be handled by the application.
When such a panic occurs, the element is marked as unusable but the application continues running and has a chance to handle the problem gracefully.
By setting panic = 'abort' this whole system is disabled and the application process will abort right away.
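To illustrate why the panic strategy matters here, the sketch below uses std::panic::catch_unwind from the standard library to turn a panic into an ordinary error value. This is not the gstreamer-rs macro itself, only a simplified stand-in for the general mechanism; the run_guarded helper is hypothetical. With the default unwinding strategy the panic is caught, whereas with panic = 'abort' the process would terminate before the error could be returned.

```rust
use std::panic;

/// Run `f`, turning a panic into an `Err` carrying the panic message.
/// This only works when panics unwind (the default strategy); with
/// `panic = 'abort'` the process aborts and nothing is returned.
fn run_guarded<F>(f: F) -> Result<i32, String>
where
    F: FnOnce() -> i32 + panic::UnwindSafe,
{
    panic::catch_unwind(f).map_err(|payload| {
        // The payload is usually a `&str` or a `String`.
        if let Some(msg) = payload.downcast_ref::<&str>() {
            msg.to_string()
        } else if let Some(msg) = payload.downcast_ref::<String>() {
            msg.clone()
        } else {
            "unknown panic".to_string()
        }
    })
}

fn main() {
    let ok = run_guarded(|| 42);
    let failed = run_guarded(|| panic!("element is broken"));

    println!("{:?}", ok);
    println!("{:?}", failed);
}
```

The default panic hook still prints the panic message to stderr; only the unwinding is intercepted.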
At this point we have used all the options available with the stable Rust version. To reduce the size even further, we would have to switch to Rust nightly, the unstable version of the compiler. One interesting option would be to manually build libstd so it can benefit from our optimized build settings.
Extreme solutions such as not using libstd are not really an option here as glib-rs and gstreamer-rs are heavily using the Rust standard library.
So we managed to reduce the plugin size to 203K, which is a 92% improvement from the default release build. Here are the final settings used:
[profile.release]
lto = true
codegen-units = 1
opt-level = "z"
panic = 'abort'
We reached a size reasonable enough to be used in lots of embedded use cases. But how does it compare to a C implementation? We could have used the existing identity element as a comparison, but it's bundled in the coreelements plugin and provides more features than rsidentity.
For the sake of the experiment, we re-implemented rsidentity in C using the exact same features and APIs. It weighs 48K, reduced to 15K once stripped.
-rwxrwxr-x. 1 cassidy cassidy 15K 1 avril 15:50 libgstidentitylight.so
So the Rust size overhead seems to be around 190K for this simple plugin. That's not unexpected, as the Rust version is statically linked with Rust's standard library and contains all the bindings code for GLib and GStreamer.
We can use cargo bloat to list the biggest dependencies. Note that those numbers are for a pre-stripped build:
$ cargo bloat --release --crates
 9.8%  58.2%  76.2KiB std
 4.6%  27.2%  35.5KiB [Unknown]
 1.4%   8.2%  10.7KiB gstreamer
 0.5%   2.9%   3.8KiB glib
 0.1%   0.8%   1.1KiB gstrstutorial
 0.1%   0.7%     934B once_cell
 0.0%   0.2%     282B gstreamer_sys
 0.0%   0.0%      64B muldiv
 0.0%   0.0%      18B futures_task
 0.0%   0.0%      18B byte_slice_cast
 0.0%   0.0%      18B futures_util
 0.0%   0.0%      17B glib_sys
16.9% 100.0% 130.8KiB .text section size, the file size is 773.6KiB
As expected the standard library is the biggest culprit here.
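If you want to post-process such a report, for instance to track the per-crate sizes over time, the rows are easy to parse. The sketch below is a rough illustration that assumes the three-column layout shown above (two percentages, a size with a B, KiB or MiB suffix, then the crate name); parse_row is a hypothetical helper, not something provided by cargo bloat.

```rust
/// Parse one crate row such as " 9.8%  58.2%  76.2KiB std" into
/// (crate name, size in bytes). Returns `None` if the row does not
/// follow the assumed layout.
fn parse_row(line: &str) -> Option<(String, f64)> {
    let mut fields = line.split_whitespace();
    let _file_percent = fields.next()?;
    let _text_percent = fields.next()?;
    let size = fields.next()?;
    let name = fields.next()?;

    // Check the longer suffixes first, since they also end with 'B'.
    let bytes = if let Some(mib) = size.strip_suffix("MiB") {
        mib.parse::<f64>().ok()? * 1024.0 * 1024.0
    } else if let Some(kib) = size.strip_suffix("KiB") {
        kib.parse::<f64>().ok()? * 1024.0
    } else if let Some(b) = size.strip_suffix('B') {
        b.parse::<f64>().ok()?
    } else {
        return None;
    };
    Some((name.to_string(), bytes))
}

fn main() {
    // A few rows copied from the report above.
    let report = "\
 9.8%  58.2%  76.2KiB std
 4.6%  27.2%  35.5KiB [Unknown]
 1.4%   8.2%  10.7KiB gstreamer
 0.5%   2.9%   3.8KiB glib
 0.1%   0.7%     934B once_cell";

    let rows: Vec<(String, f64)> = report.lines().filter_map(parse_row).collect();
    let total: f64 = rows.iter().map(|(_, bytes)| bytes).sum();

    for (name, bytes) in &rows {
        let share = bytes / total * 100.0;
        println!("{:<12} {:>8.0} bytes ({:.1}% of the listed rows)", name, bytes, share);
    }
}
```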
The plugin we used for our experiments was very minimal. It would be interesting to look at the sizes of real Rust GStreamer plugins. We therefore built all the gst-plugins-rs plugins using the same build settings:
plugin | size (bytes) | size (human) | stripped size (bytes) | stripped size (human) |
---|---|---|---|---|
libgstcdg.so | 2876960 | 2.8M | 334208 | 327K |
libgstclaxon.so | 2795840 | 2.7M | 354656 | 347K |
libgstfallbackswitch.so | 2964136 | 2.9M | 412000 | 403K |
libgstgif.so | 2793224 | 2.7M | 342392 | 335K |
libgstlewton.so | 2985256 | 2.9M | 420192 | 411K |
libgstrav1e.so | 4511504 | 4.4M | 1571208 | 1.5M |
libgstreqwest.so | 6762648 | 6.5M | 3230480 | 3.1M |
libgstrsaudiofx.so | 815104 | 796K | 223408 | 219K |
libgstrsclosedcaption.so | 3447056 | 3.3M | 741240 | 724K |
libgstrsdav1d.so | 2748928 | 2.7M | 313752 | 307K |
libgstrsfile.so | 1403832 | 1.4M | 739592 | 723K |
libgstrsflv.so | 1007672 | 985K | 321712 | 315K |
libgstrusoto.so | 7412336 | 7.1M | 3734024 | 3.6M |
libgstsodium.so | 3050656 | 3.0M | 572432 | 560K |
libgstthreadshare.so | 4530280 | 4.4M | 1448376 | 1.4M |
libgsttogglerecord.so | 3012008 | 2.9M | 436552 | 427K |
It's interesting to notice that, once stripped, most plugins stay in the few hundred kilobytes range, with some notable exceptions. The plugins reaching the megabyte(s) size seem to be the ones relying on big Rust crates such as rav1e or reqwest. Those are "pure" Rust elements as they don't rely on external C libraries to actually process the data, like C plugins generally do.
The AV1 encoder and decoder are a good example here. The former, libgstrav1e.so, uses the rav1e crate, which is also written in Rust and so is statically linked into the plugin. On the other hand, libgstrsdav1d.so wraps the dav1d C decoder, to which it's dynamically linked, so the actual decoding code isn't accounted for in the plugin size.
So far we have only considered x86_64 binaries, however embedded devices are generally based on ARM SoCs. We were interested in comparing the size of Rust plugins when built for this architecture, and wondered if we would observe any significant difference.
We therefore rebuilt all the plugins using the armv7-unknown-linux-gnueabihf toolchain as we would do to build for the Raspberry Pi, for example.
plugin | size (bytes) | size (human) | stripped size (bytes) | stripped size (human) |
---|---|---|---|---|
libgstcdg.so | 2810512 | 2.7M | 251460 | 246K |
libgstclaxon.so | 2815844 | 2.7M | 263732 | 258K |
libgstfallbackswitch.so | 2893712 | 2.8M | 321076 | 314K |
libgstgif.so | 2820696 | 2.7M | 255556 | 250K |
libgstlewton.so | 2912520 | 2.8M | 316980 | 310K |
libgstrav1e.so | 4376676 | 4.2M | 1287752 | 1.3M |
libgstreqwest.so | 6213712 | 6.0M | 2336548 | 2.3M |
libgstrsaudiofx.so | 902320 | 882K | 165424 | 162K |
libgstrsclosedcaption.so | 3347412 | 3.2M | 563580 | 551K |
libgstrsfile.so | 1348440 | 1.3M | 501248 | 490K |
libgstrsflv.so | 1094828 | 1.1M | 243248 | 238K |
libgstrusoto.so | 6818928 | 6.6M | 2754168 | 2.7M |
libgstsodium.so | 3026284 | 2.9M | 435892 | 426K |
libgstthreadshare.so | 4419844 | 4.3M | 1132128 | 1.1M |
libgsttogglerecord.so | 2954476 | 2.9M | 337452 | 330K |
We notice here that ARM binaries are slightly lighter than their x86_64 equivalents and the gain from stripping is very similar on both architectures.
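To put a number on that difference, the stripped sizes of a given plugin can be compared across the two tables. The Rust sketch below does this for a handful of rows copied from the tables above; the arm_share helper and the selection of plugins are ours, picked only for illustration.

```rust
/// Stripped sizes in bytes, copied from the two tables above:
/// (plugin, x86_64 size, armv7 size).
const STRIPPED: &[(&str, u64, u64)] = &[
    ("libgstcdg.so", 334_208, 251_460),
    ("libgstrav1e.so", 1_571_208, 1_287_752),
    ("libgstreqwest.so", 3_230_480, 2_336_548),
    ("libgstrsaudiofx.so", 223_408, 165_424),
    ("libgstthreadshare.so", 1_448_376, 1_132_128),
];

/// ARM size expressed as a percentage of the x86_64 size.
fn arm_share(x86_64: u64, armv7: u64) -> f64 {
    armv7 as f64 / x86_64 as f64 * 100.0
}

fn main() {
    for (plugin, x86_64, armv7) in STRIPPED {
        println!(
            "{:<22} ARM is {:.1}% of the x86_64 size",
            plugin,
            arm_share(*x86_64, *armv7)
        );
    }
}
```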
We have to keep in mind that each size reduction technique comes at a cost: binaries that are less debug friendly, longer build times, etc. Depending on our actual needs and constraints, we need to consider the tradeoff between ease of debugging and binary size.
It's also important to note that we considered only a single Rust plugin in our setup. The total size would grow rapidly if we had to ship multiple Rust plugins, as each one would statically ship the GStreamer and GLib Rust glue code. In a future blog post we'll discuss and analyze the options to reduce the total size in such multi-plugin scenarios, such as linking all Rust elements into a single larger Rust plugin so they can share common code.
Based on this research, we think that Rust is ready to be deployed in embedded systems with limited memory resources. Rust brings numerous benefits to embedded systems: it is as fast as C/C++ while offering zero-cost abstractions and advanced memory safety, which enable rapid development, easier multi-threaded programming and fearless concurrency. As the GStreamer community is embracing the Rust language for its memory safety while handling untrusted multimedia data, Collabora is happy to help you bring Rust to your embedded projects.
Comments (2)
dbdr:
Apr 29, 2020 at 05:58 AM
Reqwest depends on tokio and the whole async stack. If the plugin only needs a single (or a few) HTTP requests at the same time, using instead a lightweight HTTP client library like attohttpc would probably provide huge savings.
Guillaume Desmottes:
Apr 29, 2020 at 02:11 PM
Indeed, it would be nice to have another plugin using such a lighter HTTP crate so users can pick the one that best fits their use case.
Feel free to try writing one if you're interested in contributing to gst-plugins-rs. :)