We're hiring!
*

Oxidizing bmap-tools: rewriting a Python project in Rust

Rafael Garcia Ruiz avatar

Rafael Garcia Ruiz
March 03, 2023

Share this post:

Reading time:

Back in September, I joined Collabora as an intern to work on Rust-related projects for six months. It's been a great experience and I would recommend it to anyone who is passionate about FOSS and wants to work in an inclusive environment with very skilled and supporting people!

Over the internship, my goal was to continue to "oxidise" bmaptool, a tool for creating the block map (bmap) for a file and copying files using the block map. Oxidising means to rewrite some code, in this case a project written in Python, into Rust code. Usually a project is oxidised into Rust because of many reasons, the main usually being memory safety. Mozilla has an interesting article on Oxidation if you would like to learn more about general reasons of why Rust is so great.

Of course, rewriting a project in Rust is pointless without good reasons: if a solution to a problem already exists, there should be no need to rewrite it :-). With this project, the main goal was to remove Python dependencies and instead create a statically linked binary which should save disk space and in future allow the bmap sparse file format to be used in other Rust projects. Another reason for the project was for me to gain experience in some more advanced Rust topics and have some fun during the process!

We decided to call the new project bmap-rs and host the source code on GitHub under a permissive open-source licence to allow the wider community to benefit from the project and make contributions easier.

More than just copying

bmaptool is a generic tool for flashing sparse images to a block device or file using a custom file format called bmap. The idea is that large files, like raw system image files, can be copied or flashed a lot faster and more reliably with bmaptool than with traditional tools, like dd or cp because the bmap file format allows you to only flash the used parts of the system image and also verifies each written block. The tool's main use is to flash system images into block devices, but it can also be used for general image flashing purposes. The feature we were mainly interested in was the copy subcommand:

bmaptool copy <input> <output>

The input parameter can be a local file or a remote URL, the output parameter can be a local file or a block device. We wanted bmap-rs to be able to execute that particular job in the short term, even though having other features of bmaptool would also be useful later. Then the goal was to be able to execute the following command, with the same functionality as the original project:

bmap-rs copy <input> <output>

As the project had already been started by a colleague, the first step for me would be to import the existing project into GitHub and prepare it as an open source project: setup a CI pipeline to make sure things build, a licence file, correct README and everything else needed for it to be open to contributions. For that moment on I handled bug reports, feature requests, pull-requests and updating dependencies as one of the team. From this I learnt about the never-ending responsibilities of maintaining open-source software!

The development roadmap of the internship was as follows:

  • Parse the XML .bmap format from a local file

  • Stream image contents from an HTTP/HTTPS source

  • Load the bmap file from the same HTTP/HTTPS source as the image

  • Stream a gzip compressed image file over HTTPS and verify the written checksums against the parsed bmap

  • Write the image contents to a normal file

  • Write the image contents to a block device (e.g. SD card or USB flash drive)

Bmap file

But what exactly is a bmap file and why is it so useful for this purpose? Well, it is an XML file which contains a list of mapped areas plus some additional information about the file it was created from. For example:

  • SHA256 checksum of the bmap file itself

  • SHA256 checksum of the mapped areas

  • the original file size

  • amount of mapped data

Having each mapped area's checksum, once each part is copied to the destination we can check that the information has been copied correctly and not corrupt. Having the data mapped allows to avoid reading or copping "holes", meaning a bunch of zeroes, which allows us to only copy the parts of the image which are used. Here's an example of a bmap file.

Remote input

Even if there was already an initial project containing the copying algorithm for local copy, it wasn't able to write into block devices or copy remote files. Allow copying into block devices turned out to be a simple fix. But on the other hand, allowing remote input was a bigger issue. To allow an HTTP request from the code, it had to be able to wait for the response so we needed to create an asynchronous context for that feature. At the same time, it also needed to fetch the bmap file remotely and accept a URL as input argument on the command line.

Other enhancements have been made along the way, like the implementation of a progress bar and the ability to copy an image without using a bmap file, which can be useful in cases where you have an image without holes. These features are not required but would most likely improve the experience of using it and allow bmaptools to be fully replaces. Finally, we published the crates on Crates.io, Rust's package registry. The crates bmap-rs and bmap-parser have been published and are now ready for anyone to use them and try them out!

What's next?

The intended context for this to work was to integrate it into the tests which run on real hardware in Collabora's LAVA lab. Some tests boot a minimal Linux system from a network filesystem, then use bmaptool to flash an image to the target block device, for instance the SD card or eMMC. The device can then reboot into the flashed filesystem and run tests on the image.

Using bmaptool in this way increases the size of the NFS image since it includes the Python runtime and other libraries. In comparison, using Rust allows us to generate a small statically-linked binary to do the same and even offers the ability to make further improvements in future, for instance booting to a EFI binary to complete the flashing rather than booting a complete Linux system.

Once it's integrated with LAVA it will result in an efficiency enhancement across all projects that use bmap files, resulting in a benefit for other teams in Collabora. Knowing that is the most rewarding feeling about this achievement.

There are still some features of bmaptools that could be interesting for bmap-rs to have like the create command for generating the bmap of a file and implementing some more decompression algorithms.

Wrapping Up

During the internship, I've had to constantly learn new skills and challenge myself. For the first time I've acted as the maintainer of a project, keeping it up to date and managing it using an open-source open-first philosophy. I've learned to use Rust from scratch and ended using some advanced features of the language including async features. Participating in the development of bmap-rs and acting as it's maintainer during this time has allowed me to improve on my Rust skills and overall open source contributing abilities and confidence.

This experience has also helped me to gain knowledge about the profession itself. I feel more oriented towards what kind of engineer I want to become, which areas do I intend to investigate more and which abilities do I want to obtain in the future. I see clearer than ever that I want my work to be oriented towards Open Source, so it can be reused and shared, helping many others. Likewise, I'm looking forward to finishing my degree and rejoin the team more equipped to make a better impact.

I'm really grateful to my mentor Christopher Obbard and also Gustavo Noronha. Their implication and support during this experience has helped me a lot. I appreciate a lot how Sjoerd Simons and Ryan Gonzalez has review my code with their Rust language experience and knowledge. I'm sure all I've learned during this internship is going to help me make a better impact with my future contributions to open source and seeing how my work can be useful to other people really gave me a sense of fulfilment.

Comments (2)

  1. Mikko:
    Mar 04, 2023 at 02:14 PM

    "Here can be a local or remote file and can be a file or a block device" seems to be missing < input > and output tags.

    Reply to this comment

    Reply to this comment

    1. Christopher Obbard:
      Mar 06, 2023 at 01:51 PM

      Thank you, we have updated the blog post.

      Reply to this comment

      Reply to this comment


Add a Comment






Allowed tags: <b><i><br>Add a new comment:


Search the newsroom

Latest Blog Posts

test post

03/12/2024

this is a test post

Mesa CI and the power of pre-merge testing

08/10/2024

Having multiple developers work on pre-merge testing distributes the process and ensures that every contribution is rigorously tested before…

A shifty tale about unit testing with Maxwell, NVK's backend compiler

15/08/2024

After rigorous debugging, a new unit testing framework was added to the backend compiler for NVK. This is a walkthrough of the steps taken…

A journey towards reliable testing in the Linux Kernel

01/08/2024

We're reflecting on the steps taken as we continually seek to improve Linux kernel integration. This will include more detail about the…

Building a Board Farm for Embedded World

27/06/2024

With each board running a mainline-first Linux software stack and tested in a CI loop with the LAVA test framework, the Farm showcased Collabora's…

Smart audio filters with WirePlumber 0.5

26/06/2024

WirePlumber 0.5 arrived recently with many new and essential features including the Smart Filter Policy, enabling audio filters to automatically…

Open Since 2005 logo

Our website only uses a strictly necessary session cookie provided by our CMS system. To find out more please follow this link.

Collabora Limited © 2005-2024. All rights reserved. Privacy Notice. Sitemap.