From Open Source Patch to Distribution in Four Steps

By Christian Holsing

Photo by Dinh Pham on Unsplash

Open source software is everywhere, from the smartphones we scroll to the websites we browse to the cars we drive. By definition, anyone can use, study, change, and distribute it for any purpose. It's often a collaborative and transparent process, with developers from different companies working together to add new features or fix bugs. Yet few people understand how software gets accepted into these projects. Our aim here is to provide an easy-to-understand overview of how it works, without delving into every exception. We hope to offer a basic understanding of the different stages of open source software development and the value of installing software from a distribution instead of retrieving it directly from GitHub*, although both approaches are acceptable. For a concrete example, we'll look at a toolkit developed by Intel as open source software that supports a particular feature provided as part of a CPU.

Step One: Open Source Development

A lot of software is developed as open source first. That allows the community to see code early on, review it throughout the development process and, most importantly, contribute. Previously, most software development was a closed process, leaving users with no choice but to submit support requests and persuade product managers to include their requests in the final product. The same applied to bugs: the customer had to file the bug with the vendor and wait for it to be fixed eventually. Open source software is very different. Anyone with the right skills can make changes to the code to enhance or fix it. After completing their changes, they initiate a pull request. This request is assessed by the repository maintainer, and if approved, the revised code is integrated into the upcoming version. That's the main principle behind tools like GitHub that are commonly used to manage contributions from multiple developers to open source projects.
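As a rough illustration, a contributor's workflow often looks something like the sketch below. It assumes a hypothetical project called "toolkit" hosted on GitHub; the repository, branch, and commit names are placeholders.

    # Clone a personal fork of the project and create a feature branch
    git clone https://github.com/your-user/toolkit.git
    cd toolkit
    git checkout -b fix-parser-overflow

    # Commit the change and push the branch back to the fork
    git commit -am "Fix integer overflow in the parser"
    git push origin fix-parser-overflow

    # Open a pull request against the upstream repository,
    # either through the GitHub web interface or the optional GitHub CLI
    gh pr create --title "Fix integer overflow in the parser"

The maintainer then reviews the pull request, possibly asks for changes, and merges it once it meets the project's standards.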

Let's return to the original question: How are patches made? The development team merges code changes regularly until they reach a point where they release a new version of the toolkit. This new version is made available for download in either source code or pre-compiled format for popular Linux* flavors. Depending on their technical ability, users can either compile the source code or use the installer. The installer is usually available as a deb file for Debian*-based systems like Ubuntu* or a Red Hat* Package Manager (RPM) file for Red Hat/CentOS* or SUSE*-based systems.
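To make the two consumption paths concrete, here is a hedged sketch. The package file names are invented for illustration, and the build commands assume a conventional source layout with a configure script.

    # Path 1: build and install from source
    ./configure && make && sudo make install

    # Path 2: install the pre-built package with the native package manager
    sudo apt install ./toolkit_1.0_amd64.deb       # Debian/Ubuntu (deb)
    sudo dnf install ./toolkit-1.0.x86_64.rpm      # Red Hat/CentOS/Fedora (RPM)
    sudo zypper install ./toolkit-1.0.x86_64.rpm   # SUSE (RPM)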

Packages are complete versions of the software that can be used without further development. If you're not able to contribute and develop on your own, however, you'll typically have access only to community support. If a commercial company has a business model that offers commercial support, it may provide maintenance for the solution.

Step Two: Upstreaming

Upstreaming is the next step in open source software development. The term describes contributing changes back to the original ("upstream") project so they become part of its mainline code base, which can be challenging due to the complexity of managing different versions and speeds of development. This becomes particularly important when it comes to regression testing. The concept of upstreaming is most commonly used in the context of the Linux operating system, particularly the Linux kernel, the compiler, and key user-land libraries such as cryptography. These components must be tightly integrated, and extensive regression testing is necessary to prevent new features or fixes from breaking something else.

To achieve this integration, key OS-related projects are organized into upstreaming projects, which consist of different subprojects. The structure and management of these projects can be explored on kernel.org, which lists the available git trees and their owners. The software development process is typically collaborative, with developers contributing code to a Git branch while owners manage pull requests with the broader team to merge contributions into a new version. This process involves building stable subsystems before creating a new stable version.
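For the Linux kernel specifically, much of this collaboration happens through emailed patches rather than web-based pull requests. A simplified sketch of a contributor's steps might look like the following; the branch name, commit message, and maintainer address are placeholders.

    # Clone a tree listed on kernel.org (here, the mainline tree)
    git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
    cd linux

    # Create a branch and commit the change with a Signed-off-by line
    git checkout -b my-fix
    git commit -as -m "subsystem: fix null pointer dereference on probe"

    # Export the commit as a patch and email it to the relevant maintainers and list
    git format-patch -1
    git send-email --to=maintainer@example.org 0001-*.patch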

Two key people are responsible for maintaining the core of the Linux system: Linus Torvalds and Greg Kroah-Hartman. They integrate various components and make crucial decisions on general availability and long-term support. The process is transparent, with open communication through public mailing lists. Anyone can contribute to Linux by developing new features, fixing bugs, testing, or writing documentation. Quality assurance work includes reviewing code submitted by others, which can be uncomfortable but promotes collaboration and knowledge-sharing among people from diverse backgrounds. Peer reviews lead to better and more secure code from the start, benefiting the entire project.

Developers also collaborate outside company boundaries to improve the end-result for all, often supported by companies that believe in the power of open source.

Step Three: Commercial Linux Distribution

After exploring how Linux core components are managed and created, we now have a complete system available for free download and use. However, there's another player in the game: Linux distributions. These distributions aim to make the upstream code accessible to a wider audience.  

For many users, the convenience of a Linux distribution that offers an installation mechanism alongside pre-compiled packages surpasses the alternative method of cloning code from GitHub, making modifications, and then compiling it for their own computers.

Linux distributions offer more than just pre-compiled packages. They also provide additional software, such as window managers and productivity tools like office suites and email programs. The key value of a distribution is that the company or project compiles all the software components they consider important and runs integration tests to ensure a great user experience. This allows users to easily pull software from repositories and use the code without worrying about dependencies or compatibility issues.
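For example, installing a packaged application from a distribution's repositories is typically a single command, and the package manager resolves and installs dependencies automatically (GIMP is used here purely as an arbitrary example of a packaged application):

    sudo apt update && sudo apt install gimp    # Debian/Ubuntu
    sudo dnf install gimp                       # Red Hat/CentOS/Fedora
    sudo zypper install gimp                    # SUSE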

Because open source projects allow users to choose whatever libraries or tools they need, distributions have a hard time keeping up with upstream projects. For example, while the Linux kernel may already be available in version 6.2, a distribution may still ship version 5.x.
This common practice presents a challenge, since the latest features and hardware enablement may not be available in older versions. To address this issue, distributors offer feature backporting as part of their value proposition. Their goal is to give their customers access to the most current technology and standards by backporting selected changes into the kernel version shipped with their distribution, even if it's not the latest version being developed upstream.

The backporting process is based on the version control mechanisms behind platforms like GitHub. These allow developers to review pull requests and easily identify the changes made to enable specific features. The relevant changes can then be merged into the distribution's codebase; the closer the versions of the base code are to each other, the easier the process.
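In practice, a backport is often little more than a cherry-pick of the relevant upstream commits onto the distribution's older branch. The following sketch assumes a kernel tree with an upstream v6.2 tag and a hypothetical distribution branch; the commit ID and path are placeholders.

    # Find the upstream commits that introduced the feature
    git log --oneline v6.2 -- drivers/gpu/drm/i915/

    # Apply a chosen commit onto the distribution's older kernel branch
    git checkout distro/linux-5.15
    git cherry-pick <upstream-commit-id>

    # Resolve any conflicts caused by the diverged code base, then rebuild and retest

The further the two code bases have drifted apart, the more conflicts the cherry-pick raises, which is exactly why closer versions make backporting easier.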

However, many distributions don’t use the full upstream code because they often need to tweak components, including the kernel, to match their specific needs. This can include supporting certain hardware or performing performance and security tuning. These tweaked versions may even have their own unique version numbers.
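You can usually see this at a glance on a running system, because the kernel version string carries a distribution-specific suffix (the exact values below are illustrative):

    uname -r
    # 5.15.0-91-generic        (a typical Ubuntu kernel)
    # 5.14.0-362.el9.x86_64    (a typical Red Hat Enterprise Linux 9 kernel)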

As a customer, it's crucial to find a distribution that meets your specific needs, which can vary greatly since different distributions are designed for different purposes. Some are geared towards generic desktop computing, while others focus on mathematical applications or container and server-based workloads. However, depending on the criticality of a company's workloads, it may not be feasible to rely on a foundation that doesn't offer service level agreements. That leaves two options: either assemble a team to maintain a distribution or purchase support from a company that manages the code.

The value of commercial Linux distributions is in ensuring that packages are thoroughly tested and integrated with a common code base. 
Distributions bear the key responsibility of addressing any security issues and ensuring seamless updates for users. Any changes undergo thorough regression testing before being shipped to clients.

It's worth noting that Linux distributors actively participate in contributing their code and fixes to the Linux project. In fact, some of the most innovative solutions were born out of a distributor's specific use case, where they invested heavily in the code and later shared it with the community.

Collaboration is a key component of the Linux project. Participants work together to resolve shared issues even while competing for business opportunities, but the guiding principle is always the project's collective benefit.

Another benefit of Linux distributions is that they provide long-term support (LTS) versions, which offer longer support than the open source project itself can provide. This is especially important for critical infrastructure or use cases that require a stable codebase over an extended period. For example, enterprise desktops may not need the latest features but do require a stable system, and embedded or IoT systems may not be able to take every possible patch. With LTS releases, customers can stay on a stable foundation and continue to develop and test their own software while the distribution provides fixes for the operating system. This narrows the scope of testing and allows the chosen stack to be tested more thoroughly instead of validating countless permutations.

The breadth and depth of these needs are why there are so many Linux distributions. Most users lack the knowledge or skills to maintain their own Linux system and handle everything from building the kernel to keeping up with security issues and the latest hardware drivers.

Step Four: In Practice

Finally, let's look at a specific example of how a feature moves from upstream to distribution. We'll use the example of a driver for Intel graphics. Intel is a major contributor to the Linux community in terms of silicon and hardware technology. When a new hardware feature is introduced, it typically requires software enablement through a driver to connect the software running on the machine, the operating system libraries, and the kernel. This ensures that the latest features can be fully utilized, in addition to simply enabling the hardware itself. 

For Intel's recent graphics cards, internal teams typically develop the driver software. For Windows-based systems, the software is finished and validated with Microsoft through test programs. Meanwhile, Intel engineers develop the required code for Linux on GitHub, which is subsequently submitted to the Linux kernel project. While code review and feedback may require a few iterations, the goal is for the feature to be included in the upstream kernel.

However, being part of the Linux project doesn't guarantee full support for certain features in popular distributions; for our example, we'll use Ubuntu. Because the Intel driver is open source, Canonical* can evaluate the demand for the driver in their customer base and determine a timeline for providing it.

If customers do want to use DG2 drivers (DG2 being the code name for the Intel discrete graphics hardware in this example) on their Ubuntu deployments, Canonical has two options. The first option is to wait until the driver is fully accepted upstream, meaning it's fully integrated into the upstream kernel. This has the advantage of a comprehensive code review and thorough testing. After upstreaming, Canonical can take the code, isolate the changes, and integrate them into the existing Ubuntu kernel. Typically, this is done initially for only one version to gain customer feedback on viability and user experience.

The more complex option is to use patches directly from GitHub, also known as out-of-tree patches. This carries the risk that the community review process may still require significant changes to the code, such as API changes or fundamental project changes. Distributors commit to maintaining code for a certain period, so they need either the in-house skills or support from Intel to move out-of-tree code into the distribution's code stream. There is also the risk that unreviewed code may affect other components for which the distributor is responsible.
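In practical terms, carrying an out-of-tree patch usually means exporting the change from GitHub and applying it to the distribution's own kernel tree, then keeping it rebased until the upstream version arrives. A hedged sketch, with placeholder names throughout:

    # Inside the distribution's kernel source tree
    cd distro-kernel-source
    git am 0001-enable-new-graphics-feature.patch   # patch exported from GitHub

    # Rebuild the kernel package and run the distribution's regression suite
    # before shipping the update to customers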

Ultimately, distributors aim to balance supporting new hardware and providing the best customer experience while maintaining a stable distribution. Ideally, out-of-tree code is close to upstream acceptance and adds significant value to the distribution. Collaborating with Intel can help mitigate risks, as with the Ubuntu IoT images optimized for Intel. The discussion in the open source community revolves around finding the right balance between early enablement of a feature and full upstream acceptance. Nonetheless, customers always have a choice, from using a pre-existing product to maintaining their own codebase. That’s the beauty of open source software!

About the Author

Christian Holsing is a Technical Account Manager on the OSV Team at Intel.