Tools for automatic creation of Software Bill of Materials (SBOM)

Tim Drobny

At a time when software is developed at a rapid pace, there is little time to write every component yourself. That is why libraries and similar tools exist: to make our lives easier and to speed up development. But how can we keep an overview of all the components we use? After all, the libraries and packages we use come with dependencies of their own that we are mostly unaware of. Checking our requirements.txt file is not enough, as it does not track transitive dependencies and sometimes not even version numbers. Python is just one example here; the problem exists in other languages as well.

This is why SBOMs were created: to help with this problem and to increase awareness of packages, their licensing and their security.

What is an SBOM?

SBOMs have been around since about 2018, so they are quite new in terms of information technology. An SBOM can be considered a building block for supply chain risk management. The supply chain covers everything: every tool, library or package that touches the software or plays any role during its lifecycle.[1]

An SBOM serves as a list of “ingredients” that make up software components. It includes critical information about the libraries, tools and processes used in developing and building the software.[2]

Why should you use an SBOM?

Nowadays software is developed at a rapid pace, and developers include code from open source repositories and proprietary packages to speed up development. After all, you would not want to reinvent the wheel each time you need a piece of software that has already been written. Using these components is quite helpful, as it saves time and therefore money for companies. However, all of these components can contain weaknesses and security risks that the developers might not know about. A risk report released by Synopsys in 2024 consolidated findings from over 1000 codebases across 17 industries, analyzed in 2023. It found that 96% of the codebases contained open source code and 84% contained vulnerabilities. Looking back to 2021, the Log4Shell vulnerability in the Log4j logging framework affected numerous companies around the world. Only certain versions were vulnerable, but it took companies some time to figure out which version they were running. An SBOM could have helped here, as it keeps track of version numbers as well.[2]

Security issues always come with a cost attached, whether they are fixed proactively or only after a breach has occurred. A breach also damages the company's image, which needs to be avoided. This is why all dependencies, images and infrastructure in use need continuous checking for vulnerabilities and weaknesses. An SBOM provides a good overview of exactly these things. It gives you insight into all components, so you can look for vulnerabilities and for licenses that do not comply with internal and external policies.[2]

Tools for creating an SBOM

There are two popular data exchange standards for SBOMs. The first is the CycloneDX (CDX) format by OWASP, which emphasizes security vulnerabilities and can be serialized as JSON or XML. The second is SPDX by the Linux Foundation, which focuses more on software licenses.[3]

CDX                                       | SPDX
Open source                               | Open source
OWASP                                     | Linux Foundation
Lightweight, focus on vulnerabilities     | Ensures compliance and transparency
Focus on ease of adoption and automation  |
[4]

Comparing Tools

To compare the tools, I created a small Python application that starts a web server and accesses the GitHub API:

import requests
from flask import Flask, jsonify

app = Flask(__name__)

@app.route('/')
def home():
    return "Hello, SBOM"

@app.route('/api')
def api():
    # Forward the JSON document served at the GitHub API root
    response = requests.get('https://api.github.com', timeout=10)
    return jsonify(response.json())

if __name__ == '__main__':
    app.run(debug=True)

When this code is executed, it starts a local web server on port 5000. Browsing to the root URL shows a page saying “Hello, SBOM”, and the /api route returns the response of the GitHub API.
The requirements.txt contains the following:

Flask
requests
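
The gap between these two lines and what actually gets installed can be made visible with the standard library. The following is a minimal sketch, assuming the packages are installed in the active environment. The helper names requirement_name and declared_requirements are made up for this illustration.

```python
import re
from importlib import metadata


def requirement_name(requirement: str) -> str:
    """Strip version specifiers, extras and markers from a requirement string."""
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", requirement.strip())
    return match.group(0) if match else ""


def declared_requirements(distribution: str) -> list[str]:
    """Return the package names a distribution declares as requirements.

    Returns an empty list if the distribution is not installed. The result
    also contains optional requirements guarded by extras or markers.
    """
    try:
        requires = metadata.requires(distribution) or []
    except metadata.PackageNotFoundError:
        return []
    return sorted({requirement_name(r) for r in requires})


if __name__ == "__main__":
    # The two direct dependencies from requirements.txt
    for package in ("Flask", "requests"):
        print(package, "->", declared_requirements(package))
```

Each name in the result is a package that requirements.txt never mentions, and each of those may bring further dependencies of its own.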

CycloneDX

CycloneDX supports a multitude of languages (even Haskell) and can scan your code either locally or as part of a CI/CD pipeline. It even has an API server for on-demand checks of code. To use CycloneDX with Python, there is a pip package called cyclonedx-bom. (This is similar for other languages; Maven, for example, has a CycloneDX plugin that creates SBOMs during a build.) Its command can generate several different kinds of SBOMs for your Python project. See the documentation for more information.[5]

First, I created an SBOM using CycloneDX. To do so, I installed the pip package:

pip install cyclonedx-bom

Then I created the SBOM using the following command:

cyclonedx-py environment > ./CycloneDX/environment.json

This gives us the SBOM for our entire environment in CDX format and saves it in the environment.json file.

An excerpt of the output looks like this:

"components": [
	{
		"bom-ref": "Flask==3.0.3",
		"description": "A simple framework for building complex web applications.",
		"externalReferences": [
			{
				"comment": "from packaging metadata Project-URL: Chat",
				"type": "chat",
				"url": "https://discord.gg/pallets"
			},
			{
				"comment": "from packaging metadata Project-URL: Documentation",
				"type": "documentation",
				"url": "https://flask.palletsprojects.com/"
			},
			{
				"comment": "from packaging metadata Project-URL: Source",
				"type": "other",
				"url": "https://github.com/pallets/flask/"
			},
			{
				"comment": "from packaging metadata Project-URL: Donate",
				"type": "other",
				"url": "https://palletsprojects.com/donate"
			},
			{
				"comment": "from packaging metadata Project-URL: Changes",
				"type": "release-notes",
				"url": "https://flask.palletsprojects.com/changes/"
			}
		],
		"licenses": [
			{
				"license": {
					"name": "License :: OSI Approved :: BSD License"
				}
			}
		],
		"name": "Flask",
		"purl": "pkg:pypi/flask@3.0.3",
		"type": "library",
		"version": "3.0.3"
	},
<SNIP>

As we can see, the entry for a single package is quite detailed. The first block shows general information. In the second and third, we get a number of external references, such as GitHub or Discord links, as well as licensing information. The final block contains further metadata such as the name, package URL and version. The entry is this detailed because Flask is a very popular package for writing web backends. Less popular packages often do not provide this much information.

If we scroll all the way down in the file, we can even see the dependencies, i.e. which package depends on which:
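
For a quicker overview, the component list can be condensed into one line per package. The following is a minimal sketch that relies only on the keys visible in the excerpt above, plus the "id" and "expression" license fields as an assumption about other CycloneDX files. The function names are made up for this illustration.

```python
import json


def component_license(component: dict) -> str:
    """Return the first license expression, id or name of a component."""
    for entry in component.get("licenses", []):
        if "expression" in entry:
            return entry["expression"]
        license_info = entry.get("license", {})
        value = license_info.get("id") or license_info.get("name")
        if value:
            return value
    return "unknown"


def summarize_components(bom: dict) -> list[tuple[str, str, str]]:
    """Return (name, version, license) for every component in the SBOM."""
    return [
        (c.get("name", "?"), c.get("version", "?"), component_license(c))
        for c in bom.get("components", [])
    ]


# Trimmed sample based on the excerpt above.
sample = json.loads("""
{"components": [{"name": "Flask", "version": "3.0.3", "type": "library",
  "purl": "pkg:pypi/flask@3.0.3",
  "licenses": [{"license": {"name": "License :: OSI Approved :: BSD License"}}]}]}
""")

for name, version, license_name in summarize_components(sample):
    print(f"{name:<12} {version:<8} {license_name}")
```

To run this against the real file, load environment.json with json.load instead of using the inline sample.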

<SNIP>
"dependencies": [
	{
		"dependsOn": ["Jinja2==3.1.4", "Werkzeug==3.0.3", "blinker==1.8.2", "click==8.1.7", "itsdangerous==2.2.0"],
		"ref": "Flask==3.0.3"
	},
	{
		"dependsOn": ["MarkupSafe==2.1.5"],
		"ref": "Jinja2==3.1.4"
	},
	{
		"ref": "MarkupSafe==2.1.5"
	},
	{
		"dependsOn": ["MarkupSafe==2.1.5"],
		"ref": "Werkzeug==3.0.3"
	},
<SNIP>

We can see that Flask itself depends on five other packages, and the entries include the version numbers installed in our environment. Some packages, like MarkupSafe, have no dependencies.
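
Because each entry only lists direct dependencies, finding everything a package pulls in means walking the graph. The following is a minimal sketch of such a walk, using the entries from the excerpt above as sample data. The function name transitive_dependencies is made up for this illustration.

```python
def transitive_dependencies(bom: dict, ref: str) -> set[str]:
    """Collect every direct and indirect dependency of `ref`."""
    graph = {
        entry["ref"]: entry.get("dependsOn", [])
        for entry in bom.get("dependencies", [])
    }
    seen: set[str] = set()
    stack = list(graph.get(ref, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited, also protects against cycles
        seen.add(current)
        stack.extend(graph.get(current, []))
    return seen


# Sample data copied from the excerpt above.
sample = {
    "dependencies": [
        {"ref": "Flask==3.0.3",
         "dependsOn": ["Jinja2==3.1.4", "Werkzeug==3.0.3", "blinker==1.8.2",
                       "click==8.1.7", "itsdangerous==2.2.0"]},
        {"ref": "Jinja2==3.1.4", "dependsOn": ["MarkupSafe==2.1.5"]},
        {"ref": "MarkupSafe==2.1.5"},
        {"ref": "Werkzeug==3.0.3", "dependsOn": ["MarkupSafe==2.1.5"]},
    ]
}

for dependency in sorted(transitive_dependencies(sample, "Flask==3.0.3")):
    print(dependency)
```

With this sample, MarkupSafe shows up for Flask even though Flask does not list it directly; it arrives through Jinja2 and Werkzeug.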

Syft

Syft can create SBOMs in multiple formats, including CDX and SPDX. It is a CLI tool as well and can scan containers as well as file systems. You can also create an SBOM in Syft's own format.[6]

I installed Syft locally inside a WSL machine using the provided command.[6]

Using Syft, I created several SBOM files: one in CycloneDX format, one in SPDX format, one in Syft's own format, and one in human-readable text form.

They all use the same syntax:

syft . -o [format] > [output file]
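
Generating all four files can be scripted. The following is a minimal sketch that only builds and prints the command lines. The format names are assumptions based on Syft 1.x (check syft --help for your version), and the output file names are hypothetical.

```python
import shlex

# Format names are assumptions based on Syft 1.x; file names are hypothetical.
FORMATS = {
    "cyclonedx-json": "sbom.cdx.json",
    "spdx-json": "sbom.spdx.json",
    "syft-json": "sbom.syft.json",
    "syft-text": "sbom.txt",
}


def build_syft_command(target: str, output_format: str) -> list[str]:
    """Build the argument list for one `syft <target> -o <format>` call."""
    return ["syft", target, "-o", output_format]


for output_format, filename in FORMATS.items():
    command = build_syft_command(".", output_format)
    # Printed as a shell line; the redirect mirrors the syntax shown above.
    print(f"{shlex.join(command)} > {filename}")
```

To actually run a command, the argument list can be passed to subprocess.run with stdout pointing at an open file.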

Syft then creates SBOMs like this:

CycloneDX:

{"$schema":"http://cyclonedx.org/schema/bom-1.6.schema.json","bomFormat":"CycloneDX","specVersion":"1.6","serialNumber":"urn:uuid:870e8bd0-e86b-48ed-a703-a94c1d4e0f8d","version":1,"metadata":{"timestamp":"2024-07-21T19:28:56+02:00","tools":{"components":[{"type":"application","author":"anchore","name":"syft","version":"1.9.0"}]},"component":{"bom-ref":"af63bd4c8601b7f1","type":"file","name":"."}},"components":[{"bom-ref":"pkg:pypi/flask@3.0.3?package-id=9a5e3b2e12b775ed","type":"library","name":"Flask","version":"3.0.3","cpe":"cpe:2.3:a:palletsprojects:flask:3.0.3:*:*:*:*:*:*:*","purl":"pkg:pypi/Flask@3.0.3","properties":[{"name":"syft:package:foundBy","value":"python-installed-package-cataloger"},{"name":"syft:package:language","value":"python"},{"name":"syft:package:type","value":"python"},{"name":"syft:package:metadataType","value":"python-package"},{"name":"syft:location:0:path","value":"/testApp/Lib/site-packages/flask-3.0.3.dist-info/METADATA"}
<SNIP>

This is definitely not human-readable unless you run the output through a tool that reformats it. The other outputs look very similar:

SPDX:

{"spdxVersion":"SPDX-2.3","dataLicense":"CC0-1.0","SPDXID":"SPDXRef-DOCUMENT","name":".","documentNamespace":"https://anchore.com/syft/dir/b9dc3c0c-0e97-4ba7-ba77-a0849de48ec8","creationInfo":{"licenseListVersion":"3.24","creators":["Organization: Anchore, Inc","Tool: syft-1.9.0"],"created":"2024-07-21T17:29:29Z"},"packages":[{"name":"Flask","SPDXID":"SPDXRef-Package-python-Flask-9a5e3b2e12b775ed","versionInfo":"3.0.3","supplier":"NOASSERTION","downloadLocation":"NOASSERTION","filesAnalyzed":false,"sourceInfo":"acquired package info from installed python package manifest file: /testApp/Lib/site-packages/flask-3.0.3.dist-info/METADATA, /testApp/Lib/site-packages/flask-3.0.3.dist-info/RECORD","licenseConcluded":"NOASSERTION","licenseDeclared":"NOASSERTION","copyrightText":"NOASSERTION",<SNIP>

Syft format:

{"artifacts":[{"id":"9a5e3b2e12b775ed","name":"Flask","version":"3.0.3","type":"python","foundBy":"python-installed-package-cataloger","locations":[{"path":"/testApp/Lib/site-packages/flask-3.0.3.dist-info/METADATA","accessPath":"/testApp/Lib/site-packages/flask-3.0.3.dist-info/METADATA","annotations":{"evidence":"primary"}},{"path":"/testApp/Lib/site-packages/flask-3.0.3.dist-info/RECORD","accessPath":"/testApp/Lib/site-packages/flask-3.0.3.dist-info/RECORD","annotations":{"evidence":"supporting"}}],"licenses":[]<SNIP>

As you can see, all of these outputs are more machine-readable than human-readable.

Syft also provides a human-readable format, which looks like this:
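
Reformatting such single-line output takes only a few lines of Python. The following is a minimal sketch using the standard json module. The function name pretty_sbom is made up for this illustration, and the sample is a shortened fragment of the Syft output above.

```python
import json


def pretty_sbom(minified: str, indent: int = 2) -> str:
    """Re-serialize a single-line JSON document with indentation."""
    return json.dumps(json.loads(minified), indent=indent)


# Shortened fragment of the Syft output above.
fragment = (
    '{"artifacts":[{"id":"9a5e3b2e12b775ed","name":"Flask",'
    '"version":"3.0.3","type":"python"}]}'
)
print(pretty_sbom(fragment))
```

For a whole file, python -m json.tool does the same from the command line.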

[Path: .]
[Flask]
 Version:	 3.0.3
 Type:		 python
 Found by:	 python-installed-package-cataloger

[Jinja2]
 Version:	 3.1.4
 Type:		 python
 Found by:	 python-installed-package-cataloger
<SNIP>

However, a lot of information is missing here, such as licenses or dependencies.
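
A readable listing can also be produced from any of the JSON outputs shown earlier, which keeps the door open for adding more fields later. The following is a minimal sketch that uses only the keys visible in those excerpts to tell the three formats apart. The function name list_packages is made up for this illustration.

```python
def list_packages(sbom: dict) -> list[tuple[str, str]]:
    """Return sorted (name, version) pairs from a CycloneDX, SPDX or Syft JSON SBOM."""
    if sbom.get("bomFormat") == "CycloneDX":
        items, version_key = sbom.get("components", []), "version"
    elif "spdxVersion" in sbom:
        items, version_key = sbom.get("packages", []), "versionInfo"
    elif "artifacts" in sbom:
        items, version_key = sbom["artifacts"], "version"
    else:
        raise ValueError("unrecognized SBOM format")
    return sorted((i.get("name", "?"), i.get(version_key, "?")) for i in items)


# Minimal samples reduced from the three excerpts above.
cdx = {"bomFormat": "CycloneDX", "components": [{"name": "Flask", "version": "3.0.3"}]}
spdx = {"spdxVersion": "SPDX-2.3", "packages": [{"name": "Flask", "versionInfo": "3.0.3"}]}
syft = {"artifacts": [{"name": "Flask", "version": "3.0.3"}]}

for document in (cdx, spdx, syft):
    for name, version in list_packages(document):
        print(f"{name:<12} {version}")
```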

Jake

Jake is a tool specific to Python. It can generate either an SBOM or a report on known vulnerabilities in the components used. It is also installed as a pip package and can scan your local directory.

jake sbom --output-format json -o Jake/sbom.json

With this command we can generate an SBOM and save it to sbom.json.

One immediate problem is, once again, readability. It seems that human readability is the exception rather than the rule.

{"$schema": "http://cyclonedx.org/schema/bom-1.4.schema.json", "bomFormat": "CycloneDX", "specVersion": "1.4", "serialNumber": "urn:uuid:84961009-c747-4ace-968f-167284c4e87f", "version": 1, "metadata": {"timestamp": "2024-07-23T19:51:44.666349+00:00", "tools": [{"vendor": "CycloneDX", "name": "cyclonedx-python-lib", "version": "3.1.5", "externalReferences": [{"url": "https://github.com/CycloneDX/cyclonedx-python-lib/actions", "type": "build-system"}, {"url": "https://pypi.org/project/cyclonedx-python-lib/", "type": "distribution"}, {"url": "https://cyclonedx.github.io/cyclonedx-python-lib/", "type": "documentation"}, {"url": "https://github.com/CycloneDX/cyclonedx-python-lib/issues", "type": "issue-tracker"}, {"url": "https://github.com/CycloneDX/cyclonedx-python-lib/blob/main/LICENSE", "type": "license"}, {"url": "https://github.com/CycloneDX/cyclonedx-python-lib/blob/main/CHANGELOG.md", "type": "release-notes"}, {"url": "https://github.com/CycloneDX/cyclonedx-python-lib", "type": "vcs"}, {"url": "https://cyclonedx.org", "type": "website"}]}, {"vendor": "Sonatype Nexus Community", "name": "jake", "version": "3.0.14", "externalReferences": [{"url": "https://app.circleci.com/pipelines/github/sonatype-nexus-community/jake", "type": "build-system"}, {"url": "https://pypi.org/project/jake/", "type": "distribution"},
<SNIP>

Once again, we get a lot of metadata at the beginning of the file, as well as licensing information. The output is very similar to that of Syft, although there are a few differences. Jake relies on the Sonatype OSS Index, which is a catalogue of open source components and scanning tools.[7][8]

Conclusion

Which SBOM tool you use is entirely up to you. Most are well documented and easy to integrate. If you need the output to be human-readable and not just machine-readable, you should have another tool at hand to reformat some of the outputs. All three tools find the same components and provide very similar output. Notably, Syft provides more metadata than CycloneDX, such as the timestamp, the format and the Syft version. CycloneDX focuses more on the essentials: it does not give as much metadata as Syft, but it presents the information on components and dependencies in a much clearer and more readable way. Jake has more in common with Syft than with CycloneDX in terms of metadata and readability, but its ability to also generate an automated risk report can be quite helpful.

So, should you use SBOMs in your projects? Yes, definitely. The amount of work required to create an SBOM or to include it in your build pipeline is far less than trying to figure out all dependencies and versions yourself.

It also increases security awareness. By looking through your SBOM, you can spot vulnerable versions much more easily.

You can also ensure compliance, as SBOMs provide a good overview of licensing information.
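
Spotting a vulnerable version can be automated once the SBOM exists. The following is a minimal sketch that matches components against a list of affected versions. The advisory data and the example-lib package are hypothetical and exist only for illustration. In practice the list would come from a vulnerability database.

```python
def flag_components(bom: dict, known_bad: dict[str, set[str]]) -> list[str]:
    """Return 'name==version' for every component whose version is in known_bad.

    known_bad maps lower-cased package names to sets of affected versions.
    """
    hits = []
    for component in bom.get("components", []):
        name = component.get("name", "")
        version = component.get("version", "")
        if version in known_bad.get(name.lower(), set()):
            hits.append(f"{name}=={version}")
    return hits


# Flask is taken from the CycloneDX excerpt; example-lib is hypothetical.
sample_bom = {
    "components": [
        {"name": "Flask", "version": "3.0.3"},
        {"name": "example-lib", "version": "1.2.0"},
    ]
}

# Hypothetical advisory data, not a real vulnerability.
hypothetical_advisories = {"example-lib": {"1.2.0", "1.2.1"}}

print(flag_components(sample_bom, hypothetical_advisories))
```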

[1] https://www.synopsys.com/glossary/what-is-software-supply-chain-security.html

[2] https://about.gitlab.com/blog/2022/10/25/the-ultimate-guide-to-sboms

[3] https://www.wiz.io/academy/top-open-source-sbom-tools

[4] https://scribesecurity.com/blog/spdx-vs-cyclonedx-sbom-formats-compared/

[5] https://github.com/CycloneDX/cyclonedx-python

[6] https://anchore.com/sbom/how-to-generate-an-sbom-with-free-open-source-tools/

[7] https://ossindex.sonatype.org

[8] https://github.com/sonatype-nexus-community/jake

