The System Package Data Exchange (SPDX®) specification is an open standard designed to facilitate the communication of Bill of Materials (BOM) information across diverse domains, including software, artificial intelligence (AI), datasets, and system components. SPDX enables organizations to document, share, and manage metadata critical to understanding and maintaining software supply chains, ensuring transparency, compliance, and security.
What is SPDX?
SPDX is a collaborative effort driven by the Linux Foundation and supported by a global community of developers, organizations, and industry experts. By adopting SPDX, you can contribute to building a more transparent, secure, and efficient software ecosystem.
SPDX provides a standardized framework for creating and exchanging detailed metadata about system components, their relationships, and associated information. It defines an underlying data model and supports multiple serialization formats, enabling interoperability across tools, platforms, and industries. Originally focused on software licensing, security, and composition, SPDX 3.0 is a major revision of SPDX 2.2.1 (also published, free of charge, as ISO/IEC 5962:2021 – SPDX® Specification V2.2.1) and expands coverage to broader areas such as AI models, datasets, and system lifecycle information, including build information.
Key Features of SPDX 3.0
SPDX 3.0 introduces significant enhancements to support the evolving needs of modern software ecosystems and related domains. Key features include:
- Software Composition:
  - Metadata for collections of software (Packages), individual Files, and portions of files (Snippets).
  - Detailed information about dependencies, bundled components, and optional elements.
- Software Build Information: Documentation of build processes, tools, and configurations used to create software artifacts.
- Artificial Intelligence (AI) Models: Support for describing AI models, including their training datasets, provenance, and associated metadata.
- Datasets: Metadata for datasets used in software development, AI training, or other applications, including licensing, provenance, and integrity.
- Creator, Supplier, and Distributor Identity: Information about the entities involved in creating, supplying, and distributing software or system components.
- Provenance and Integrity: Tracking the origin and history of components, including checksums and cryptographic hashes to ensure integrity.
- Licenses and Copyrights: Comprehensive licensing information, including:
  - A curated list of SPDX license identifiers and exceptions.
  - License expressions for multi-license scenarios.
  - Copyright notices and statements.
- Security Vulnerabilities and Quality Data: Integration of security vulnerability data, defect reports, and other quality-related information to support risk assessment and mitigation.
- Relationships Between System Elements: Explicit relationships between components, such as dependencies, inclusion, or exclusion, enabling detailed modeling of complex systems.
- Software Usage and Lifecycle: Metadata describing how software is used, maintained, and retired, including lifecycle stages and usage policies.
- Annotations and Linking: Mechanisms to annotate SPDX elements with additional information and link between multiple SPDX documents for distributed systems.
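To make the relationship and integrity features more concrete, here is a minimal Python sketch. It does not use the real SPDX schema or any SPDX library: the `Element` and `Relationship` classes, the `urn:example:` identifiers, and the helper names are all hypothetical, chosen only to illustrate how elements, hashes, and typed relationships can form a graph that tools traverse.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Element:
    """Simplified stand-in for an SPDX element (hypothetical, not the real schema)."""
    spdx_id: str
    name: str
    sha256: Optional[str] = None  # optional integrity hash of the artifact


@dataclass(frozen=True)
class Relationship:
    """A typed, directed edge between two elements, identified by their ids."""
    source: str
    kind: str  # illustrative values: "dependsOn", "contains"
    target: str


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def transitive_dependencies(root: str, relationships: list) -> list:
    """Return the sorted ids reachable from root via "dependsOn" edges."""
    edges = {}
    for rel in relationships:
        if rel.kind == "dependsOn":
            edges.setdefault(rel.source, []).append(rel.target)
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    seen.discard(root)  # a cycle back to root is not a dependency of itself
    return sorted(seen)


app = Element("urn:example:app", "app", sha256_hex(b"app contents"))
relationships = [
    Relationship("urn:example:app", "dependsOn", "urn:example:libA"),
    Relationship("urn:example:libA", "dependsOn", "urn:example:libB"),
    Relationship("urn:example:app", "contains", "urn:example:readme"),
]
print(transitive_dependencies(app.spdx_id, relationships))
# ['urn:example:libA', 'urn:example:libB']
```

Keeping relationships as separate records, rather than nesting dependencies inside each element, mirrors the idea of explicit relationships between elements and makes it straightforward to add other relationship kinds later.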
Why Use SPDX?
SPDX provides a robust solution for managing BOMs and related metadata, offering the following benefits:
- Transparency: Gain visibility into the composition, provenance, and licensing of software and system components.
- Compliance: Simplify the process of managing licensing obligations, copyright notices, and security vulnerabilities.
- Interoperability: Standardize communication across tools, organizations, and industries, reducing friction in the software supply chain.
- Automation: Enable tools to generate, validate, and consume SPDX documents for faster and more accurate workflows.
- Scalability: Support complex systems with diverse components, including software, AI models, and datasets.
- Integration: Connect graphs of bill of materials information with other graph-structured data for analysis.
SPDX Formats
SPDX documents can be serialized in multiple formats to suit different use cases:
- JSON-LD: A lightweight, JSON-based format for representing Linked Data, making it easy to integrate structured data into web applications while being both human-readable and machine-processable.
- Turtle (Terse RDF Triple Language): A user-friendly, compact format for writing RDF data, using a simple syntax to represent subjects, predicates, and objects in a way that’s easy to read and edit.
- N-Triples: A simple, line-based format for representing RDF data as individual statements, where each line consists of a subject, predicate, and object, making it easy to read and process.
- RDF/XML: An XML-based format for representing RDF data, allowing structured information about resources and their relationships to be stored and shared in a machine-readable way.
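As a rough illustration of the line-based N-Triples layout described above, the following Python sketch writes a few subject, predicate, object statements by hand. The `https://example.org/` IRIs and the `IRI` helper class are assumptions made for this example and are not SPDX vocabulary terms; a real workflow would more likely rely on an RDF library than on manual string formatting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IRI:
    """Marks a value as an IRI rather than a plain string literal."""
    value: str


def format_term(node) -> str:
    """Format an IRI as <...> and any other value as an escaped string literal."""
    if isinstance(node, IRI):
        return f"<{node.value}>"
    escaped = (
        str(node)
        .replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def to_ntriples(triples) -> str:
    """Serialize (subject, predicate, object) tuples, one statement per line."""
    lines = []
    for subject, predicate, obj in triples:
        lines.append(f"{format_term(subject)} {format_term(predicate)} {format_term(obj)} .")
    return "\n".join(lines)


# Hypothetical vocabulary under example.org, for illustration only.
VOCAB = "https://example.org/vocab#"
app = IRI("https://example.org/pkg/app")
lib = IRI("https://example.org/pkg/libA")

triples = [
    (app, IRI(VOCAB + "name"), "app"),
    (app, IRI(VOCAB + "dependsOn"), lib),
]
print(to_ntriples(triples))
```

Each printed line is one complete statement ending in ` .`, which is what makes the format easy to process with line-oriented tools.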
SPDX and Industry Standards
SPDX 3.0 aligns with global standards for BOMs and software supply chain security, including:
- NIST Guidelines: Supporting SBOM requirements outlined by the National Institute of Standards and Technology.
- ISO/IEC Standards: Ensuring compatibility with international standards for software and system documentation.
- OpenSSF Initiatives: Enhancing security and transparency in open source software supply chains.
The specification is also endorsed by the Object Management Group (OMG), which supports broad industry adoption and interoperability.
Learn More
To explore SPDX 3.0 in detail, visit the following resources:
The easiest way to start is with the SPDX License List, or by using SPDX license identifiers in your code. Use them wherever it is appropriate or makes sense for you to do so. The most complete use of SPDX is producing and/or consuming SPDX artifacts. Let’s look at each of these in turn.
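License identifiers are commonly placed in source files as a comment tag of the form `SPDX-License-Identifier: <expression>`. The sketch below shows one way a script might collect those tags from file contents. The function names and the sample text are hypothetical, and the identifier splitting is a simplification for plain expressions, not a full license expression parser.

```python
import re

# Matches the tag and captures the rest of the line as the expression.
SPDX_TAG = re.compile(r"SPDX-License-Identifier:\s*(?P<expr>[^\r\n]+)")


def find_license_expressions(source_text: str) -> list:
    """Return every license expression declared via an SPDX tag in the text."""
    results = []
    for match in SPDX_TAG.finditer(source_text):
        expr = match.group("expr").strip()
        # Drop a trailing comment closer such as "*/" or "-->".
        expr = re.sub(r"\s*(\*/|-->)\s*$", "", expr)
        results.append(expr)
    return results


def license_ids(expression: str) -> list:
    """Split a simple expression into license and exception names.

    Simplification: this only tokenizes and drops the AND/OR/WITH operators.
    """
    tokens = re.findall(r"[A-Za-z0-9.+-]+", expression)
    return [t for t in tokens if t not in {"AND", "OR", "WITH"}]


SAMPLE = """\
# SPDX-License-Identifier: MIT OR Apache-2.0
print("hello")
/* SPDX-License-Identifier: GPL-2.0-only */
"""

for expression in find_license_expressions(SAMPLE):
    print(expression, "->", license_ids(expression))
# MIT OR Apache-2.0 -> ['MIT', 'Apache-2.0']
# GPL-2.0-only -> ['GPL-2.0-only']
```

A script like this could feed a compliance report, although validating the collected names against the SPDX License List would still be a separate step.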