Getting started with SPDX is easier than you think. There are three ways to engage with SPDX. They are independent of one another, meaning you do not have to do one to do another. That said, each one is valuable in its own way for license identification and, ultimately, compliance.
In addition to the resources here you are encouraged to attend the General Monthly Meeting. Frequently we have a guest speaker from business or the community who presents on their use of SPDX. It’s a great way to see what others are doing and to share or ask questions.
The easiest ways to start are to use the SPDX License List or to put SPDX license identifiers in your code. Just start using them wherever it is appropriate or makes sense for you to do so. The pinnacle of SPDX is producing and/or consuming SPDX Documents. Let’s look at each of these in turn.
Producing/Consuming SPDX Documents and SBOMs
While the SPDX License List and license identifiers in source allow more precision when conveying a license, it is the SPDX document that provides the most detail on a per-package or per-file basis. This article will not go into detail about generating or consuming SPDX documents. Instead, it gives you an idea of what an SPDX document is and what it can be used for, along with key points and links to additional information to get you started.
SPDX documents describe the licensing associated with a set of one or more files. These files can be organized into what we call a “Package”. A package is merely a grouping of files, with some association to each other as defined by the creator of the document. In general, the association should be obvious, such as an SPDX document for a software library or application. SPDX Documents can use one of five file formats: tag/value (.spdx), JSON (.spdx.json), YAML (.spdx.yml), RDF/XML (.spdx.rdf), and spreadsheets (.xls).
You can use any of these formats, and there are SPDX tools to convert one format to another. Which format you use depends on your own preferences, tools, and use case.
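To make the JSON format concrete, here is a minimal sketch of a document built and serialized with Python's standard library. The field names follow the SPDX 2.x JSON layout as commonly published, but check them against the specification version you target. The package name, namespace URL, tool name, and timestamp are all placeholders invented for illustration.

```python
import json

# Minimal SPDX 2.3-style document. Every name, URL, and date below is a
# placeholder chosen for illustration, not a real project.
document = {
    "spdxVersion": "SPDX-2.3",
    "dataLicense": "CC0-1.0",
    "SPDXID": "SPDXRef-DOCUMENT",
    "name": "example-app-1.0",
    "documentNamespace": "https://example.com/spdxdocs/example-app-1.0",
    "creationInfo": {
        "created": "2023-09-01T00:00:00Z",
        "creators": ["Tool: hypothetical-generator-0.1"],
    },
    "packages": [
        {
            "name": "example-app",
            "SPDXID": "SPDXRef-Package-example-app",
            "downloadLocation": "NOASSERTION",
            "filesAnalyzed": False,
            "licenseConcluded": "MIT",
            "licenseDeclared": "MIT",
            "copyrightText": "NOASSERTION",
        }
    ],
    "relationships": [
        {
            "spdxElementId": "SPDXRef-DOCUMENT",
            "relationshipType": "DESCRIBES",
            "relatedSpdxElement": "SPDXRef-Package-example-app",
        }
    ],
}


def to_spdx_json(doc):
    """Serialize the document dictionary as indented JSON text."""
    return json.dumps(doc, indent=2)


if __name__ == "__main__":
    # Conventionally this text would be saved with a .spdx.json extension.
    print(to_spdx_json(document))
```

A hand-built dictionary like this is only useful for learning the shape of a document; for real use, the SPDX tools mentioned above are the safer route because they validate the result.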
Packages and Relationships
While describing a single package of files worked well, SPDX 2.0 added the notion of relationships. This allows SPDX documents to address more complex use cases by referring to one another and stating what the relationship between them is.
As an example, consider a binary-only delivery or download as shown in the following figure.
In this particular example, the binary SPDX document has two relationships:
- It was “generated from” these source files; and
- It dynamically links (say, at runtime) with this particular library.
This now gives a complete licensing picture as you know the licenses of the sources used to build the application and then what it links with at runtime as well.
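The two relationships in this example can be sketched as plain data. `GENERATED_FROM` and `DYNAMIC_LINK` are relationship types defined by the SPDX 2.x specification, and the `DocumentRef-…:SPDXRef-…` form is how one document refers to an element in another. The specific identifiers (`SPDXRef-Binary`, `DocumentRef-sources`, `DocumentRef-library`) and the helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    element: str   # the element the relationship starts from
    kind: str      # SPDX relationship type, e.g. GENERATED_FROM
    related: str   # the element it points to (may live in another document)


# Hypothetical identifiers for the binary-only delivery example.
relationships = [
    Relationship("SPDXRef-Binary", "GENERATED_FROM",
                 "DocumentRef-sources:SPDXRef-Package"),
    Relationship("SPDXRef-Binary", "DYNAMIC_LINK",
                 "DocumentRef-library:SPDXRef-Package"),
]


def related_to(element, rels):
    """Return (kind, related) pairs for every relationship starting at element."""
    return [(r.kind, r.related) for r in rels if r.element == element]


def as_tag_value(rel):
    """Render one relationship as a tag/value line."""
    return f"Relationship: {rel.element} {rel.kind} {rel.related}"


if __name__ == "__main__":
    for rel in relationships:
        print(as_tag_value(rel))
```

Following the `related` side of each entry into the referenced document is what gives the complete picture: the licenses of the sources, plus the license of the library linked at runtime.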
Let’s look at another example where using relationships describes the licensing of all the items with an application and how they “fit” together.
In this example, the application has some test software and documentation that comes with it. The application SPDX document points to each of these to show that they are all “related” to this application and came with it. In turn, each of the other SPDX documents (binary, source, test, and documents) describes how its piece fits in and what the licensing of that specific piece is.
Many real-world use cases were analyzed and captured when adding the notion of relationships into the 2.0 specification.
Contents of a Document
SPDX Documents are composed of one or more sections. Some of these sections are required, while others are optional. If you examine the figure to the right, you will see the following sections as of the 2.1 Specification.
- Document Creation Information – Denotes who created the document, how it was created and other useful information related to its creation.
- Package Information – This section provides information about the “package”. A package can be one or more files of any type, including but not limited to source, documents, binaries, and so forth. The package information contains the originator, where it was sourced from, a download URL, a checksum and so forth. It also contains summary licensing for the package.
- File Information – This is information about a specific file. It can contain the file copyrights found in the file (if any), the license of the file, a checksum for the file, file contributors and so forth.
- Snippet Information – Snippet information can be used to define licensing for ranges within files.
- Other Licensing Information – Other licensing information provides a way to describe licenses that are not on the SPDX License List. You can create a local (to the SPDX document) identifier for the license, place the license text itself in the document as well, and then reference it for files just like you would a license from the license list.
- Relationships – Relationships were introduced in the 2.0 specification and are a very powerful way of expressing how SPDX documents relate to one another. See explanation and example above.
- Annotations – Annotations are comments made by people on various entities and elements within the document. For example, someone reviewing the document may make an annotation about a file and its license. Annotations are useful for reviews of SPDX documents and for conveying specific information about the package, file(s), creation, license, etc.
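As a rough illustration of how these sections show up in practice, the sketch below maps each section name to the top-level key it is usually stored under in the SPDX 2.x JSON format and reports which ones a document contains. The key names reflect the published JSON layout as I understand it and should be verified against the specification; the sample document and the `LicenseRef-1` entry are invented.

```python
# Section name -> top-level key in the SPDX 2.x JSON format (assumed mapping;
# verify against the schema for the specification version you use).
SECTION_KEYS = {
    "Document Creation Information": "creationInfo",
    "Package Information": "packages",
    "File Information": "files",
    "Snippet Information": "snippets",
    "Other Licensing Information": "hasExtractedLicensingInfos",
    "Relationships": "relationships",
    "Annotations": "annotations",
}


def sections_present(doc):
    """Return the names of the sections that are present and non-empty."""
    return [name for name, key in SECTION_KEYS.items() if doc.get(key)]


# Invented sample: a package whose license is not on the SPDX License List,
# so it is described locally under a LicenseRef- identifier.
sample = {
    "creationInfo": {"created": "2023-09-01T00:00:00Z",
                     "creators": ["Person: Example Reviewer"]},
    "packages": [{"name": "example-lib",
                  "SPDXID": "SPDXRef-Package-example-lib",
                  "licenseConcluded": "LicenseRef-1"}],
    "hasExtractedLicensingInfos": [
        {"licenseId": "LicenseRef-1",
         "extractedText": "Full text of the custom license goes here."}
    ],
}

if __name__ == "__main__":
    print(sections_present(sample))
```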
Where to use it
Automated generation and scanning of SPDX documents for large open source code bases can be quite powerful. SPDX documents can be:
- used both internally and externally by an organization, group, or community
- part of your internal compliance program, providing detailed license information
- generated for files you make available to others
- used to describe the licensing of files you receive
The following links point to further references and resources for working with SPDX Documents:
SPDX License List
The SPDX License List is a list of commonly found licenses and exceptions used for open source and other collaborative software. The purpose of the SPDX License List is to enable easy and efficient identification of such licenses and exceptions in an SPDX document or elsewhere.
Some relevant points about the list:
- More than 550 licenses and exceptions as of September 2023
- Available on SPDX website – URLs won’t change
- Simple expression language for expressing conjunctive and disjunctive licensing
- Short license identifiers for easy reference
- Exact text of licenses
- License Matching Guidelines – for matching licenses against those included on the SPDX License List
- License Templates denote license text which is optional or replaceable per the license matching guidelines
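To illustrate the expression language mentioned above: `AND` expresses conjunctive licensing, `OR` expresses disjunctive licensing, and `WITH` attaches an exception to a license. The sketch below only tokenizes an expression and lists the identifiers it mentions; it does not validate the expression or check identifiers against the License List. The function names are hypothetical, and operators are assumed to be written in upper case.

```python
import re

# Operators of the license expression language (assumed upper case here).
OPERATORS = {"AND", "OR", "WITH"}

# A token is a parenthesis or a run of identifier characters.
TOKEN = re.compile(r"\(|\)|[A-Za-z0-9.\-+:]+")


def tokenize(expression):
    """Split a license expression into parentheses, operators, and identifiers."""
    return TOKEN.findall(expression)


def license_ids(expression):
    """Return the license and exception identifiers, in order, without duplicates."""
    found = []
    for token in tokenize(expression):
        if token in ("(", ")") or token in OPERATORS:
            continue
        if token not in found:
            found.append(token)
    return found


if __name__ == "__main__":
    expr = "(MIT OR Apache-2.0) AND GPL-2.0-only WITH Classpath-exception-2.0"
    print(tokenize(expr))
    print(license_ids(expr))
```

Extracting identifiers this way is enough for tasks such as building an inventory of licenses in use; evaluating what an expression actually permits requires a real parser that respects operator precedence and parentheses.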
Where to use it
There are many possible uses of the SPDX License List and its collateral. Here are just a few key ones:
- Use the standardized short identifier anywhere you would display or exchange the identity of an open source license. You can even link back to the SPDX License List for the text of the license. The License List links are immutable; they will not change. Note: Only use the identifier if the license text you are matching agrees with the text in the SPDX License List per the license matching guidelines. The power of the short identifier is that when you see it, you know exactly what the license text is!
- Use the License List itself for internal reference or processes.
- Use the matching guidelines and templates provided by the SPDX License List to help determine whether the license text you see matches a license on the list.
To get started, you need familiarity with the SPDX License List and an idea of where and how you will use it.
The following list is not meant to be exhaustive but rather to give you an idea of what some people and organizations are doing:
The following links point to further references and resources you may need when working with the license list.
SPDX License Identifiers in Source
The need to identify the license for open source software is critical for both reporting purposes and license compliance.
However, determining the license can be difficult due to missing or ambiguous information. Even when licensing information is present, the lack of a consistent notation for it can make automated license detection very difficult, requiring vast amounts of human effort. The SPDX workgroup proposes using SPDX license identifiers to indicate the license at the file level. The advantages of doing this are numerous:
- It is precise; there is no ambiguity due to variations in license header text
- It is language neutral
- It is easy to machine process
- It is concise
- The license travels with the file (useful because sometimes only part of a project is reused, or license files are removed)
- It is simple and can be used without much cost in interpreted environments like JavaScript, etc.
- An SPDX license identifier is immutable.
- It provides simple guidance for developers who want to make sure the license for their code is respected
For more details, see https://spdx.dev/ids
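As a sketch of why this notation is easy to machine process, the function below looks for an `SPDX-License-Identifier:` tag near the top of a file and returns the license expression that follows it. The tag name is the one SPDX defines; the function name, the 20-line search window, and the comment-terminator handling are assumptions made for this example.

```python
import re

# The tag SPDX defines for file-level license identifiers.
TAG = re.compile(r"SPDX-License-Identifier:\s*(.+)")

# Comment terminators that may trail the expression on the same line.
TRAILERS = ("*/", "-->")


def find_license_expression(text, max_lines=20):
    """Return the license expression from the first SPDX tag found in the
    first max_lines lines of text, or None if there is no tag."""
    for line in text.splitlines()[:max_lines]:
        match = TAG.search(line)
        if match:
            value = match.group(1).strip()
            for trailer in TRAILERS:
                if value.endswith(trailer):
                    value = value[: -len(trailer)].strip()
            return value
    return None


if __name__ == "__main__":
    c_source = "/* SPDX-License-Identifier: GPL-2.0-or-later */\nint main(void) { return 0; }\n"
    py_source = "# SPDX-License-Identifier: Apache-2.0 OR MIT\nprint('hello')\n"
    print(find_license_expression(c_source))
    print(find_license_expression(py_source))
```

Because the tag is the same in every language and only the surrounding comment syntax changes, a single short pattern covers an entire mixed-language code base.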