More Open Source Risk

I’ve been doing open source since forever.  Since before many of you whippersnappers were born.  For many of us old hands, software is mostly something you share and reuse.  FLOSS is the default mode.

But I’m also a software engineer, so I worry a lot about software quality and software process.

Which means that I worry a lot about open source software.

And these days, like a lot of people, I worry a lot about the quality and security of open source software. 

This summer Matthew L. Levy discussed Cybersecurity Risks Unique to Open Source [1]. 

As Levy notes, “Consumers of open source consider it to be more secure than hybrid or closed source software.” ([1], p. 78)  I think this faith is based on the transparency of the process and products, and on the breadth of contributors.  Better the devils we know (fallible Carbon-based units) than the demonic forces (mysterious developers and opaque processes) behind closed doors; or so the thinking goes.

Well, maybe.  Obviously, having source code available is better than not, but being open, in itself, doesn’t make the code better or more secure.

Levy is here to point out some of the risks of open source.  The risks that come from being open source.

The first thing is the sheer complexity and size of software.  There is no such thing as a small bit of software.  It’s all dependent on everything else in many ways.  Puny Carbon-based units have little chance of understanding their software: what it does or where it came from.

These days, this is called a “supply chain” problem, though it is really a cognitive psychology problem.  Levy gives an example of a widely used module that has dozens of code dependencies (i.e., to build it) and hundreds of thousands of runtime dependencies (i.e., that it can call as it executes).  Following those dependencies, there are millions of “second degree” dependencies (i.e., things that the dependent modules may call).
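Levy’s numbers describe one ecosystem’s fan-out, but you can get a small, local taste of the phenomenon with nothing but the Python standard library.  A minimal sketch (this only counts *declared* first-degree dependencies of whatever happens to be installed in your environment; the real transitive graph is far bigger):

```python
from importlib.metadata import distributions

# Count the declared first-degree dependencies of every package
# installed in this Python environment -- a tiny, local taste of
# the dependency fan-out Levy describes.
dep_counts = {}
for dist in distributions():
    name = dist.metadata["Name"] or "unknown"
    dep_counts[name] = len(dist.requires or [])  # requires is None when none declared

if dep_counts:
    widest = max(dep_counts, key=dep_counts.get)
    print(f"{len(dep_counts)} packages installed; "
          f"'{widest}' alone declares {dep_counts[widest]} direct dependencies")
```

And remember, each of those declared dependencies has its own dependencies, and so on down: the second-degree explosion Levy describes.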

Phew!

No mortal human could ever build something like this!  But, fear not. My generation of programmers worked really hard to create tools and infrastructure (including the whole damn Internet!) to manage all this.  With a single keystroke, any fool can download, configure, build, test, and install millions of lines of code.

You’re welcome!

The point being:  open or not, puny upright apes can’t possibly understand all this. So lots of stuff can go wrong, accidentally or maliciously.

One of the theoretical assets of open source software is that “everyone” can contribute, which is a lot more “eyeballs” than any company or organization can call on.  However, in practice the “eyeballs” aren’t evenly distributed, and, in any case, there still aren’t nearly enough of them.

Most open source software projects have no contributors at all (i.e., they are uploaded and then just sit there), and many have only a handful.  Which means these packages are less well maintained than good quality commercial software, despite being free and open. And, sitting out there on the Internet, they are waiting, unprotected, for malicious actors to mess with them.

Levy also discussed “metadata risk”, which refers to the minimal protection on open source metadata, including process records.  For example, it is possible to fake the records of git uploads and check-ins.  This potentially obscures the source and history of code, and also pollutes the records used to establish trust in the software.
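This is easy to demonstrate: git’s author and date fields are client-supplied strings with no authentication by default, so a commit can claim any provenance it likes.  A minimal sketch (the forged identity below is obviously made up, and it assumes you have git on your PATH):

```python
import os
import subprocess
import tempfile
from pathlib import Path

# Sketch of "metadata risk": forge the author and date of a git
# commit, using nothing but the ordinary environment variables git
# itself honors.  No signature, no verification, no alarm bells.
with tempfile.TemporaryDirectory() as repo:
    def git(*args, env=None):
        return subprocess.run(
            ["git", "-C", repo, *args],
            check=True, capture_output=True, text=True, env=env,
        ).stdout

    git("init", "-q")
    Path(repo, "hello.txt").write_text("hi\n")
    git("add", "hello.txt")

    # Claim a famous(ish) author and back-date the commit to 1970.
    forged_env = dict(
        os.environ,
        GIT_AUTHOR_NAME="Definitely Linus",
        GIT_AUTHOR_EMAIL="torvalds@example.com",
        GIT_AUTHOR_DATE="1970-01-02T00:00:00",
        GIT_COMMITTER_NAME="Definitely Linus",
        GIT_COMMITTER_EMAIL="torvalds@example.com",
        GIT_COMMITTER_DATE="1970-01-02T00:00:00",
    )
    git("commit", "-q", "-m", "innocent change", env=forged_env)

    forged = git("log", "-1", "--format=%an <%ae> %ad").strip()
    print(forged)  # a 1970 commit by someone who never touched this repo
```

Signed commits and tags exist precisely to close this hole, but most repositories don’t require them.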

This last point cuts deep to the heart of the whole enterprise.  All of these vulnerabilities threaten the trust among the community of developers and those who use the software.  Open source works on trust.  Period.  Without trust, there is no open source software.

But, I would say that all of these vulnerabilities mainly come from the open and trusting nature of the open source process itself.  The reason you can’t trust open source software is that people do trust open source software. Or something like that.

What can be done?  Can open source remain viable?

Levy’s title promised to say “What Communities Are Doing to Reduce Them”, but there is very little here on that front.  Obviously, increased attention to authenticating contributors is going to be necessary.  And there is going to have to be a lot less freedom to muck around with metadata.

Levy sketches some academic work on beefing up the data collection and analysis that monitor and document the behavior of open source communities.  This sounds interesting (especially when I have my “anthropologist” hat on), though it isn’t clear what it will amount to.

My own view is that the solution to infrastructure problems is—wait for it—funding.  If you want better software, pay for it.  Fer goodness sake!  How can you expect to build software using hobbyists and “some guys on the Internet”?


  1. Matthew L. Levy, Cybersecurity Risks Unique to Open Source and What Communities Are Doing to Reduce Them. Computer, 56(6):78-83, 2023. https://ieeexplore.ieee.org/document/10132056
