Martin Pool's blog

Some thoughts on Arch security

GNU Arch has some pretty powerful and novel security properties for a version control system.

I have been helping maintain cvs.samba.org for a few years, so I think I have a pretty good idea of what a version control server needs to be secure against, at least from the perspective of people doing open source or distributed development.

The word "security" means different things to different people. Some organizations, for example, want to make sure that unauthorized people can't see the source code, or even that developers who are allowed to see one part can't see another. Others might want to make sure that every committed change passes all the appropriate reviews and quality checks. I think Arch could probably do a pretty good job of helping with those, but they are not the facets of security I want to write about tonight.

What I am concerned with is that in recent years there have been quite a few criminal intrusions into development systems. Somebody tried to get a change into the Linux kernel source through the BitKeeper-CVS gateway. Somebody had a trojan installed on the machine of a senior developer at Valve Software. Someone else obtained part of the Windows source code by compromising a developer's machine. Even if the source is not confidential, unauthorized changes can be enormously disruptive.

CVS and Subversion are both commonly operated by free software projects in this mode: developers have SSH access to the server, and everyone else has anonymous read access.

It was originally planned that Subversion would run as an Apache 2 module using SSL and Apache authentication, so that there would be no need for developers to have local accounts. For various reasons this turned out to be pretty unpopular, and my impression is that almost all free software projects are using svn+ssh.

CVS, Subversion and similar systems require a dedicated server process for both committers and anonymous users. Arch does not: archives can be published simply as read-only directories on a web or FTP server. This is a good thing: one less program to worry about, one less listening port. You can use whatever web server you think is least likely to be compromised.
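To make the "no special server" point concrete, here is a minimal sketch that publishes a directory read-only using nothing but Python's standard `http.server`. It is an illustration, not part of Arch, and the function name `serve_archive_readonly` is hypothetical. The Python documentation itself recommends against `http.server` for production, so in practice you would point an existing, well-audited web server at the archive directory instead.

```python
import functools
import http.server
import threading


def serve_archive_readonly(archive_dir, port=0):
    """Serve archive_dir over HTTP in a background thread.

    SimpleHTTPRequestHandler implements only GET and HEAD, so clients can
    fetch revision files but any write method (PUT, DELETE, ...) is refused
    with 501 Not Implemented. port=0 lets the OS pick a free port.
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=archive_dir
    )
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

The design point is that the network-facing program has no way to change the archive at all: new revisions arrive through a separate channel controlled by the archive's owner, and the published copy is just files.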

Using SSH as a transport is one of my favourite Unix design patterns, and it is certainly much better than each SCM system inventing its own authentication protocol. But it does have several problems. Firstly, you need to be able to create Unix accounts for contributors, and such accounts have been a way in for attackers on open source projects before. Administrators can try to limit which commands can be run, but there is a risk that contributors might break out of such a jail. Secondly, on an older project many dormant contributors may still have shell accounts, and each of these remains a possible route for intrusion.

Arch doesn't require that any two developers have access to a single system. This isn't just a theoretical possibility; it is the standard way of using it.

One good consequence is that there doesn't need to be any assessment of whether a contributor is "good enough" or "trusted enough" to have commit access. For CVS this is a big deal: someone who has commit access could destroy the whole repository or rewrite history, but people without commit access can't really work well. With Arch, there is no such decision point: anyone can work comfortably without needing to be specially trusted, and every change can be considered on its merits.

Most version control systems, including Arch, present a user model in which committed revisions cannot be changed: the archive is conceptually read-only. However, as far as I know, only Arch makes revisions physically read-only: each one is a directory containing a few files. There is little chance of a later update changing or corrupting earlier work. (I have seen svn need its database rebuilt from time to time, but I have never seen Arch need anything similar.) This seems to give good protection against both accidental and intentional damage. On a machine shared by several developers, one might have a cron script that chowns and chmods committed revisions as extra protection. Tools like Tripwire would immediately pick up any additions or changes (although changeset signing would probably catch those already).
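The cron job mentioned above might look roughly like the following minimal Python sketch. It assumes the archive is a local directory tree and recognises revision directories by Arch's level names (`base-0`, `patch-N`, `version-0`, `versionfix-N`). The function name `lock_committed_revisions` is hypothetical. It conservatively leaves parent directories and the newest revision of each version writable, on the assumption that a later commit may still need to create or rename entries there; check that against your Arch version before deploying anything like it. The chown step is omitted because it needs root and a site-specific owner.

```python
import os
import re
import stat

# Arch revision levels: base-0, patch-N, version-0, versionfix-N.
REVISION_RE = re.compile(r"^(base|patch|version|versionfix)-(\d+)$")
LEVEL_ORDER = {"base": 0, "patch": 1, "version": 2, "versionfix": 3}
WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def _revision_key(name):
    """Sort key that orders revision names from oldest to newest."""
    m = REVISION_RE.match(name)
    return (LEVEL_ORDER[m.group(1)], int(m.group(2)))


def _strip_write(path):
    """Clear all write bits on path; symlinks are skipped, never followed."""
    mode = os.lstat(path).st_mode
    if stat.S_ISLNK(mode):
        return
    os.chmod(path, stat.S_IMODE(mode) & ~WRITE_BITS)


def lock_committed_revisions(archive_root):
    """Make every revision except the newest in each version read-only.

    Returns the list of revision directories that were locked.
    """
    locked = []
    for dirpath, dirnames, _ in os.walk(archive_root):
        revisions = sorted(
            (d for d in dirnames if REVISION_RE.match(d)), key=_revision_key
        )
        # Handle revision directories here instead of descending into them.
        dirnames[:] = [d for d in dirnames if not REVISION_RE.match(d)]
        for name in revisions[:-1]:  # the newest revision stays writable
            rev_dir = os.path.join(dirpath, name)
            if os.path.islink(rev_dir):
                continue
            # Bottom-up walk: lock the contents first, then the directory.
            for sub, subdirs, files in os.walk(rev_dir, topdown=False):
                for entry in files + subdirs:
                    _strip_write(os.path.join(sub, entry))
            _strip_write(rev_dir)
            locked.append(rev_dir)
    return locked
```

Run as the archive owner from cron, this is safe to repeat: changing a file's mode requires owning it, not having write permission on it, so revisions locked on an earlier run are simply locked again.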

Arch stores checksums for each commit, which should trap accidental or hardware damage. These can then be gpg-signed, which should give pretty good assurance that, at least, the changeset came from a developer's machine.
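As a rough illustration of what per-commit checksums buy you, the sketch below recomputes SHA-1 digests for the files in one revision directory and compares them with a stored list. The checksum file name `checksum`, the line format `sha1 <filename> <hexdigest>`, and the helper names are all assumptions made for this sketch; check them against a real archive before relying on it. Verifying the GPG signature over that list is left to GnuPG itself (for example `gpg --verify`) and is not shown.

```python
import hashlib
import os


def parse_checksums(text):
    """Parse lines of the assumed form 'sha1 <filename> <hexdigest>'.

    Any other lines (headers, signature armour, other digests) are ignored.
    """
    expected = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0] == "sha1":
            expected[parts[1]] = parts[2].lower()
    return expected


def sha1_of_file(path):
    """Hash a file in chunks so large changesets don't need to fit in memory."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_revision(rev_dir, checksum_name="checksum"):
    """Return a list of problems; an empty list means every listed file matched."""
    with open(os.path.join(rev_dir, checksum_name), encoding="utf-8") as f:
        expected = parse_checksums(f.read())
    problems = []
    for name, digest in sorted(expected.items()):
        path = os.path.join(rev_dir, name)
        if not os.path.exists(path):
            problems.append(f"missing: {name}")
        elif sha1_of_file(path) != digest:
            problems.append(f"mismatch: {name}")
    return problems
```

On its own this only catches accidental damage, since an attacker who can rewrite a changeset can also rewrite an unsigned checksum list; the signature is what ties the list back to a developer's key.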

If the worst happens and a machine is compromised, Arch's distributed design makes it likely that the affected or destroyed archive will be widely mirrored, so the changes can be detected and an older version can be restored.
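Detecting tampering by comparing against a mirror can be as simple as hashing every file in both copies and diffing the results. The minimal sketch below assumes both the suspect archive and a trusted mirror are available as local directories (for example, after fetching the mirror over HTTP); it is not an Arch command, and the function names are hypothetical.

```python
import hashlib
import os


def manifest(root):
    """Map each file's path (relative to root, '/'-separated) to its SHA-256."""
    result = {}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            with open(full, "rb") as f:
                result[rel] = hashlib.sha256(f.read()).hexdigest()
    return result


def compare_archives(suspect_root, mirror_root):
    """Return (added, removed, changed) relative paths, each sorted."""
    suspect, mirror = manifest(suspect_root), manifest(mirror_root)
    added = sorted(set(suspect) - set(mirror))
    removed = sorted(set(mirror) - set(suspect))
    changed = sorted(p for p in set(suspect) & set(mirror) if suspect[p] != mirror[p])
    return added, removed, changed
```

Because committed revisions should never change, any path in `changed` or `removed` is a red flag, while entries in `added` may simply be commits made since the mirror was last updated. Reading whole files into memory keeps the sketch short; chunked hashing would suit very large archives better.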

[draft, more to come]
