“If you really want to … truly ancient history, you have to go back to delta decks on punch cards.” (Jim Rootham)
In a world where biographies of cod are not just accepted, but rightly popular, it wouldn’t seem entirely crazy to write a history book on how computer programmers store the vital product of their labours – source code.
Since neither you nor I have time to read or write such a thing, we’re going to have to settle for this one blog post.
It’s an important subject.
The (for now) final end product seems incredibly obvious. And popular.
Yet it took decades of iterative innovation, from some of the cleverest minds in the field, to make something so apparently simple yet powerful.
And every step was astonishing.
1. Source code is text in a file! (1960s)
With hindsight, it’s obvious that source code is best stored as just writing in simple documents. A brief read of the history of ASCII gives a flavour of how complex it was to agree even that.
2. Humans can manually keep track of versions of code! (1960s)
As with everything, to begin with there was no software.
“At my first job, we had a Source Control department. When you had your code ready to go, you took your floppy disks to the nice ladies in Source Control, they would take your disks, duly update the library, and build the customer-ready product from the officially reposed source.” (Miles Duke)
3. You can keep lots of versions in one file! (1972, 1982)
Using a fancy interleaved weave file format, SCCS ruled the roost of version control for a decade.
It took some years to develop a good method for recording the changes from one version of a file to the next. “An Algorithm for Differential File Comparison” is a relatively late paper to read on the subject (1976).
In 1982, SCCS’s successor RCS (original paper describing it) used these diffs in reverse to beat SCCS, and astonished this commenter:
“Along came RCS with its reverse-deltas, and I thought it was the bee’s knees” (Anonymous)
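The reverse-delta idea can be sketched in a few lines of Python. This is only an illustration, not RCS's actual file format or algorithm: the newest revision is kept in full, and each older revision is stored as the edits needed to step back one version. The names `make_delta`, `apply_delta` and `ReverseDeltaStore` are made up for this sketch, and the diffing is borrowed from Python's standard `difflib`.

```python
from difflib import SequenceMatcher


def make_delta(src, dst):
    """Return edits (start, end, new_lines) that turn line list src into dst."""
    matcher = SequenceMatcher(a=src, b=dst, autojunk=False)
    return [
        (i1, i2, dst[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def apply_delta(src, delta):
    """Apply the edits back to front so earlier indices stay valid."""
    out = list(src)
    for start, end, new_lines in reversed(delta):
        out[start:end] = new_lines
    return out


class ReverseDeltaStore:
    """Hypothetical store: full text of the newest revision plus reverse deltas."""

    def __init__(self):
        self.head = None   # newest revision, kept in full
        self.deltas = []   # deltas[k] turns revision k+2 back into revision k+1

    def commit(self, lines):
        if self.head is not None:
            # Record how to get from the new text back to the old one.
            self.deltas.append(make_delta(lines, self.head))
        self.head = list(lines)
        return len(self.deltas) + 1  # revision numbers start at 1

    def checkout(self, revision):
        lines = list(self.head)
        # Walk backwards from the newest revision to the requested one.
        for delta in reversed(self.deltas[revision - 1:]):
            lines = apply_delta(lines, delta)
        return lines
```

Checking out the newest revision applies no deltas at all, which is the point of storing them in reverse: the version people ask for most often is the cheapest to retrieve.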
4. You can each have your own copy checked out! (1982)
At the time, people tended to log into a central mainframe and work together via that. With RCS, using symbolic links, it could be arranged so that each person was working with the same version control, but their own working copy.
“there will be a file called
RCS
that is a symbolic link to the master RCS repository that you share with the rest of your group members” (Information on Using RCS at Yale)
5. Wow! You can version multiple files at once! (1986)
Amazingly, up until CVS, each version control system was for separate individual files. Yes, you can use RCS with wildcards to commit multiple files, or mark particular branches. But it isn’t really part of the system.
In CVS it was the default to modify all the files recursively. Software was suddenly a recursive tree of text files, rather than just a directory or an individual file.
It was badly implemented as it wasn’t “atomic” (successor Subversion fixed this in 2000), but really that doesn’t matter for the purpose of astonishment.
6. Two people can edit the same file at the same time, and it merges what they both did! (1986)
In the late 1990s I worked at Creature Labs. We were changing from Visual SourceSafe (commercial, made by Microsoft) to CVS (open source, made by a bunch of hippies).
There was frank disbelief that it could deliver on its main magical promise – letting multiple people edit the same file at the same time, then flawlessly merging their changes together without breaking anything.
The exclusive locking of SourceSafe was a real problem when we were making Creatures 3. I remember a particular occasion we were adding garbage collection which meant editing most code files, and the lead programmer had to check out every file exclusively over the weekend while he implemented it.
This paper from 1986 is an excellent historical record of this magic, wherein Dick Grune suffers the same problem while his team code a compiler in Holland, and so invents CVS.
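For a feel of how such a merge can work, here is a minimal three-way merge sketch in Python. It is not the algorithm CVS uses, just the general idea under simplifying assumptions: diff each person's copy against the common ancestor, and combine the two sets of edits as long as they touch different parts of the file. The names `changes`, `merge3` and `MergeConflict` are invented for this example, and edits that merely sit next to each other are conservatively treated as conflicts.

```python
from difflib import SequenceMatcher


class MergeConflict(Exception):
    """Raised when both sides edit the same (or adjacent) lines differently."""


def changes(base, other):
    """Edits (start, end, new_lines), in base coordinates, turning base into other."""
    matcher = SequenceMatcher(a=base, b=other, autojunk=False)
    return [
        (i1, i2, other[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def merge3(base, ours, theirs):
    """Merge two edited copies of base; raise MergeConflict if edits collide."""
    mine = changes(base, ours)
    combined = list(mine)
    for edit in changes(base, theirs):
        if edit in mine:
            continue  # both sides made the identical change, keep one copy
        for other in mine:
            # Overlapping or touching ranges count as a conflict.
            if edit[0] <= other[1] and other[0] <= edit[1]:
                raise MergeConflict(f"both sides changed lines near {edit[0] + 1}")
        combined.append(edit)
    merged = list(base)
    # Apply edits from the end of the file so earlier indices stay valid.
    for start, end, new_lines in sorted(combined, key=lambda e: e[0], reverse=True):
        merged[start:end] = new_lines
    return merged
```

With a five-line base, one person changing the first line and another changing the last line merge cleanly, while two different edits to the same line raise `MergeConflict` and are left for a human to sort out.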
7. The shared repository can be on a remote machine! (1994)
Most of this time people were mainly using version control on one computer. Some versions of RCS, and hence CVS, had a remote file sharing mechanism to let you have a remote code repository in 1986.
“If a version of RCS is used that can access files on a remote machine, the repository and the users can all be on different machines” (Dick Grune)
But it looks like it was only in 1994, when a TCP/IP protocol was added, that the idea really took off.
“[CVS] did not become really ubiquitous until after Jim Blandy and Karl Fogel (later two principals of the Subversion project) arranged the release of some patches developed at Cygnus Software by Jim Kingdon and others to make the CVS client software usable on the far end of a TCP/IP connection from the repository” (Eric Raymond)
8. Free open source version control hosting! (1999)
This isn’t an advance in source control technology, but it was astonishing, and on the Internet social advances can be as important as technical ones:
The tendency was for older OSS versions to be hard to find … John T. Hall had the insight that if projects were developed on the site, the old versions would be there by default. A development platform service was audacious, but no one else was doing it, and we thought “why not?” (Brian Biles)
Partying like there was no tomorrow (for their stock), VA Linux introduced SourceForge to the world. This was great for new projects (like my TortoiseCVS).
It was hard and expensive to get a server on the Internet back then, and it wasn’t easy or cheap to set up source control and a bug tracker. This new service, despite its lack of business model, fledged numerous projects that bit earlier.
9. You can distribute it all so there’s no central repository! (2005)
There was a wave of version control systems in the early noughties, making version control completely distributed.
That is, your local machine has an entire copy of the history of the code, and can easily branch and merge on a peer to peer basis with any other copy of it. By the way, the same feature makes it much easier to branch and merge in general.
Given that, it seems unfair that I’ve dated this astonishment 2005. That’s because I’m not recording the first time anyone made the astonishing thing, but the first time it was productised and became popular. April 2005 was when both Mercurial and Git were released.
The post “The Risks of Distributed Version Control” (late 2005) shows how radical this new-fangled stuff was seen to be.
10. When you check out that’s a fork too, and you can do that in public! (2008)
GitHub succeeded for several reasons (which deserve a whole blog post, although I’ve alluded to one of them before).
In the context of this post, the astonishment was that you might want to make even your tiny hacks to other people’s code public. Before GitHub, we tended to keep those on our own computer.
Nowadays, it is so easy to make a fork, or even edit code directly in your browser, that potentially anyone can find even your least polished bug fixes immediately.
Coda
Have a quick look back up at those decades of progress. Yes, some of the advances were also enabled by increasing computer power. But mainly, they were simply made by people thinking of cleverer ways of collaborating.
It makes me wonder, what is next? What new astonishing thing will happen in version control?
More broadly, can the same thing happen in other fields?
Are core parts of our information infrastructure, the ones that ultimately block innovation in government or healthcare or journalism or data, as capable of such dramatic improvement?
I have this feeling we’re going to find out.
Want more? Read “The version control timeline” (on Plastic SCM’s blog, don’t miss the comments) and “Understanding Version-Control Systems” (by Eric Raymond).
LoseThos has text and graphics in source code.
Was there ever a time when source was not just a text file? Even when it was stored on punched cards or tape the format was the same as existing textual data formats, which existed before electronic computers.
Yes. “Source Code” was partially in diagrams on paper. We built a hybrid analog-digital computer to solve differential equations. The analog “source code” was a wiring diagram for connecting the integrators.
Title quote attributed to Kent Beck. Let’s hope the next step is to lose the text files, and version the entities directly. In image-based development such as Smalltalk, code is not written into flat text files. The entities are just there, in the image, and the version control systems reflect that simplicity.
“source code in text files? How quaint” -Kent Beck
In the “Humans can manually keep track of versions of code! (1960s)” section, Miles Duke is quoted as talking about handing in floppies; I followed the link, and the statement he made doesn’t have a year associated with it.
Checking Wikipedia, floppies weren’t available for sale until 1971. I wouldn’t be surprised at all that people were doing “human-based” version control in the 60’s, but they probably were using either punched cards, or coding forms (where you’d hand-write your program onto a page with lines of eighty columns).
@alan wostenberg: I worked for many years on Lisp machines with images, and I’ve played with Smalltalk.
I really appreciate source code in text files. When you’ve worked with an image-based system, you learn that *EVERYTHING* you do goes into the system… test code, mistakes, one-line-patches, data… it’s a formless blob.
Maybe someday someone will figure out a way to make it coherent, but for now, files are the choice of most everyone who’s tried the blob approach.
There’s a company here in little old Adelaide that still uses SCCS. They make radios…
There are actually a number of historians looking at version/source control (although there still remains a dearth of study, given the importance of the issue). Michael Mahoney, Michael Cusumano, and N.L. Ensmenger are three of the more important. For a contextualization of the history, see my short presentation (at UCLA): http://www.iqdupont.com/networked-modes-of-production/
@yachris You are absolutely correct. Floppies were not used in the ’60s. This history is quite well understood. Punch cards were manually managed, and edits were made with change cards on colour-coded cards. The “code stack” was literally a stack of punch cards.
Also, in the late 60s a product by IBM called CLEAR-CASTER was used to perform deltas and simple code management. In 1972 SCCS was being developed by Rochkind in Bell Labs, and by 1975 it was used outside of Bell Labs, and at that point, the proliferation was steady.
A friend always wrote “Newest” on his newly copied source card decks (on the 12 edge).
An office mate was on the committee that worked on ASCII.
(9) ought to reflect on the shaky-but-indicative earlier efforts which paved the way for Git and Mercurial, particularly “Tom Lord’s Arch” (the Arch project, tla binary).
He was crazy enough to think it would work long before anyone else took it seriously. :-)
Now it’s time to bring the idea beyond code :)
color coded cards?
All the cards I remember using were the same color. Just punched new ones and stuffed them in the deck.
In the mid 60’s Univac Exec 8 had time sharing & we could write directly to files – using a text editor. Prior to that the program was a card deck we kept in metal card trays.
But I only wrote engineering stuff in those days.
The trick was to use columns 72-80 and a call to ‘sort’. Your “job control” had several steps in the deck. The first step deck was numbered by every 1000. Job steps 2..n had “a patch” deck. Then you call sort to “overlay” the patches. So “sorting” was the first version control I remember using.
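The sort-and-overlay trick can be sketched in Python, with a fair amount of guesswork about the details. Each card is modelled as a `(sequence_number, text)` pair, where the number stands in for the digits punched in the trailing columns. The assumptions here are that a patch card with an existing number replaces the original card, a patch card with a new number is slotted in between, and later job steps win over earlier ones. The function name `overlay` is made up, and deleting a card is not handled.

```python
def overlay(base_deck, *patch_decks):
    """Merge patch decks into a base deck by sorting on sequence number.

    Each card is a (sequence_number, text) tuple. This is an assumed model
    of the card layout, not a record of any real job control setup.
    """
    cards = []
    # Remember which job step each card came from, so later steps win.
    for step, deck in enumerate((base_deck, *patch_decks)):
        for seq, text in deck:
            cards.append((seq, step, text))
    cards.sort(key=lambda card: (card[0], card[1]))

    merged = {}
    for seq, _step, text in cards:
        merged[seq] = text  # a later card with the same number replaces the earlier one
    return list(merged.items())


base = [(1000, "READ X"), (2000, "Y = X * 2"), (3000, "PRINT Y")]
patch = [(2000, "Y = X * 3"), (2500, "Z = Y + 1")]
deck = overlay(base, patch)
```

Numbering the base deck in steps of 1000 is what leaves room for card 2500 to land between 2000 and 3000 without renumbering anything.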
Oh – I’d forgotten. The Exec 8 line editor supported versioning – as I recall it was maybe 5 or 10 versions each. They did it by assigning a ‘cycle number’ to each line of code, so it was possible to back out a few bad edits.
That’s on a per-file basis.
It wasn’t version control as such, but it’s close.
This is a wonderful article and really took me back. I used RCS briefly in the mid-’90s, switched soon after to CVS, and it’s been an exciting ride ever since (Subversion, a brief dalliance with Darcs, and now Git). Thanks for giving this important and often counter-intuitive subject (some of) the attention it deserves. N.B. Except in their all lower-case logo, “GitHub” is usually rendered with a capitalized “Hub”.
Lots of comments on Hacker News: http://news.ycombinator.com/item?id=3364108
Michael – thanks, have fixed spelling of “Github” to “GitHub” in my post.
In the mid-90s, ParcPlace had a version control project that treated source code as a database, where the atomic unit was the method/function, not the file. It was one of the many things that got killed in the PPS/Digitalk merger.
Every time I see a code diff flip out because a method has moved, I regret that the rest of the world never caught up.
I agree, this could make a fascinating book. Even more interesting would be taking the history back further to version control’s origins in engineering change control. Long before there was source code there were engineering drawings and documents. I suspect that much of the software engineering process and terminology has its roots there.
You forgot the classic directory of zip backups – it was so common that there were automated bat scripts passed around to help.
It even lasted into the 90s with it being one of the first Ant targets people used when learning.
“…an image-based system… *EVERYTHING* you do goes into the system… test code, mistakes, one-line-patches, data… it’s a formless blob.”
Gack. That’s incredibly not the case with even the basic 30 year old Smalltalk “change set” mechanism. Not to mention the improvements in Smalltalk change control since then.
But, well, sigh.
My Symbolics Lisp machine used “world builds”. You saved out a new “world” containing the current state which would later be reloaded.
It also had a generation file system so every time you saved a file it bumped a counter. So you had files named a, a.1, a.2, a.3 … etc.
I actually wish we had editors that still did this.
My KROPS project was a self-modifying expert system. It kept “changes” as “memories” so you could ask it what it used to know but was now “subsumed”. It “learned” by self-modification. Since some of what it “knew” was encoded in the shape of the data structure rather than text it wasn’t possible to print it out. The data structures were circular, self-referencing objects. Gotta love Lisp :-)
This is a great collection of the history of the mainstream of revision control.
To preface it with “what took the brightest minds many years” is a little misleading though. What took a long time was for the mainstream to catch up.
One example is the PIE system from Xerox PARC (both ’81)
– “LAYERED NETWORKS AS A TOOL FOR SOFTWARE DEVELOPMENT”
Ira P. Goldstein and Daniel G. Bobrow
– “An Experimental Description-Based Programming Environment: Four Reports”
Ira Goldstein and Daniel Bobrow
I also slightly disagree that #7 is astonishing. What was astonishing was the internet + personal computers. Pre-internet the problem of remote revision control didn’t exist because people simply had ALL their documents on a file share on the LAN.
Excellent chronology. Aside from its machine-readable forms, source code also appears in books and other media, often in the form of small code snippets.
Plastic SCM have done a cute history of version control systems illustrated with aeroplane drawings: http://www.plasticscm.com/version-control-history.html
I came in at step 3. I was shown the basics of SCCS at work and immediately used it to accidentally delete my previous three weeks of work.