The fabulous Redecentralize Conference was organised by Ira and a bunch of other volunteers. Its subject – how do we make the net resilient, private and fun again?
It was an unconference, so I decided to do a session on a personal itch I’ve had for the last few years – file synchronisation and backup.
I don’t have a nice way to manage my files – documents, music, photos, email. Syncing them across devices and keeping them backed up seems strangely harder than it did in the 1990s. At least, if you don’t just go all in and trust Google or Dropbox with them.
The idea of the workshop was to find out what decentralized software technical people actually use every day for syncing and backing up their files.
We wrote down all the tools on cards, and voted with sticky notes if we used them. I was hoping there was some amazing tool based on newish technology like Distributed Hash Tables that I could use. There wasn’t.
So here are the tools geeks use, working up to the favourite at the end.
Syncthing (0 votes)
This is one of the few actually peer to peer tools which came up – it directly syncs files between devices. Lots of people mentioned it, which is why it got a card even with no votes. Alas, none of us use it day to day. It just isn’t mature enough. Every six months I try it again, hit an unsolvable problem and revert to BitTorrent Sync. They’re working hard to improve Syncthing, and I’m sure would welcome more help.
PhotoBackup (1 vote)
Looks like a nice solution to get files off your camera phone and onto your own server. It’s a pain that it needs a special server to post the files to – I’d like it more if I could configure it to just use SCP (see the SSH section below).
Carbon Copy Cloner (1 vote), Time Machine (2 votes)
A few people recommended proprietary Mac tools. They considered them decentralized, because they keep the backups on their own multi-terabyte hard drives at home, rather than putting them in one central cloud. No doubt they’re polished and easy to use – I still use CrashPlan on an old Mac for that reason! Alas, they’re no good if you have even one device with another operating system.
ownCloud (2 votes)
This self-hosted email/calendar/contacts/file sync tool is a powerful combination. Surprisingly few people use it. My theory is that it tries to do too much itself, and does none of it quite well enough. I found it too slow, clunky, and unconvincing to trust my most valuable data to. I’m optimistic that it’ll continue to improve.
duplicity (2 votes)
While researching backup for myself, I find price is a constant frustration. All the most open ones require a server with a live disk mounted on it. S3, an Amazon service that can just store blobs, is considerably cheaper. It’s even cheaper than Dropbox, which is built on top of S3 and adds a margin!
A couple of people at the workshop took advantage of this price while keeping privacy by using duplicity. It encrypts files and stores them in S3. I find that a bit of a cheat – some control is still lost, and it isn’t resilient. As the economies of scale in cloud storage grow, I’m open to changing my mind!
Tarsnap (2 votes)
This technically excellent Unix backup tool also uses S3 behind the scenes. It is by all accounts easy to set up, fast and powerful. My one concern is that it is too dependent on its founder Colin Percival. You really want backup software to have a whole community maintaining it, or a strong company with incentives to keep the service running. Or it won’t be there when you really need it.
SSH (4 votes)
People using SSH probably overlap with the rsync people below, or they were using scp or bup or unison or something similar. All of these excellent command line sync and backup tools need a protocol to send files remotely, and that’s SSH. Stalwart, and to this day what geeks use to write data remotely from automated processes. It’s that basic writing functionality which I want some funky new DHT to improve.
rsnapshot (5 votes), rsync (1 vote)
It’s hard now to remember how revolutionary rsync was when it came out (apparently in 1996!). Rolling checksums make file copying fast the second time, enabling all sorts of new backup and sync possibilities. The raw technology idea was vital to Dropbox’s genesis. Nearly everyone at the workshop used rsync via rsnapshot, a layer which can back up whole machines and archive their history.
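For the curious, here is a rough sketch of the rolling checksum idea, in Python. It follows the general shape of rsync's weak checksum (two running sums, each kept modulo 2^16), but it is an illustration rather than rsync's actual code. The function names and the modulus constant are my own choices, and real rsync pairs this weak checksum with a stronger hash to confirm matches.

```python
from typing import Iterator, Tuple

M = 1 << 16  # modulus for both running sums (an assumption for this sketch)


def weak_checksum(block: bytes) -> Tuple[int, int]:
    """Compute the two sums from scratch for one block."""
    n = len(block)
    a = sum(block) % M
    # Earlier bytes get a higher weight: n for the first byte, 1 for the last.
    b = sum((n - i) * byte for i, byte in enumerate(block)) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, block_len: int) -> Tuple[int, int]:
    """Slide the window one byte to the right in constant time."""
    a = (a - out_byte + in_byte) % M
    b = (b - block_len * out_byte + a) % M
    return a, b


def rolling_checksums(data: bytes, block_len: int) -> Iterator[Tuple[int, int, int]]:
    """Yield (offset, a, b) for every window of block_len bytes in data."""
    if block_len <= 0 or len(data) < block_len:
        return
    a, b = weak_checksum(data[:block_len])
    yield 0, a, b
    for offset in range(1, len(data) - block_len + 1):
        out_byte = data[offset - 1]
        in_byte = data[offset + block_len - 1]
        a, b = roll(a, b, out_byte, in_byte, block_len)
        yield offset, a, b
```

The point is that moving the window along by one byte costs a couple of additions, not a full re-read of the block. That is what lets the receiving side look for blocks it already has at every byte offset of a changed file, and only ask for the parts that are genuinely new.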
git (6 votes), git-annex (1 vote)
I use git with an auto-commit script to keep important text documents. One workshop member used git-annex, making even large files not a problem. It’s very geeky, but it’s very reliable – history and checksums and decentralization make it very hard to lose data. It does two way sync, but unlike Dropbox actually merges the files.
A key problem for me with this is that Android support is woeful. In the end, I fix that with a nasty hack – I use BitTorrent Sync to transfer a directory on my phone to a server, and have a script there which merges and commits.
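An auto-commit script of the kind mentioned above can be very small. This is a minimal sketch in Python, not the script I actually run: the helper names and the commit message format are made up for illustration. It assumes git is installed, that the directory is already a git repository, and that a commit identity (user.name and user.email) is configured.

```python
import subprocess
from datetime import datetime
from pathlib import Path


def git(repo: Path, *args: str) -> str:
    """Run a git command inside repo and return its standard output."""
    result = subprocess.run(
        ["git", "-C", str(repo), *args],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def auto_commit(repo: Path) -> bool:
    """Stage and commit everything that changed. Return True if a commit was made."""
    git(repo, "add", "--all")
    # --porcelain prints one line per changed path, and nothing if the tree is clean.
    if not git(repo, "status", "--porcelain").strip():
        return False
    message = f"Auto-commit {datetime.now():%Y-%m-%d %H:%M:%S}"
    git(repo, "commit", "--quiet", "-m", message)
    return True
```

Something like this could be called from cron or a file watcher with the path to your documents directory. The sketch stops at committing locally. Pushing to a server and merging changes from other devices are left out.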
Thinking about it, git is the biggest obvious opportunity which came from this workshop. Just as Dropbox was built on rsync technology, could something very user friendly be built on top of git, for the purpose of personal document syncing? Add a peer-to-peer transfer protocol, a beautiful GUI, three way merge of Word documents… The internals built by Linus would make it robust and reliable.
OS file browser + standard protocols (1 vote)
At the end I asked the room, “who is happy with their solution?”. Only one person put their hand up. She had tried every nerdy backup tool imaginable, and abandoned them all.
Instead, she keeps files on her computer and her relative’s computer. If she’s making a document she cares about, she transfers it by USB stick between the computers, to make sure there are multiple copies in different places. And that’s it.
Unsophisticated? Not at all. Way easier than backing up paper. Understandable by anyone. Full control. Cheap.
Conclusions
Thanks to everyone who came to the workshop. I was disappointed that there wasn’t a solution to my problem. However, I learnt several valuable things, including one to improve what I do in the short term.
I need my own (virtual) server (again) – there were several folk from Bytemark at Redecentralize. They made me realise that I had been trying for a while to not need my own server. Alas, the fully distributed solutions just aren’t good enough yet. You need something you can make web requests to and SSH to if you want to use ownCloud, git, SSH or rsync. It’s a relief to accept this.
Simple is better, sometimes – the happiest person was doing manual file copying like you would have done before “the cloud” in the 1990s. It’s obvious to non-geeks, so don’t be afraid of doing this. Of course, you accept downsides, but you gain the upsides of simplicity and peace of mind.
Someone will win with an inspiring UI on top of Git – merging of text files, robust history, fast and reliable. There are fundamental technical reasons why git is good for some of this stuff. Just as Dropbox used rsync technology and brought it to hundreds of millions, one day somebody will bring these benefits of git to the world.
We need low level distributed storage / naming – personal servers running Dropbox clones will keep us going for a while, but at the end of the day this worrying about servers, about named computers, is part of the problem. At some point, a distributed file storage system will take off. IPFS is exciting, although it has no charging mechanism so can never be “fire and forget”. My best hope right now is MaidSafe. When they ship!
There is a market for a geeky command line sync/backup tool – despite all the open alternatives, nothing works well enough that it has dominated the market. I’m confident that technical people haven’t solved this problem for themselves, and want it solved. Powerful interfaces can then be built on that stable base.
I’d love to hear your thoughts and ideas in the comments below!
Try borgbackup.
@floppy says: “I’d have added BitTorrent sync to that conversation. Pretty easy. Nice write up :)” https://twitter.com/Floppy/status/663372403603542016
@roryoung says: “I’m surprised camlistore didn’t get a mention.” https://twitter.com/roryoung/status/663338326842298368
@dch__ says: “I’ve tried most of these & in last 6 months stuck to zfs recv/send (work) tarsnap (private stuff) google nearline (photos)” https://twitter.com/dch__/status/663102450556116993
ODI have made https://github.com/theodi/philbot/blob/master/README.md which is like Amazon Storage Gateway but for Rackspace Cloudfiles.
@zenalbatross says: “this is good info. I recently did a workshop on setting up p2p clouds using syncthing/btsync. Repo: https://github.com/lawfulintercept/PiCloud” https://twitter.com/zenalbatross/status/663382110082867200
Since writing this, I’ve found out that Filecoin is a sister protocol to IPFS. I think it might solve the payment issue for IPFS (that ultimately storage needs to be paid for, or a system won’t scale).
remoteStorage – An open protocol for per-user storage https://remotestorage.io/
As PhotoBackup’s author, I just want to add that it is planned for the Android client to handle more server options, like a SSH server.
Thanks for talking about my small project. There’s only one user, but I think I know who this is :-)
Irmin is a distributed database that follows the same design principles as Git
http://github.com/mirage/irmin
A couple of people suggest http://sparkleshare.org/