I’ve been using Duplicity to back up one of my Linux servers for several years now. Duplicity supports quite a few network protocols for connecting to file servers, including commercial services like Amazon S3, Google Drive, and Microsoft Azure. For my personal use, I’ve only used its ability to back up over SSH to a small NAS. I have not been completely happy with Duplicity, but I have successfully used it to restore data lost in a hard drive failure, so I’ve continued using it. Recently, I’ve discovered a number of issues, one of which I want to briefly discuss here: performance.
This was written in early December 2017. It’s entirely possible that the problem discussed below has been fixed.
At some point in the last year, presumably after an apt-get upgrade, Duplicity stopped performing backups at all.[1] While fixing that issue a few days ago, I discovered that at some point, Duplicity’s performance had become abysmal.
My setup is that every day, I perform an incremental backup and once a month, I perform a full backup. The full backup involves compressing about 40 GB of data, GPG encrypting and signing, and transferring about 20 GB of data to my NAS over SSH.
Previously, this operation would take about four and a half hours. After upgrading Duplicity, it was taking more than 47 hours.
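To put those two durations in perspective, here is a rough back-of-the-envelope calculation. It assumes 1 GB = 1000 MB and attributes the entire elapsed time to moving the roughly 20 GB of output, even though compression and encryption are part of that time too, so the rates are only effective figures. Since the slow run took more than 47 hours, its rate is an upper bound and the slowdown a lower bound. The function name is just for illustration.

```python
def effective_rate_mb_per_s(transferred_gb, hours):
    """Effective throughput in MB/s, assuming 1 GB = 1000 MB."""
    return transferred_gb * 1000 / (hours * 3600)


# Figures from the post: about 20 GB transferred per full backup.
before = effective_rate_mb_per_s(20, 4.5)  # earlier runs
after = effective_rate_mb_per_s(20, 47)    # after the upgrade ("more than 47 hours")
slowdown = 47 / 4.5

print(f"before: {before:.2f} MB/s")    # before: 1.23 MB/s
print(f"after:  {after:.2f} MB/s")     # after:  0.12 MB/s
print(f"slowdown: {slowdown:.1f}x")    # slowdown: 10.4x
```

In other words, the same backup went from a bit over 1 MB/s to roughly a tenth of that.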
Apparently, a new default SSH backend was introduced using Paramiko, replacing the old SSH backend, which used Pexpect and the system sftp binaries. The duplicity man page says that the advantages of Paramiko over Pexpect are “speed and maintainability.”
Some Googling suggests that Paramiko might not be so speedy after all, at least not with its default settings.
To compare the two SSH backends, I performed two full backups, first with the default Paramiko, and then with Pexpect. For some reason, Duplicity’s --progress option is completely broken, so I made do with a somewhat primitive approach: Duplicity stores encrypted “volumes” with a default size of 200 MB. By calculating the difference between the modified time of volume file n and the modified time of the first volume file, I can plot how long it took Duplicity to back up volumes 2 through n.[2]
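The timing approach described above can be sketched in a few lines of Python. This is a minimal illustration, not the code from the post's repository. It assumes the volume files are reachable as a local directory (for example, run on the NAS itself or over a mount) and that their names contain a `.volN.` component, as in `duplicity-full.<timestamp>.volN.difftar.gpg`. The function name and the `name_prefix` parameter are hypothetical; the prefix exists only to restrict the scan to a single backup set.

```python
import re
from pathlib import Path

# Assumption: volume files carry their number as ".vol<N>." in the file name.
VOLUME_RE = re.compile(r"\.vol(\d+)\.")


def volume_offsets(backup_dir, name_prefix=""):
    """Return [(volume_number, seconds after volume 1's mtime), ...] for volumes 2..n."""
    mtimes = {}
    for path in Path(backup_dir).iterdir():
        if not path.is_file() or not path.name.startswith(name_prefix):
            continue
        match = VOLUME_RE.search(path.name)
        if match:
            mtimes[int(match.group(1))] = path.stat().st_mtime
    if 1 not in mtimes:
        raise ValueError("volume 1 not found; cannot compute offsets")
    start = mtimes[1]
    # Volume 1 is only the reference point, so it is left out of the result.
    return [(n, mtimes[n] - start) for n in sorted(mtimes) if n > 1]
```

Calling `volume_offsets(directory, prefix)` once per backup run gives one series of (volume, elapsed seconds) pairs per backend, which can then be plotted against each other.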
As you can see, Paramiko performs substantially worse than Pexpect. Hopefully, the Duplicity developers will fix this pretty major performance regression. In the meantime, using Pexpect seems like a pretty good fix.
Andrew Jeffery suggests that I check out Borg Backup. From reading the documentation, it looks pretty good. I plan to try it out. With luck, I can ditch Duplicity altogether.
All of the code and data for this post is on GitHub.
[1] My list of files to exclude also contained files to include. These were superfluous, so the Duplicity developers apparently decided to make this an error and not perform backups.
[2] The time for volume 1 cannot be handled in this fashion because, although I know the start time of the backup, Duplicity performs a bunch of work before it begins transferring files. Linux (at least the version running on my NAS) doesn’t expose file creation times, so I cannot use that for the first volume either.