LORD (Large Opaque Removable Device) backup
===========================================

Introduction and fundamentals
-----------------------------

With the availability of compact, fast and large-capacity removable flash memory devices, it is often advantageous to store all permanent user data not in the user's home directory on a computer, but instead on a USB "memory stick" or "drive", using a device- or partition-resident encrypted file system with software such as TrueCrypt.

Since the device is removable, it is much easier to safeguard than either a desktop or a laptop computer. In addition, a single device can be used on different computers at different times and at different locations without the need to synchronize the user's "home directory" content, possibly over an insecure channel.

If the encryption program does not write any un-encrypted meta-data to the device, it is quite difficult (often bordering on impossible) for an adversary to demand, via legal or extra-legal coercion, that the data owner provide access to the information contained on it. This is quite unlike a laptop computer, where it has become common for adversaries with questionable administrative powers to demand that the laptop be powered up, logged in, and presented in that state for content inspection.

When such a device is used not only for static data but also for volatile data that is created with a significant expenditure of work-hours or copied from sources no longer accessible, frequent and regular backup of the content of the device becomes mandatory.

It is advantageous to create and maintain the backup while the device is in the encrypted state, so that neither the interception of the backup procedure nor access to the backup data storage can leak the encrypted information.
In other words, the backup process should treat the device not as a hidden file/folder structure, but as an "opaque", apparently random, "blob".

This is easy to achieve if the device is relatively small, as each regular backup could simply consist of copying the complete device to a backup data file. If, however, the device is large, and at each regular backup only a very small part of it has changed and needs to be copied to replace the corresponding content of an already existing backup file, the simple solution of a complete file copy would result in very long run times and very high bandwidth use, regardless of how much data really needs to be refreshed in the backup.

This is the practical problem this program addresses: how to perform an efficient backup of a Large Opaque Removable Device - typically a large-capacity USB flash memory "stick" with an encrypted file-system on it.

The term "differential backup" is often used to describe a backup where only the changed parts of the data are copied from the source to the backup file(s), in order to save both run time and bandwidth. The problem of differential backups is addressed by "rsync", a common utility program available for Linux, OS X and MS Windows. However, rsync cannot read a raw removable device as an input file (there are good technical reasons for this limitation). On the other hand, the problem addressed by this program is in some aspects much simpler than the general cases rsync is designed to handle: it only needs to deal with a single device/file, and it can be safely assumed that in the large majority of instances only a minuscule portion of the content of the device will need to be copied over to the backup file.

Similarly to rsync, this program addresses the problem of differential backup by considering the device that needs to be backed up as an ordered collection of fixed-size segments.
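
The idea of an ordered collection of fixed-size segments can be illustrated with a minimal Python sketch. It is not the program's own code: the function names, the 1 MiB segment size and the use of Python are assumptions made for illustration only, while MD5 is the hash this document mentions.

```python
import hashlib

# Assumed segment size; the value used by the real program is not stated here.
SEGMENT_SIZE = 1024 * 1024


def segment_hashes(path, segment_size=SEGMENT_SIZE):
    """Return the MD5 digest of every fixed-size segment of the file at path."""
    digests = []
    with open(path, "rb") as source:
        while True:
            segment = source.read(segment_size)
            if not segment:          # end of file (or device) reached
                break
            digests.append(hashlib.md5(segment).digest())
    return digests


def changed_segments(old_hashes, new_hashes):
    """Return the indices of the segments whose hashes differ."""
    return [index
            for index, (old, new) in enumerate(zip(old_hashes, new_hashes))
            if old != new]
```

Changing a single byte alters the hash of only the segment that contains it, so the list returned by the hypothetical changed_segments() stays short when little of the device has changed.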
With an appropriate mechanism, the segments that have changed since the previous backup are identified, and only those segments are copied from the source to the backup. In contrast to rsync, which always computes the hashes of both source and backup file segments, this program requires the presence of a permanent table of hashes of each segment in the backup file. This table is co-resident with the backup file.

In order to perform the backup, segments of the source device are read and their hashes are computed. The hash of the corresponding segment in the backup file is retrieved from the hash table file and compared with the hash of the segment in the device. If the two hashes are different, the segment is copied from the device to the backup file, and the hash is updated in the backup segment hash table file.

Another difference between this program and rsync is that while both programs assume the backup location could be on a remote, network-connected server, rsync assumes that the server has on it a permanently running rsync software component ("rsync daemon") but does not require that the remote server backup location be "mounted" on the local filesystem. When this program uses a remote server as the backup file location, it assumes the network directory will be mounted as a directory of the local file system ("mount point"), but it does not require the remote server to have a permanently running rsync daemon, something often not available on commercial cloud storage provider servers.

Since the program expects the segment hash table file to be co-resident with the backup file, the initial instantiation of the backup cannot be performed by simply using the operating system file copy command; it must be performed with this program. It will not only perform the copy, exactly as the OS utility would, but it will, in addition, create the hash table file, co-resident with the backup file.
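
The backup run described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the program's implementation: the name backup_run, the 1 MiB segment size, and the table layout (raw 16-byte MD5 digests stored in segment order) are all hypothetical.

```python
import hashlib

SEGMENT_SIZE = 1024 * 1024                 # assumed segment size
DIGEST_SIZE = hashlib.md5().digest_size    # an MD5 digest is 16 bytes


def backup_run(device_path, backup_path, table_path, segment_size=SEGMENT_SIZE):
    """Copy only the changed segments; return how many segments were copied."""
    copied = 0
    with open(device_path, "rb") as device, \
            open(backup_path, "r+b") as backup, \
            open(table_path, "r+b") as table:
        index = 0
        while True:
            segment = device.read(segment_size)
            if not segment:
                break
            new_hash = hashlib.md5(segment).digest()
            # The stored hash stands in for the backup segment, so the
            # backup file itself is never read during the comparison.
            table.seek(index * DIGEST_SIZE)
            old_hash = table.read(DIGEST_SIZE)
            if new_hash != old_hash:
                backup.seek(index * segment_size)
                backup.write(segment)
                table.seek(index * DIGEST_SIZE)
                table.write(new_hash)
                copied += 1
            index += 1
    return copied
```

Only the device and the small table file are read in full, which is what keeps the traffic to a network-mounted backup location low. The table layout used in this sketch is a guess, and the program may store its table differently.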
The initialization run also evaluates the hash of the complete device file. This value can be compared to what the OS command "md5sum" produces as the hash of the device (and backup) file, in order to confirm the correctness of the md5 hash computation as implemented in this program.

Repercussions on deniability: as mentioned above, a well-designed encrypted file-system application (such as the previously mentioned TrueCrypt, cf. https://en.wikipedia.org/wiki/TrueCrypt) will make the un-mounted "blob" in which the encrypted file-system resides indistinguishable from a perfectly random collection of bytes. If the blob occupies the complete, un-partitioned USB device, it is entirely plausible to maintain that the USB stick is nothing more than a securely erased memory device. This plausibility is reduced to a small degree if the blob is partition-resident (as opposed to un-partitioned-device-resident), and it is completely subverted if a file (such as the backup file) with the exact same binary content as the device is in the possession of the device owner.

Program execution
-----------------

The following command line examples assume a program running under the Linux or OS X operating systems. An MS Windows operation guide is provided under a different cover.

The program is invoked with two command-line parameters, as shown in the following example:

    lord /dev/sdb1 ~/backups/my256usb.tc

where the first argument (/dev/sdb1) is the device and partition where the USB flash memory stick is attached, and the second (~/backups/my256usb.tc) is the path/file name of the backup file. Depending on whether the file named by the second argument already exists, there are two different program execution "modes": an initialization run and a backup run.

Initialization: If ~/backups/my256usb.tc does not exist, the program is expected to create it as an exact copy of the /dev/sdb1 partition, and to create the accompanying hash table file.
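
A minimal Python sketch of the mode selection and of the initialization run is given here. It is an illustration only: the function names, the segment size and the table layout (raw MD5 digests in segment order) are assumptions, while the ".sht" suffix is the one this document specifies for the hash table file.

```python
import hashlib
import os

SEGMENT_SIZE = 1024 * 1024   # assumed segment size


def table_path_for(backup_path):
    """Name of the hash table file: the backup file name plus '.sht'."""
    return backup_path + ".sht"


def select_mode(backup_path):
    """Initialization if the backup file is missing, otherwise a backup run."""
    return "backup" if os.path.exists(backup_path) else "init"


def initialize(device_path, backup_path, segment_size=SEGMENT_SIZE):
    """Create the backup and its hash table; return the whole-device MD5."""
    whole_device = hashlib.md5()
    # Mode "wb" truncates, so a stale hash table is silently replaced.
    with open(device_path, "rb") as device, \
            open(backup_path, "wb") as backup, \
            open(table_path_for(backup_path), "wb") as table:
        while True:
            segment = device.read(segment_size)
            if not segment:
                break
            backup.write(segment)                        # exact copy
            table.write(hashlib.md5(segment).digest())   # per-segment hash
            whole_device.update(segment)                 # running total hash
    return whole_device.hexdigest()
```

The hexadecimal string returned by the hypothetical initialize() is computed over the same bytes that md5sum would read from the device, so the two values should agree.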
The hash table file will be created in the same directory as the backup file, with the name of the backup file followed by a ".sht" suffix (s-h-t for (s)egment (h)ash (t)able). Note that if the backup file does not exist, but the segment hash table file does, the latter will be quietly replaced by a new hash table!

Backup: If ~/backups/my256usb.tc and the accompanying hash table (with the same name as the backup file followed by ".sht", e.g., .../my256usb.tc.sht) do exist, the program is expected to perform the backup. With the help of the segment hash table file, only those segments that are not identical on the device and on the backup are copied over. In addition, the segment hash table file is updated so that all hashes in it are those of their corresponding segments in the backup file.

Verification: After both the initialization run and the backup run, the backup file must be an exact replica of the device. If desired, this can be verified by running the operating system md5sum command, as follows:

    md5sum /dev/sdb1

and

    md5sum ~/backups/my256usb.tc

The output hash values of the two files must be the same. Note that the initialization run evaluates and reports the hash of the device, so after initialization the first of the above commands is not required.

Sudo: As in most Linux environments the regular (non-root) user is not allowed to read (or write) raw devices, a program that reads /dev/sdb1 will probably need to be run as a "superuser":

    sudo lord /dev/sdb1 ~/backups/my256usb.tc

This might require a subsequent step that changes the ownership of the backup file and the segment hash table file back to the regular user, e.g.,

    sudo chown userjack:userjack ~/backups/my256usb.tc ~/backups/my256usb.tc.sht

(if necessary, see the man pages of the sudo and chown commands for details).

It is assumed that in normal operation the encrypted blob will always be backed up while un-mounted.
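
The verification step described above can also be scripted. The following Python sketch, with hypothetical function names, computes the same MD5 value that md5sum prints for a file and compares the device with the backup; it is an illustration, not part of the program.

```python
import hashlib


def file_md5(path, chunk_size=1024 * 1024):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_exact_replica(device_path, backup_path):
    """True if both files have the same MD5 digest."""
    return file_md5(device_path) == file_md5(backup_path)
```

A call such as is_exact_replica("/dev/sdb1", backup_path) would normally need the same superuser rights as the backup itself. The comparison can only be expected to succeed if the device was not modified between the backup and the check, which is one more reason to keep the device un-mounted during backups.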
If the device is backed up while mounted, note that a change to two or more segments of the device while the backup is running, such that one (or some) segment(s) is included in the backup and one (or some) is not, can leave the backup with a corrupt internal file-system state. Since the device itself is never written to by this program, the problem can always be resolved by erasing the backup file and re-initializing (i.e., re-creating the backup file and the hash table) while the device is not mounted. This will of course not be an option if the original device is lost.

Depending on the encryption software used, the backup file itself can be mounted, and a disk-checking utility can be used on the mounted file-system in order to verify the integrity of the backup file.

Compiling and installing the program:
-------------------------------------

The program has no external build dependencies and can be compiled using a simple gcc compiler command, for instance:

    gcc -o lord /home/userjack/sources/lord/lord.c

if /home/userjack/sources/lord/ is the directory where the C language source code of this program resides. As there are no run-time dependencies either, there is no need for any "installation".

M.O. assumptions
----------------

While there is no shortage of encrypted file-system programs and cloud storage service providers, this program assumes some rather specific, perhaps not universal operational requirements, such as:

* The user might be required - or might prefer - to access the confidential information on a computer not (permanently) connected to the Internet.
* The use of confidential information must leave minimal to no traces on the computer.
* If remote servers are used for backup, the server owner or administrator must not be in a position to misappropriate the user's data.
* The user cannot prevent an adversary from sequestering his or her hardware; all he or she can do is abstain from further use of any hardware that was, even for a brief period of time, under the control of the adversary.
* All security-critical software must be inspected in source form by the user or by someone he or she can trust.