LORD (Large Opaque Removable Device) backup
===========================================

Introduction and fundamentals
-----------------------------

With the availability of compact, fast and large capacity removable flash
memory devices, it is often advantageous to store all permanent user data
not in the user's home directory on a computer, but instead on a USB "memory
stick" or "drive", using a device or partition resident encrypted file system
with software such as TrueCrypt. Since the device is removable, it is much
easier to safeguard than either a desktop or a laptop computer. In addition,
a single device can be used on different computers at different times and at
different locations without the need to synchronize the user's "home directory"
content, possibly over an insecure channel.

If the encryption program does not write any un-encrypted meta-data to the
device, it is quite difficult (often bordering on impossible) for an adversary
to demand, via legal or extra-legal coercion, that the data owner provide
access to the information contained on it. This is unlike a laptop
computer, where it has become common for adversaries with questionable
administrative powers to demand that the laptop be powered up,
logged in, and presented in that state for content inspection.

When such a device is used not only for static data but also for volatile data
that is created with a significant expenditure of work-hours or copied from
sources no longer accessible, frequent and regular backup of the content of
the device becomes mandatory.

For obvious reasons, it is advantageous to create and maintain a device
backup while the device is in the encrypted state, so that neither the
interception of the backup procedure nor access to the backup data storage
is a possible leak vector for the encrypted information. In other words,
the device should be treated by the backup process not as a hidden file/folder
structure, but instead as an "opaque", apparently random, "blob". This is easy
to achieve if the device is relatively small, as each regular backup could
simply consist of nothing but copying the complete device to a backup data file.

If, however, the device is large, and at each regular backup only a very small
part of it has changed and needs to be copied to replace the corresponding
content of an already existing backup file, the simple solution of complete
file copy would result in very long run times and very high bandwidth use,
regardless of how much data really need to be refreshed in the backup. This
is the practical problem this program addresses: how to perform an efficient
backup of a Large Opaque Removable Device - typically a large-capacity
USB flash memory "stick" with an encrypted file-system on it. The term
"differential backup" is often used to describe a backup where only the
changed parts of the data are copied from the source to the backup file(s),
in order to save both the run time and the bandwidth.

The problem of differential backups is addressed by the common Linux, OS X
and MS Windows utility program called "rsync". However, rsync cannot read
the raw removable device as an input file (there are good technical reasons
for this limitation). On the other hand, the problem addressed by this program
is in some aspects much simpler than the general cases rsync is designed to
handle: it only needs to deal with a single device/file, and it can be safely
assumed that in the large majority of instances only a minuscule portion of
the content of the device will need to be copied over to the backup file.

Similarly to rsync, this program addresses the problem of differential backup
by considering the device that needs to be backed-up as an ordered collection
of fixed-size segments. With an appropriate mechanism the segments that have
changed since the previous backup was performed are identified, and only
those segments are copied from the source to the backup.
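
The segment layout implies some simple arithmetic, sketched here in Python
(not the language of the program itself). The names segment_count and
table_size are hypothetical, and the sketch assumes that one 16-byte MD5
digest is kept per segment; the text mentions md5 but does not specify the
segment size or how the hashes are stored.

```python
DIGEST_SIZE = 16  # an MD5 digest is 16 bytes (storage layout is an assumption)


def segment_count(device_size: int, segment_size: int) -> int:
    """Number of fixed-size segments needed to cover the device.

    The last segment may be shorter than segment_size, hence the
    rounding up. Integer arithmetic avoids float rounding on large sizes.
    """
    return (device_size + segment_size - 1) // segment_size


def table_size(device_size: int, segment_size: int) -> int:
    """Bytes needed to hold one digest per segment."""
    return segment_count(device_size, segment_size) * DIGEST_SIZE
```

Under these assumptions, a device of 256 * 10**9 bytes split into 1 MiB
segments has 244141 segments, and its digests occupy 3906256 bytes, which is
tiny compared to the device itself.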

In contrast to rsync, which always computes the hashes of both source and
backup file segments, this program requires the presence of a permanent table
of hashes of each segment in the backup file. This table is co-resident with
the backup file. In order to perform the backup, segments of the source device
are read and their hashes are computed. The hash of the corresponding segment
in the backup file is retrieved from the hash table file and it is compared
with the hash of the segment in the device. If the two hashes are different,
the segment is copied from the device to the backup file, and the hash is
updated in the backup segment hash table file.
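
The comparison loop described above can be sketched as follows. This is an
illustrative Python sketch, not the program's C source. The function name
backup_run, the 1 MiB default segment size, and the table layout (raw 16-byte
MD5 digests stored back to back, one per segment) are all assumptions made
for the example.

```python
import hashlib

DIGEST_SIZE = 16             # bytes in one MD5 digest
SEGMENT_SIZE = 1024 * 1024   # assumed; the text does not state the real size


def backup_run(device_path, backup_path, table_path,
               segment_size=SEGMENT_SIZE):
    """Copy only the segments whose hash differs from the stored one.

    Returns the number of segments copied. The backup file is written
    to but never read: the stored table stands in for its content.
    """
    copied = 0
    with open(device_path, "rb") as dev, \
         open(backup_path, "r+b") as bak, \
         open(table_path, "r+b") as tab:
        index = 0
        while True:
            segment = dev.read(segment_size)
            if not segment:          # end of device
                break
            new_hash = hashlib.md5(segment).digest()
            tab.seek(index * DIGEST_SIZE)
            old_hash = tab.read(DIGEST_SIZE)
            if new_hash != old_hash:
                # Segment changed: refresh the backup and the table entry.
                bak.seek(index * segment_size)
                bak.write(segment)
                tab.seek(index * DIGEST_SIZE)
                tab.write(new_hash)
                copied += 1
            index += 1
    return copied
```

Because only the hash table is consulted, an unchanged segment costs one
read from the device and one 16-byte read from the table, with no traffic to
or from the backup file itself.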

Another difference between this program and rsync is that while both programs
assume the backup location could be on a remote, network-connected server,
rsync assumes that the server has on it a permanently running rsync software
component ("rsync daemon") but does not require that the remote server backup
location be "mounted" on the local filesystem. When this program uses a
remote server as the backup file location, it assumes the network directory
will be mounted as a directory of the local file system ("mount point") but
it does not require the remote server to have a permanently running rsync daemon,
something often not available on commercial cloud storage provider servers.

Since the program expects the segment hash table file to be co-resident with
the backup file, the initial instantiation of the backup cannot be performed
by simply using the operating system file copy command; it must be performed
with this program instead. The program will not only perform the copy, exactly
as the OS utility would, but will also create the hash table file, co-resident
with the backup file. It will also evaluate the hash of the complete device
file. This value can be compared to what the OS command "md5sum" produces as
the hash of the device (and backup) file, in order to confirm the correctness
of the md5 hash computation as implemented in this program.
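
A single pass over the device is enough to do all three things at once: copy
the data, fill the hash table, and accumulate the hash of the whole device.
The Python sketch below illustrates this under assumptions; the name
initialize, the default segment size, and the table layout (raw MD5 digests
written one after another) are hypothetical and not taken from the program's
C source.

```python
import hashlib

SEGMENT_SIZE = 1024 * 1024   # assumed; the text does not state the real size


def initialize(device_path, backup_path, table_path,
               segment_size=SEGMENT_SIZE):
    """Create the backup file and its hash table in one pass.

    Returns the MD5 hex digest of the complete device, which is the
    value md5sum would print for the same input.
    """
    whole = hashlib.md5()
    with open(device_path, "rb") as dev, \
         open(backup_path, "wb") as bak, \
         open(table_path, "wb") as tab:   # "wb" replaces any old table
        while True:
            segment = dev.read(segment_size)
            if not segment:               # end of device
                break
            bak.write(segment)                             # plain copy
            tab.write(hashlib.md5(segment).digest())       # per-segment hash
            whole.update(segment)                          # whole-device hash
    return whole.hexdigest()
```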

Repercussions on deniability: as mentioned above, a well-designed
encrypted file-system application (such as the previously mentioned
TrueCrypt, cf. https://en.wikipedia.org/wiki/TrueCrypt) will make the
un-mounted "blob" in which the encrypted file-system is resident
indistinguishable from a collection of perfectly random bytes.
If the blob occupies the complete, un-partitioned USB device, it is
entirely plausible to deny that the USB stick is anything but a securely
erased memory device. This plausibility is reduced to a small degree if the
blob is partition-resident (as opposed to un-partitioned-device-resident),
and it is completely subverted if a file (such as the backup file) with the
exact same binary content as the device is in the possession of the device
owner.

Program execution
-----------------

The following command line examples assume the program is running under the
Linux or OS X operating systems. An MS Windows operation guide is provided
under a different cover.

The program is invoked with two command-line parameters, as shown in the
following example:

lord /dev/sdb1 ~/backups/my256usb.tc

where the first argument (/dev/sdb1) is the device and partition where the
USB flash memory stick is attached, and the second (~/backups/my256usb.tc)
is the path/file name of the backup file.

Depending on the value of the second argument, there are two different
program execution "modes": an initialization run and a backup run.

Initialization: If ~/backups/my256usb.tc does not exist, the program is
expected to create it as an exact copy of the /dev/sdb1 partition, and to
create the accompanying hash table file. The hash table file will be created
in the same directory as the backup file, under the name of the backup file
with a ".sht" suffix appended ("sht" stands for (s)egment (h)ash (t)able).
Note that if the backup file does not exist, but the segment hash table file
does, the latter will be quietly replaced by new hash table content!

Backup: If ~/backups/my256usb.tc and the accompanying hash table (with
the same name as the backup file with ".sht" appended, e.g.,
.../my256usb.tc.sht) do exist, the program is expected to
perform the backup. With the help of the segment hash table file, only those
segments that are not identical on the device and on the backup are copied over.
In addition, the segment hash table file is updated so that all hashes in it
are those of their corresponding segments in the backup file.
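
The choice between the two modes can be sketched in Python as below. The
names table_path_for and select_mode are hypothetical. The text does not say
what the program does when the backup file exists but the hash table file
does not, so the sketch raises an error in that case rather than guess.

```python
import os

TABLE_SUFFIX = ".sht"   # (s)egment (h)ash (t)able


def table_path_for(backup_path):
    """Hash table lives next to the backup file, with ".sht" appended."""
    return backup_path + TABLE_SUFFIX


def select_mode(backup_path):
    """Return "initialize" or "backup" depending on what already exists."""
    if not os.path.exists(backup_path):
        # No backup file yet: any stale hash table will be overwritten.
        return "initialize"
    if not os.path.exists(table_path_for(backup_path)):
        # Not covered by the text; refusing is an assumption of this sketch.
        raise FileNotFoundError(table_path_for(backup_path))
    return "backup"
```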

Verification: After both the initialization run and the backup run,
the backup file must be an exact replica of the device. If desired, this
can be verified by running the operating system md5sum command, as follows:

md5sum /dev/sdb1

and

md5sum ~/backups/my256usb.tc

The output hash values of the two files must be the same. Note that the
initialization run will evaluate and report the hash of the device, so that
running the first of the above commands won't be required.
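
For readers who want to script this check, the same comparison can be
sketched in Python. The helper names md5_of and is_exact_replica are
hypothetical; the sketch reads each file in chunks so that a large device
never has to fit in memory, and it yields the same hex digest that md5sum
prints.

```python
import hashlib


def md5_of(path, chunk_size=1024 * 1024):
    """MD5 hex digest of a file or device, read chunk by chunk."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_exact_replica(device_path, backup_path):
    """True when both inputs hash to the same MD5 value."""
    return md5_of(device_path) == md5_of(backup_path)
```

As with the md5sum commands, reading the raw device this way will normally
require superuser rights.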

Sudo: As in most Linux environments the regular (non-root) user is not
allowed to read from (or write to) raw devices, a program that reads /dev/sdb1
will probably need to be run as a "superuser":

sudo lord /dev/sdb1 ~/backups/my256usb.tc

This might require a subsequent step that changes the ownership of
the backup file and segment hash table file back to the regular user, e.g.,

sudo chown userjack:userjack ~/backups/my256usb.tc ~/backups/my256usb.tc.sht

(if necessary, see the man pages of the sudo and chown commands for details).

It is assumed that in normal operation the encrypted blob will always be
backed up while un-mounted. If this rule is not followed, note that if two
(or more) segments of the mounted device change while the backup is
running, so that one (or some) of them are included in the backup and one
(or some) are not, the backup run can result in a corrupt internal file-system
state. Since the device itself is never written to by this program, the
problem can always be resolved by erasing the backup file and re-initializing
(i.e., re-creating the backup file and the hash table) while the device is
not mounted. This will of course not be an option if the original device is
lost. Depending on the encryption software used, the backup file itself can
be mounted, and a disk-checking utility can be used on the mounted file-system
in order to verify the integrity of the backup file.

Compiling and installing the program
------------------------------------
The program has no external build dependencies and can be compiled using a
simple gcc compiler command, for instance:

gcc -o lord /home/userjack/sources/lord/lord.c

if /home/userjack/sources/lord/ is the directory where the C language source
code of this program resides. As there are no run-time dependencies either,
there is no need for any "installation".

M.O. assumptions
----------------
While there is no shortage of encrypted file-system programs and cloud
storage service providers, this program assumes some rather specific, perhaps
not universal operational requirements, such as:

* The user might be required - or might prefer - to access the confidential
  information on a computer not (permanently) connected to the Internet.

* The use of confidential information must leave minimal to no traces on
  the computer.

* If remote servers are used for backup, the server owner or administrator
  must not be in a position to misappropriate the user's data.

* The user cannot prevent an adversary from sequestering his or her hardware;
  all he or she can do is abstain from further use of any hardware that was,
  even for a brief period of time, under the control of the adversary.

* All security-critical software must be inspected in source form by the user
  or by someone he or she can trust.