Container vs. File-by-file Encryption OR CrococryptFile vs. CrococryptMirror
This article covers the differences between container encryption and file-by-file encryption, and the resulting privacy implications. At the end, you will see that the choice is also a matter of taste and application context.
As an example of each type of encryption application, I will use our own CrococryptFile and CrococryptMirror. However, most of the observations also hold for similar programs like TrueCrypt, Boxcryptor or simple ZIP encryption. By "similar" I simply mean that these programs, and many others, also fall into the categories of container or file-by-file encryption. I do not want to compare the actual applications!
Basics
In this article I am talking about encrypting files on a computer (or mobile device). I am using encryption especially when making backups of important files to an external hard drive or cloud storage. In case of the external hard drive, encryption is important to me, because mobile drives have a talent for getting lost or stolen. So I want to make sure nothing happens to my important data - meaning nobody else should be allowed to read it. In case of cloud storage, I am torn between a) not using it at all because of the security risks and b) using the possibilities of the high availability of cloud technologies to protect the existence of my personal data. Using strong cryptography is a compromise that I have agreed on with myself.
The two technologies (container and file-by-file encryption) differ in obvious ways: If you encrypt multiple folders and files into a container file, you get one giant file dump as a result. In an ideal world, nobody would be able to tell what this file dump contains. The disadvantage of this approach is that you end up with a single big file to copy elsewhere, and after modifications you might need to create the whole file dump again. Sometimes you might not realize this is happening. For instance, when you add or modify files in a ZIP archive, many ZIP utilities create a copy of the whole ZIP file, but you do not notice because it happens in the background.
By the way - speaking of ZIP, if you use ZIP encryption to store backups in the cloud, be aware that ZIP encryption does not protect the metadata of your files, only the content. That means filenames, timestamps and file sizes are readable in cleartext by anyone who can access your encrypted ZIP archive. One simple trick is to first compress your files into a plain ZIP archive and then put that plain archive into a second, encrypted ZIP archive. However, that might be tricky to accomplish using certain ZIP tools.
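The metadata leak can be checked with a few lines of Python. This is a minimal sketch, assuming an archive that uses traditional ZIP encryption; the function name `list_zip_metadata` is made up for illustration. Note that no password is passed anywhere: the listing comes from the archive's central directory.

```python
import io
import zipfile


def list_zip_metadata(archive):
    """Return (name, uncompressed size, timestamp, encrypted?) per entry.

    `archive` is a path or a file-like object. No password is supplied,
    because the entry list is stored outside the encrypted file contents.
    """
    with zipfile.ZipFile(archive) as zf:
        return [
            (
                info.filename,
                info.file_size,
                info.date_time,
                bool(info.flag_bits & 0x1),  # bit 0 marks an encrypted entry
            )
            for info in zf.infolist()
        ]
```

Calling `list_zip_metadata("backup.zip")` on a password-protected archive (the filename is a placeholder) should return the names, sizes and timestamps of all entries, with the last field set to `True`.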
If you use file-by-file encryption, you have a gigantic advantage when it comes to storing your encrypted files in cloud storage: Modifying a single file results in uploading only its encrypted counterpart. Dropbox and TrueCrypt (containers) worked together in a similar way, with only the modifications being transferred to the cloud, but this is not generally possible with every combination of container-based solution and cloud storage provider, only where it is specifically supported. File-by-file encryption always works that way, independently of the cloud storage.
However, file-by-file encryption obviously gives away some privacy: the number of files, file sizes, folder structures and sometimes timestamps (depending on the solution). This is especially important when we talk about storing your data in the cloud. As I said in the beginning, it might be a matter of taste. It might even be a matter of where you stand on the conspiracy theories surrounding intelligence services. But it also has some real privacy implications.
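To put a number on the upload difference, here is a rough sketch. It assumes a hypothetical manifest format that maps each path to a (digest, encrypted size) pair, and it models the worst case for a container, where the provider cannot transfer deltas and the whole file is uploaded again.

```python
def upload_cost(previous, current):
    """Compare upload volume after a change.

    previous/current: dict mapping path -> (digest, encrypted size in bytes).
    Returns (changed paths, bytes uploaded file-by-file,
             bytes uploaded for a container without delta transfer).
    """
    # A file must be uploaded if it is new or its entry changed.
    changed = [p for p, entry in current.items() if previous.get(p) != entry]
    file_by_file = sum(current[p][1] for p in changed)
    # Worst case: the container holding everything is uploaded as a whole.
    whole_container = sum(size for _digest, size in current.values())
    return sorted(changed), file_by_file, whole_container
```

The container figure ignores format overhead and any compression across files, so it is only an order-of-magnitude comparison.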
An Example
Let's assume you use a file-by-file encryption solution to make a full backup of your Windows folder. I am using Windows as an example here, because nearly everybody can reproduce what I want to explain. Whether it makes sense to back up your Windows folder to cloud storage is a separate question.
This is a screenshot from Windows Explorer showing the SysWOW64 folder on a machine that is really being used (meaning there is a lot of software installed etc.). The folder has 2505 elements (folders and files):
This is a screenshot from Windows Explorer showing the SysWOW64 folder on a nearly fresh machine. The folder has 2285 elements (folders and files):
Both machines run Windows 7 64bit. As you might see, they run on different patch levels for some reason. If we sort the view by file size starting with the biggest file, it looks like this:
Comparing both installations shows that there are many identical files. Looking only at the biggest files, this is the result of the comparison. The files marked yellow are not equal: they exist on both sides but have different sizes:
Let's do the same with the encrypted versions using CrococryptMirror. The encrypted version of the SysWOW64 folder on the first machine sorted by file size:
The encrypted version of the SysWOW64 folder on the second machine (fresh install) sorted by file size:
If you compare the two lists with the unencrypted versions, you see the effect of the compression being used. You cannot even say for sure if the biggest unencrypted file is the biggest encrypted file. Although I assume this is true in this case.
If you only compare file sizes now (nothing else is possible because CrococryptMirror encrypts filenames plus timestamps by default), you get the following equal files - marked red - in each encrypted folder on the two machines. Again, this only covers the biggest files. Each file's content is of course different, because I used different key files on both machines. Nevertheless, files that have the same size on both machines have precisely the same size in bytes.
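The size comparison itself needs no cryptography at all. The following sketch collects the file sizes of two encrypted folders and intersects them as multisets; the function names and the `min_size` threshold are chosen for illustration and are not part of CrococryptMirror.

```python
from collections import Counter
from pathlib import Path


def file_sizes(folder):
    """Sizes in bytes of all regular files below `folder`."""
    return [p.stat().st_size for p in Path(folder).rglob("*") if p.is_file()]


def shared_sizes(sizes_a, sizes_b, min_size=0):
    """Sizes that occur in both lists (multiset intersection), largest first."""
    a = Counter(s for s in sizes_a if s >= min_size)
    b = Counter(s for s in sizes_b if s >= min_size)
    return sorted((a & b).elements(), reverse=True)
```

For example, `shared_sizes(file_sizes(dir_a), file_sizes(dir_b), min_size=5 * 1024 * 1024)` would list the sizes of large files that appear in both folders, where `dir_a` and `dir_b` stand for the two encrypted folders.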
Implication
So what is the implication of that? Let's assume we know the type of (encrypted) folder that we are seeing because of an educated guess based on characteristics like structure and file sizes. In this context we can say: The bigger the files are, the more likely it is that two files that a) were compressed and encrypted with the same method and b) have the same size are also identical in plaintext. Using the example of the SysWOW64 folder, we are able to observe:
- The folders on both machines have between 2200 and 2600 elements
- The folders on both machines contain dozens of equal files that are bigger than 5MB
Hence, we are able to make an educated guess that we see an encrypted SysWOW64 folder, based solely on our observations. This even allows us to guess some of the plaintext files even though the files and their filenames are encrypted.
Of course, you can repeat this for all subfolders within the Windows folder - building up a database of heuristics. You can also include different categories of machines, Windows versions and file versions etc. It might not always be accurate but it is a matter of probabilities anyway. The more you (can) compare, the more accurate the guess becomes. You could guess complete folder structures or single folders/files only.
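Such a database of heuristics could look roughly like the sketch below. The fingerprint layout (`element_range`, `large_sizes`) and the scoring rule are assumptions made for illustration; a real tool would need measured values and a proper probability model.

```python
from collections import Counter


def match_score(element_count, observed_sizes, fingerprint):
    """Fraction of a fingerprint's known large-file sizes seen in a folder.

    fingerprint: {"element_range": (low, high), "large_sizes": [bytes, ...]}
    Returns 0.0 if the element count is outside the expected range.
    """
    low, high = fingerprint["element_range"]
    if not low <= element_count <= high:
        return 0.0
    known = Counter(fingerprint["large_sizes"])
    if not known:
        return 0.0
    hits = known & Counter(observed_sizes)
    return sum(hits.values()) / sum(known.values())


def best_guess(element_count, observed_sizes, database):
    """Return (label, score) of the best matching fingerprint, or (None, 0.0)."""
    if not database:
        return None, 0.0
    scores = {
        label: match_score(element_count, observed_sizes, fp)
        for label, fp in database.items()
    }
    label = max(scores, key=scores.get)
    return label, scores[label]
```

A fingerprint for the example above would use an element range of 2200 to 2600 and the encrypted sizes of the large files that both machines had in common.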
Analytic Continuation
Let's make assumptions on a big scale for a second. You could carry out the considerations above for any software (binaries, any publicly known files, folder structures) and store these heuristics in a database. With a program that applies these heuristics, you would be able to recover the plaintext of certain encrypted files without breaking the actual encryption. It would be a simple matter of statistical analysis.
If you are not able to recover certain data, it still might be possible to make an educated guess about the content of the encrypted files you are seeing. For instance, the DCIM folder (where photos and videos are stored) of a specific smartphone type using specific image settings would have a certain file & folder structure. This might also be detectable using a heuristics database. Again, you would not be able to crack the file encryption. Somebody would only know which type of data s/he is seeing, not the actual data. In some circumstances, knowing the plaintext file type might be helpful for cracking attempts because you know what you are searching for.
Countermeasures
It may seem obvious that there are countermeasures against this kind of structural analysis. One could add random padding to each file, from a few bytes up to megabytes, with a random length. Since the files are encrypted, it would not be possible to tell that there is random data included. The decryption could cut the random data from the file. Still, it would be possible to count files and look at folder structures for analysis. To counter this as well, you could add (new) random files and folders.
I do not know of a program that does this at the moment. I think you would still be able to make good guesses if you know the algorithm that adds the random data and files. Moreover, when we talk about cloud backups, such a solution would increase the required storage space - maybe severely.
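The padding idea can be sketched as follows. This is a hypothetical scheme for illustration, not something CrococryptMirror does: the plaintext gets an 8-byte length header and a random-length tail before it is encrypted, and the tail is cut off again after decryption. The encryption step itself is left out.

```python
import secrets
import struct

HEADER = struct.Struct(">Q")  # 8-byte big-endian length of the real plaintext


def pad(plaintext: bytes, max_padding: int) -> bytes:
    """Prefix the real length, then append 0..max_padding random bytes.

    Apply this BEFORE encryption so that header and padding are hidden.
    """
    extra = secrets.randbelow(max_padding + 1)
    return HEADER.pack(len(plaintext)) + plaintext + secrets.token_bytes(extra)


def unpad(padded: bytes) -> bytes:
    """Inverse of pad(): apply AFTER decryption to drop the random tail."""
    (length,) = HEADER.unpack_from(padded)
    body = padded[HEADER.size:HEADER.size + length]
    if len(body) != length:
        raise ValueError("input is shorter than its length header claims")
    return body
```

With a small `max_padding`, big files remain distinguishable by size, so the amount of padding needed to hide a match grows with the storage overhead.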
Comparison
Let's look at the second possibility: encrypting the SysWOW64 folder as a single giant file dump using CrococryptFile. The first machine's dump would look like this (of course you would not use the filename "SysWOW64", this is just for reasons of readability):
This is the file dump on the freshly installed Windows 7 machine:
Disregarding the filename, the two files cannot be compared. Not only do they differ in size, their content cannot be guessed either - it could be anything inside this file. It could be a single MP4 movie, hundreds of photos or thousands of Word documents.
Conclusion
As I said in the beginning, the whole topic might be a matter of taste or of where you stand on the current conspiracy theories regarding the abilities of intelligence services. It is up to you to decide what is important for your personal use case and privacy.