Modern crime often leaves an electronic trail. Finding and preserving that evidence requires careful methods as well as technical skill.
Finding Lost Files
Many forensic investigations start with the examiner looking for files belonging to the computer’s previous user.
Allocated files are ones that can be viewed through the file system and whose contents under normal circumstances will not be inadvertently overwritten by the operating system. The word allocated refers to the disk sectors in which the file’s content is stored, which are dedicated to this particular file and cannot be assigned to others. Many digital forensics tools allow the examiner to see allocated files present in a disk image without having to use the computer’s native operating system, which maintains forensic integrity of the evidence.
One of the major technical innovations in digital forensics over the past 15 years has been the development of approaches for recovering a file after it is deleted. These files are not simply in a computer’s “trash can” or “recycle bin,” but have been removed by emptying the trash. File names can be hidden, and the storage associated with the files is deallocated. But a file’s contents sometimes can remain on the hard drive, in memory, or on external media, even though the metadata that could be used to locate it are lost. Recovering these kinds of data requires a technique called file carving, invented around 1999 by independent security researcher Dan Farmer, and now widely used.
The first file carvers took advantage of the fact that many file types contain characteristic sequences of bytes at the beginning and end of each file. Such sequences are called file headers and footers. The file carver scans the disk image for these headers and footers. When a header and a matching footer are found, the two sequences of bytes and all of the data between them are saved in a new file.
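The header-and-footer idea can be sketched in a few lines of Python. This is only an illustration, not the code of any real carving tool: the function name `carve` and the toy disk image are invented for the example, and the byte values are the standard JPEG start-of-image and end-of-image markers. It assumes each file is stored contiguously, which is the case the first carvers handled.

```python
JPEG_HEADER = b"\xff\xd8\xff"  # JPEG start-of-image marker plus the first byte of the next marker
JPEG_FOOTER = b"\xff\xd9"      # JPEG end-of-image marker


def carve(image: bytes, header: bytes = JPEG_HEADER, footer: bytes = JPEG_FOOTER):
    """Return every byte run that starts with `header` and ends at the next `footer`."""
    carved = []
    pos = 0
    while True:
        start = image.find(header, pos)
        if start == -1:
            break  # no more headers in the image
        end = image.find(footer, start + len(header))
        if end == -1:
            break  # a header with no footer after it: nothing left to carve
        carved.append(image[start:end + len(footer)])
        pos = end + len(footer)
    return carved


# A toy "disk image" (hypothetical data): filler bytes around two JPEG-like runs.
first = b"\xff\xd8\xff\xe0AAAA\xff\xd9"
second = b"\xff\xd8\xff\xe1BB\xff\xd9"
disk = b"\x00" * 16 + first + b"junk" + second + b"\x00" * 8

files = carve(disk)
print(len(files), [len(f) for f in files])
```

A carver this simple will happily save runs of bytes that are not valid pictures, since the footer bytes can also occur by chance inside unrelated data.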
Modern carvers can validate the data that they are carving (for example, to make sure that the bytes between the JPEG header and footer can actually be displayed as a digital photograph) and can even reassemble files that are broken into multiple pieces. Such fragment recovery carving is computationally challenging because of the number of ways that fragments can be realigned; the result is a combinatorial explosion as the size of the media increases. Missing fragments further complicate the problem.
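To get a feel for that combinatorial explosion, consider a deliberately simplified case: one file split into n fragments whose order is unknown, with every fragment used exactly once. A brute-force search would then face n! candidate orderings. The helper name `orderings` is invented for this sketch, and practical carvers prune the search rather than enumerate it.

```python
import math


def orderings(fragment_count: int) -> int:
    """Number of ways to arrange `fragment_count` distinct fragments in a row."""
    return math.factorial(fragment_count)


for n in (2, 5, 10, 20):
    print(f"{n:2d} fragments -> {orderings(n):,} possible orderings")
```

The count for 20 fragments already exceeds two quintillion, and a real disk also contains fragments from many unrelated files.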
Closely related to file carving is the problem of reconstructing compressed data. Compression is a technique that is widely used on computer systems to squeeze data so that it takes up less space. The technique exploits redundancy; for example, if asked to compress the character sequence “humble humbleness,” a computer might replace the six characters of the second instance of “humble” with a pointer to the first occurrence. English text typically compresses to one-sixth its original size.
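The “humble humbleness” example can be made concrete with a toy compressor in the spirit of the back-reference schemes used by common lossless formats. This is a minimal sketch rather than any production algorithm: the names `lz_compress` and `lz_decompress`, the token layout, and the minimum match length of four characters are all assumptions chosen for readability.

```python
def lz_compress(text: str, min_match: int = 4):
    """Replace repeated runs with (distance, length) pointers to earlier text."""
    tokens = []
    i = 0
    while i < len(text):
        best_len, best_dist = 0, 0
        for start in range(i):  # try every earlier position as a match source
            length = 0
            while (i + length < len(text)
                   and start + length < i  # keep the match inside already-seen text
                   and text[start + length] == text[i + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - start
        if best_len >= min_match:
            tokens.append((best_dist, best_len))  # pointer: go back `dist`, copy `len`
            i += best_len
        else:
            tokens.append(text[i])  # literal character
            i += 1
    return tokens


def lz_decompress(tokens) -> str:
    """Rebuild the original text by expanding each pointer."""
    out = []
    for tok in tokens:
        if isinstance(tok, tuple):
            dist, length = tok
            start = len(out) - dist
            out.extend(out[start:start + length])
        else:
            out.append(tok)
    return "".join(out)


tokens = lz_compress("humble humbleness")
print(tokens)
print(lz_decompress(tokens))
```

Here the second “humble” becomes the single token `(7, 6)`, meaning “go back seven characters and copy six,” and decompression restores the text exactly.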
Text must be compressed with lossless algorithms, programs that faithfully restore the original text when the data are decompressed. However, photographs and videos are typically compressed with lossy systems that exploit deficiencies in the human perceptual system. For example, a few dozen pixels of slightly different colors might be replaced by a single rectangle of uniform hue. The resulting savings can be immense. Without compression, an hour of full-screen video might require 99 gigabytes, but with compression the same video might take up only 500 megabytes, roughly 1/200th the original size.
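Both points can be checked quickly. The sketch below works out the video ratio, assuming decimal units (1 gigabyte = 10^9 bytes, 1 megabyte = 10^6 bytes), and then uses Python's standard `zlib` module to show what “lossless” means in practice: the decompressed bytes are identical to the input. The sample sentence is invented for the example.

```python
import zlib

# Ratio of uncompressed to compressed video size, in decimal units.
uncompressed_bytes = 99 * 10**9   # 99 gigabytes
compressed_bytes = 500 * 10**6    # 500 megabytes
ratio = uncompressed_bytes / compressed_bytes
print(ratio)  # 198.0, i.e. roughly 1/200th the original size

# Lossless round trip: every byte of the text comes back unchanged.
text = b"Finding and preserving evidence requires careful methods. " * 20
packed = zlib.compress(text)
restored = zlib.decompress(packed)
print(len(text), len(packed), restored == text)
```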
The primary challenge posed by compression is recovering data when the compressed file is corrupted or partially missing. Just five years ago such corruption frequently made it impossible to recover anything of use, but lately there have been dramatic advances in this area.
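The easiest form of this problem, a compressed stream whose tail is missing, can be illustrated with Python's standard `zlib` module. The sketch below is not one of the research techniques discussed in this article; it only shows that a streaming decoder can return whatever the surviving bytes allow. The helper name `recover_prefix` and the sample records are invented for the example. Fragments that are missing their beginning are far harder, because the decoder no longer has the information it needs to interpret the bytes.

```python
import zlib


def recover_prefix(fragment: bytes) -> bytes:
    """Decompress as much of a zlib stream as the surviving bytes allow."""
    decoder = zlib.decompressobj()
    try:
        return decoder.decompress(fragment)
    except zlib.error:
        return b""  # the fragment cannot be decoded from its first byte


# Hypothetical data: 200 short records, compressed, then cut in half.
original = "\n".join(f"record {i}: value {i * i}" for i in range(200)).encode()
compressed = zlib.compress(original)
truncated = compressed[: len(compressed) // 2]

partial = recover_prefix(truncated)
print(len(original), len(compressed), len(partial))
print(original.startswith(partial))
```

The recovered bytes are a genuine prefix of the original data, even though the stream never reaches its end marker or checksum.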
In 2009 Husrev Sencar of TOBB University of Economics and Technology in Turkey and Nasir Memon of the Polytechnic Institute of New York University developed an approach that can show a fragment of a JPEG digital photograph even if the beginning and end of the file are missing. In 2011 Ralf Brown of Carnegie Mellon University developed an approach for recovering data from fragments of files compressed with the common ZIP or DEFLATE algorithms, even when critical information needed for reassembly is missing. Brown’s approach creates a model of the many different ways that a document might be decompressed based on the underlying mathematics of compression, and then chooses between the different possible documents based on a second model of the human language in which the document is written (see figure at right).
Recovering files in temporary computer memory can also be illuminating for digital evidence. The RAM of a desktop, laptop, or cell phone is a mosaic of 4,096-byte blocks that variously contain running program code, remnants of programs that recently ran and have closed, portions of the operating system, fragments of what was sent and received over the network, pieces of windows displayed on the screen, the copy-and-paste buffer, and other kinds of information. Memory changes rapidly—typical memory systems support several billion changes per second—so it is nearly impossible to make a copy that is internally consistent without halting the machine. An added complication is that the very specific manner by which programs store information in memory is rarely documented and changes between one version of a program and another. As a result, each version may need to be painstakingly reverse-engineered by computer forensics researchers. Thus, memory analysis is time consuming, very difficult, and necessarily incomplete.
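A first pass over a memory dump often amounts to walking it one 4,096-byte block at a time. The following sketch does that and applies a crude content heuristic. The labels, the 90 percent threshold, and the function names `pages` and `classify` are assumptions made for illustration; real memory-analysis tools rely on detailed knowledge of operating system data structures rather than byte statistics.

```python
PAGE_SIZE = 4096  # block size cited in the text


def pages(dump: bytes, page_size: int = PAGE_SIZE):
    """Yield (offset, block) pairs covering the dump in page-sized steps."""
    for offset in range(0, len(dump), page_size):
        yield offset, dump[offset:offset + page_size]


def classify(page: bytes) -> str:
    """Very rough guess at what a block holds (illustrative heuristic only)."""
    if not any(page):
        return "empty"
    printable = sum(32 <= b < 127 or b in (9, 10, 13) for b in page)
    return "text-like" if printable / len(page) > 0.9 else "binary"


# A toy three-page "dump" (hypothetical data): zeros, text, and mixed bytes.
zero_page = bytes(PAGE_SIZE)
text_page = (b"hello world\n" * 400)[:PAGE_SIZE]
binary_page = bytes(range(256)) * 16
dump = zero_page + text_page + binary_page

labels = [(offset, classify(block)) for offset, block in pages(dump)]
print(labels)
```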
Despite these challenges, recent years have seen the development of techniques for acquiring and analyzing the contents of a running computer system, a process called memory parsing. Today there are open-source and proprietary tools that can report the system time when a memory dump was captured, display a list of running processes, and even show the contents of the computer’s clipboard and screen. Such tools are widely used for reverse-engineering malware, such as computer viruses and worms, as well as understanding an attacker’s actions in computer intrusion cases. Memory parsing can be combined with file carving to recover digital photographs and video.
Reverse engineering is another important part of digital forensics because software and hardware developers generally do not provide the public with details of how their systems work. As a result, considerable effort is needed to backtrack through systems code and understand how data are stored. Today’s techniques to extract allocated files from disk images were largely developed through this method.
System analysis is the second leg of forensic research. It’s similar to reverse engineering, but the fundamental difference is that the information the analyst seeks may be unknown to the developers themselves. Although this idea may seem strange, remember that computers are complicated systems: Just as programmers frequently put bugs in their code without realizing it, programs invariably have other behaviors that aren’t bugs but are equally unforeseen by the original creators. Many system users and quite a few developers assumed that it was not possible to restore deleted files until data recovery experts developed tools that proved otherwise.