Friday, June 22, 2012

TAR - Is Your Seed Sound?

Those who use Technology Assisted Review (“TAR”) already know that the technology is sound. As The Digital Advantage (TDA) has written before, it is not necessarily the technology we should question. Rather, success depends on how the technology is used and what one does with the results. Most applications deploying TAR use sophisticated “find similar” algorithms to compare the content patterns of one group of documents to another. Using sampling techniques, subject matter experts review small groups of documents, and the technology then compares those results to the larger corpus. The corpus is defined by the investigator, whoever that might be. The technology ranks each document in the corpus against the small expert-reviewed sample. Some have referred to this as a Google ranking of sorts. This small sample is generally referred to as a “seed”. Large projects may use many seeds. Seeding, by the way, is a technique that has been used for just about as long as we have had search and review technology. What’s different today? The technology has gotten much better, both in its sophistication and in our ability to define and implement workflows.
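To make the ranking idea concrete, here is a minimal sketch of a “find similar” comparison. It is not how any particular TAR product works; it assumes simple word counts and cosine similarity, and the function names and sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_against_seed(seed_relevant, corpus):
    """Rank corpus documents by similarity to the expert-coded relevant seed."""
    profile = Counter()
    for text in seed_relevant:
        profile.update(term_counts(text))
    scored = [(cosine(profile, term_counts(text)), doc_id)
              for doc_id, text in corpus.items()]
    return sorted(scored, reverse=True)  # highest similarity first


# Hypothetical data for illustration only.
seed = ["pricing agreement with competitor", "agreed price increase for Q3"]
corpus = {
    "DOC-1": "lunch menu for Friday",
    "DOC-2": "follow up on the price agreement with our competitor",
    "DOC-3": "Q3 travel schedule",
}
ranking = rank_against_seed(seed, corpus)
for score, doc_id in ranking:
    print(f"{doc_id}  {score:.3f}")
```

The document that shares the most vocabulary with the seed rises to the top of the list. Real tools use far richer models, but the dependence on the seed is the same: change the seed and the ranking changes with it.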
So, the seed is critical, right? Bad seed, bad result. And when we say the quality of the seed is important, we are not just talking about the quality of the expert’s review of the seed. The make-up of the seed can make or break the result in terms of quality, time, or both. TAR technology and sampling methods are based upon randomly sampling a percentage of the target population of documents. Traditional sampling methods assume a high proportion of relevant content within the overall population being investigated. The lower the proportion of relevant content, the larger your sample size should be, technically. For example, some sampling calculations assume that 50% of the sampled population is relevant. A relevancy rate that high is seldom the case in our world. Relevancy rates below 1% are common, and rates over 10% are rare. So, your random sample is a bit like a shotgun blast in the dark.
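A little arithmetic shows why. The sketch below assumes the common textbook sample-size formula (Cochran's, for a large population) at 95% confidence and a 5% margin of error; which calculation a given tool actually uses is not something we can state here, so treat the formula and parameter values as assumptions.

```python
import math


def sample_size(p=0.5, margin=0.05, z=1.96):
    """Cochran's formula for a large population: n = z^2 * p * (1 - p) / e^2."""
    return math.ceil(z * z * p * (1 - p) / (margin * margin))


def expected_relevant(n, prevalence):
    """Expected number of relevant documents in a simple random sample of n."""
    return n * prevalence


n = sample_size()  # assumes 50% relevance, 95% confidence, +/-5% margin
for prevalence in (0.50, 0.10, 0.01):
    hits = expected_relevant(n, prevalence)
    print(f"sample={n}  prevalence={prevalence:.0%}  expected relevant={hits:.1f}")
```

Under those assumptions the sample is 385 documents. At a 1% relevancy rate, the expert would expect to see only about four relevant documents in it, which is very little for the technology to learn from.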
Does that mean that we should have less confidence in TAR and sampling methodologies? No, in our view it most certainly does not. Rather, doesn’t it make sense to create a better seed and increase the accuracy? By the way, even a bad seed is superior to the common practice of throwing terms at the wall and seeing what sticks, but we digress.
As TDA has opined many times, search terms are the only truly transportable, objective content filters that the parties can agree upon. Using seeds constructed and reviewed from validated search terms dramatically increases the impact of the seeds and the success of any TAR workflow. Far fewer documents will be reviewed. Do you have sound auditing methodologies? Are you just throwing technology at the problem, or are you using tried and true workflows?
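As a rough illustration of the difference, the sketch below draws one seed at random from the whole corpus and another only from documents that hit a validated term. The corpus, the term, and the `build_seed` helper are all hypothetical, and plain substring matching stands in for whatever search engine is actually in use.

```python
import random


def build_seed(corpus, terms, seed_size, rng):
    """Draw the seed from documents that hit at least one validated term."""
    lowered = [t.lower() for t in terms]
    hits = [doc_id for doc_id, text in corpus.items()
            if any(t in text.lower() for t in lowered)]
    return rng.sample(hits, min(seed_size, len(hits)))


def count_on_topic(corpus, doc_ids, phrase):
    """Count how many of the given documents contain the phrase."""
    return sum(phrase in corpus[d].lower() for d in doc_ids)


# Hypothetical corpus: 1,000 documents, 10 of which discuss the issue (1%).
corpus = {f"DOC-{i}": "routine status update" for i in range(1000)}
for i in range(0, 1000, 100):
    corpus[f"DOC-{i}"] = "revised price agreement attached"

rng = random.Random(42)  # fixed so the sketch is repeatable
random_seed = rng.sample(sorted(corpus), 20)
term_seed = build_seed(corpus, ["price agreement"], 20, rng)

print("random seed on-topic:", count_on_topic(corpus, random_seed, "price agreement"))
print("term-based seed on-topic:", count_on_topic(corpus, term_seed, "price agreement"))
```

In this toy corpus every document in the term-based seed is on topic, while a purely random seed of 20 would be expected to contain almost none. A term-based seed is only as good as the terms behind it, which is why validating the terms matters.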

Monday, June 4, 2012

Deleted Email. Are you looking in the right place?

Those of us who live in the forensic world know that when you delete an email, it is not really deleted. Not right away. Most, however, don’t know where to look and may be looking in the wrong places.

Let’s start with where one won’t find deleted email. Individual deleted emails usually don’t reside on the computer’s hard drive as individual emails. Not as single files. So, if you are conducting a forensic recovery of a hard drive and are looking for individual emails, you might be looking in the wrong place. The operating system keeps track of where files are placed on the hard drive. In the case of email, the hard drive MAY contain a single file containing many emails. Where to look is largely going to depend upon what email system is being used and how that system is configured. For the purposes of this discussion, we will assume the email platform is Microsoft Outlook. In this case, we will use the example of a laptop computer that is configured to manage email within an OST file. An Offline Storage Table (.ost) file is an offline folder file utilized by Microsoft Outlook. Offline folders make it possible for the user to work offline and then to synchronize changes with the Exchange server the next time they connect.

Another type of file in Outlook is the PST file, which is simply stored on the client or on a server other than the Exchange server. A Personal Storage Table, better known as the “PST” (.pst), is an archive file that Microsoft Outlook users use to manage email messages, calendar items and other things normally managed by Outlook. A PST file is usually stored on a user’s hard drive, or on a network share, as opposed to archived email managed by Microsoft Exchange. A PST email archive is usually created by individual users to store email outside of the corporate email environment and circumvent server storage quotas.

What is the difference between an OST and PST file? An OST file starts as a mirror image of a user’s folders on the Exchange Server, and works in conjunction with the Exchange Server during synchronization. A set of PST files, on the other hand, is simply a storage location on the hard disk or on a server other than the Exchange Server.

When a user works offline, that user can add, delete, and modify the contents of an OST file exactly as that user would with a folder on a server. For example, a user can change and move items in the offline Inbox, send messages that are placed in the offline Outbox, and read offline public folders. In the meantime, the server keeps processing information. The user’s mailbox continues to receive new messages, and other users can add, delete, and change items in public folders. However, the user is not aware of these changes on the server until a connection to the network is reestablished.

It should be noted that while both OST and PST files are commonly referred to as "container" files, they are actually tables containing database entries (email text and metadata) and objects (attachments).

When a user is managing email using an OST or PST and deletes an email, the pointer to that email (and any attachments) is first moved to the “deleted items” folder within the tables. The pointer and the email will remain in that folder until the user “dumps” (empties) the “deleted items” folder. This act simply removes the pointer to that email from the “deleted items” folder within that PST or OST file. The pointer to the table entry containing the information relating to the email is removed, not the information itself (the text, metadata and associated objects). The email and the attachment (objects) will remain within the “slack space” of that OST or PST file until the user “compacts” that file, which will physically remove the information for those records within the tables. Most users don’t know how to perform a compact, so it is likely that any and all deleted email will remain in that file until the entire container file is deleted. There is no individual email file managed outside that OST or PST, so an attempt to recover from the hard drive of the device you are examining will not recover individual emails directly. Rather, what one would recover is, for example, an entire OST or PST file that a user has deleted. It is important to look for and recover entire OST and PST files from the hard drive, by the way, so don’t skip that step. If you do, you will be missing potentially important information.
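The delete, empty, and compact sequence described above can be pictured with a toy model. To be clear, this is not the real PST or OST on-disk format; `ToyContainer` and its methods are hypothetical names used only to show pointers being removed while the underlying records stay put.

```python
class ToyContainer:
    """Toy stand-in for a PST/OST: a record table plus folders of pointers."""

    def __init__(self):
        self.records = {}  # record id -> message data
        self.folders = {"Inbox": [], "Deleted Items": []}
        self._next_id = 1

    def receive(self, message):
        """Store a new message and point to it from the Inbox."""
        rid = self._next_id
        self._next_id += 1
        self.records[rid] = message
        self.folders["Inbox"].append(rid)
        return rid

    def delete(self, rid):
        """User deletes an email: only the pointer moves."""
        self.folders["Inbox"].remove(rid)
        self.folders["Deleted Items"].append(rid)

    def empty_deleted_items(self):
        """User empties Deleted Items: pointers are dropped, data stays."""
        self.folders["Deleted Items"].clear()

    def orphans(self):
        """Records that no folder points to any more."""
        referenced = {rid for ids in self.folders.values() for rid in ids}
        return {rid: msg for rid, msg in self.records.items()
                if rid not in referenced}

    def compact(self):
        """Compaction physically removes the unreferenced records."""
        for rid in list(self.orphans()):
            del self.records[rid]


box = ToyContainer()
kept = box.receive("Quarterly report")
gone = box.receive("Delete me")
box.delete(gone)
box.empty_deleted_items()
recoverable = box.orphans()    # the "deleted" message is still in the table
box.compact()
after_compact = box.orphans()  # nothing left to recover
```

Until `compact()` runs, the “deleted” message is still sitting in the table with nothing pointing to it. That orphaned record is what an examiner is looking for inside the container.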

Recovering email from the slack space of an OST or a PST recovers those orphaned emails and attachments, and it is a step that is routinely missed by most in this business. Most tools and processes don’t look in the right place. Are you missing important evidence by not looking in the right place?
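As for the earlier step of finding entire deleted OST and PST files on the drive, one common approach is a signature scan of the raw image. The sketch below is only a starting point. PST files begin with the four bytes “!BDN”; that OST files share the same signature, the 512-byte sector size, and the sector alignment of file starts are assumptions to verify for the case at hand. The function name and the in-memory “image” are hypothetical.

```python
PST_MAGIC = b"!BDN"  # first four bytes of a PST header (assumed for OST too)
SECTOR = 512         # assumed sector size


def find_container_headers(image_bytes, sector=SECTOR):
    """Return sector-aligned offsets where the header signature appears."""
    offsets = []
    for offset in range(0, len(image_bytes), sector):
        if image_bytes[offset:offset + 4] == PST_MAGIC:
            offsets.append(offset)
    return offsets


# Hypothetical 4-sector "image" with a container header in the third sector.
image = bytearray(4 * SECTOR)
image[2 * SECTOR:2 * SECTOR + 4] = PST_MAGIC
found = find_container_headers(bytes(image))
print(found)
```

A hit only marks a candidate starting point. Carving the full file and then examining its slack space for orphaned records are separate steps that this sketch does not attempt.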