Wednesday, February 22, 2017


Part 2: Combining Predictive Coding and Search Term Classification in 5 Easy Steps

By Mark G. Walker, VP Advisory Services and 
Robin Athlyn Thompson, VP Marketing | Business Development

This week we bring you Part 2 of our 5-part series on combining search term classification and predictive coding. In case you missed Part 1, you can find it here.

Step 2: Dump the Junk

ESI collections include acquisitions from laptops, third-party sites, file servers, and anywhere else users keep potentially relevant ESI. In some cases, entire user hard drives are collected; in other cases, just user files. Whatever the collection method, thousands, millions, even billions of files are collected, and experience teaches us that less than 1% of that information will prove valuable to your case. The rest is of little or no value. Here are three common objective filters that can be applied to eliminate known garbage before you do any downstream indexing, analysis or classification (a short code sketch of these filters follows the list). This is not intended to be an exhaustive list.
  1. De-NIST – NIST is the acronym for the National Institute of Standards and Technology. The National Software Reference Library (National Institute of Standards and Technology, n.d.), a NIST project, maintains a master list of known computer applications and their file hashes, keeping the inventory of known application and system files current. To De-NIST a collection means using these resources to eliminate known application and system files, which have no value in most cases.
  2. File Type Filter – Eliminate known file types outside of the NIST list. In most cases, an inclusive file filter ingests into processing only specific file types of interest. Audio, video, image and other specific file types may be set aside, or not used at all. These file types are very large, driving up cost; they contain little or no text content; and they are difficult to analyze, often requiring a different process and workflow. Create a separate process for audio/video files that may be relevant. Your eDiscovery budget will thank you.
  3. Date Range Filter – We urge caution when applying a date filter BEFORE processing. Processing is the act of extracting metadata and content; it also expands container files such as email archive PSTs and ZIP files. If you apply a date filter before processing while container files are in the collection, you are virtually guaranteed to miss files of interest. By way of example, if you create a PST archive of my email today, it will contain months and even years of email, yet the date of the PST will be today's date. If your date range filter does not include today's date, that PST will be eliminated from processing consideration, even though emails within the date range sit inside the archive.
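To make the three filters concrete, here is a minimal sketch in Python. It is our own illustration, not a reference to any particular eDiscovery platform: it assumes containers such as PSTs and ZIPs have already been expanded into individual items, and that an NSRL hash list has been exported to a plain text file of MD5 values. All function and file names are hypothetical.

```python
import hashlib
from datetime import date
from pathlib import Path


def load_nsrl_hashes(path):
    """Load a hypothetical export of NSRL hash values, one lowercase MD5 per line."""
    with open(path, "r", encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}


# File types set aside for a separate audio/video/image workflow (illustrative list).
SET_ASIDE_EXTENSIONS = {".mp3", ".wav", ".mp4", ".mov", ".avi", ".jpg", ".jpeg", ".png", ".gif"}


def md5_of(path):
    """Hash a file in chunks so large files do not exhaust memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def survives_culling(item_path, item_date, nsrl_hashes, range_start, range_end):
    """Apply the three objective filters to a single, already-expanded item."""
    item_path = Path(item_path)
    # 1. De-NIST: drop known application/system files by hash.
    if md5_of(item_path) in nsrl_hashes:
        return False
    # 2. File type filter: set aside heavy, low-text file types for a separate workflow.
    if item_path.suffix.lower() in SET_ASIDE_EXTENSIONS:
        return False
    # 3. Date range filter: applied AFTER container expansion, so the date tested is
    #    the individual item's date, not the date of the PST or ZIP it came from.
    return range_start <= item_date <= range_end
```

Run against an expanded collection, a call such as survives_culling(path, item_date, hashes, date(2014, 1, 1), date(2017, 2, 22)) keeps only items worth indexing and analyzing downstream. The point of item 3 above is that item_date belongs to the expanded email or attachment, never to its container.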
Next week: Part 3 "Process the Good Stuff"

Wednesday, February 15, 2017


Combining Predictive Coding and Search Term Classification in 5 Easy Steps

By Mark G. Walker, VP Advisory Services and
Robin Thompson, VP Marketing and Business Development
iControl ESI

This is the first step in a 5-step series. Stay tuned for Step 2 next week.

Introduction

So many of our colleagues in this industry have spent decades persuading lawyers and their staffs to adopt technology and use powerful search and conceptual classification to efficiently and effectively manage eDiscovery projects.  Just last week, an attorney referred to predictive coding as a "new thing," saying, "So, Walker, you've been hammering home that we should do a better job with search terms, now comes this new thing 'predictive coding' – what is this again, and why do you recommend we change gears?"  That became the writing prompt to distill our advice on this subject into these 5 simple steps to make your team a pro.
Those of us who support leveraging advanced analytics technology, such as predictive coding, are not suggesting a shift of gears.  Well, at least not some of us.  Technologists (primarily) have been trying to move lawyers away from the use of key terms as "objective" filters for some time now.  A decade or so ago, conceptual search emerged in legal software.  Yet in 2017, the filters most commonly exchanged and agreed upon by the parties are date range, file type and search terms.  Search terms are tangible, objective things the parties can exchange, and they behave essentially the same regardless of which platform each party uses to perform the search.  For example, a search for the term "diamond" should yield the same number of document hits whether your platform uses dtSearch, Apache Solr, or some other search engine.  For better or worse, we've made lawyers like search terms.  Convincing attorneys to like them has taken decades, so let's not waste the effort!  The use of advanced machine learning technology, reporting, and, yes, math will make those objective terms better.  These steps are not intended to be a comprehensive list of every step and task that should be performed on cases involving Electronically Stored Information (ESI); there are entire books devoted to that topic. Rather, this is a short list of the mandatory tasks that should be performed on virtually any case of any size.

Step 1: Identify the Witnesses, Preserve and Collect the ESI

The process begins with a legal hold, to make certain that relevant documents and ESI do not disappear.  Legal holds can be issued across an organization, or only to those custodians anticipated to have relevant information and facts.  Web-based legal hold solutions will not only automate the process of creating the hold, but will also help monitor compliance and profile what relevant information the witness(es) might have BEFORE you must gather their files.
The exercise of "profiling the data" of a custodian is a great way to determine what a specific witness has on their computer, inside email stores, on file shares, or wherever relevant ESI may exist.   Data profiling applications work by reading metadata to determine file type, size, counts and so on.  This information can be very handy in determining the cost of collecting, processing, and reviewing, and it helps in forecasting a budget, defending against a potential motion to compel, or seeking protection against overly broad requests - as if that never happens!  Step 1 should include the following, at a minimum:

1.  Identify witnesses that may have relevant information;
2.  Issue a Legal Hold across the organization or specific to witnesses known to have relevant information.  Many legal hold applications also provide the ability to create, send and track customizable, fact-based questionnaires;
3.  Monitor Compliance with the issued legal hold.  We highly recommend using automated notification and monitoring;
4.  Profile the Data of witnesses who may have relevant ESI.  Profiling the data will determine what ESI witnesses have, or have access to, and will help with precise cost estimates (a minimal sketch of this kind of profiling follows this list);
5.  Preserve and/or Collect ESI.  If an organization has "preserve in place" capability, preserve ESI for those witnesses expected to have relevant ESI.  Preserve in place is the ability to prevent the deletion of email for specific custodians, for example; some companies use "journaling" for this purpose.  If preserve in place isn't available, collect the ESI as quickly as possible.  The longer you wait, the more likely it is that relevant information will disappear, increasing the risk of spoliation.  The delete key is not your friend!  Preserve and collect broadly; collecting ESI is the least expensive part of the process.
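Item 4 is the one most easily illustrated in code. The sketch below is our own simplified take on metadata profiling, not a reference to any particular profiling product: it walks a hypothetical custodian file share and tallies counts and sizes by file type, the kind of numbers that feed collection, processing and review budgets. Real profiling tools also read email stores, owners, dates and other metadata.

```python
from collections import Counter
from pathlib import Path


def profile_custodian(root):
    """Tally file counts and total bytes by extension under a custodian's share."""
    counts = Counter()
    bytes_by_type = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            ext = path.suffix.lower() or "(no extension)"
            counts[ext] += 1
            bytes_by_type[ext] += path.stat().st_size
    return {
        "total_files": sum(counts.values()),
        "total_bytes": sum(bytes_by_type.values()),
        "counts_by_type": dict(counts),
        "bytes_by_type": dict(bytes_by_type),
    }


if __name__ == "__main__":
    # Hypothetical custodian share; substitute the witness's actual data sources.
    report = profile_custodian(r"\\fileserver\home\jdoe")
    print(f"{report['total_files']:,} files, {report['total_bytes'] / 1e9:.1f} GB")
```

Even a rough tally like this turns "we think the custodian has a lot of email and spreadsheets" into concrete counts and gigabytes you can put in front of opposing counsel or the court.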


Next week we will cover Step 2: Dump the Junk. Your comments and opinions are welcome.