Wednesday, August 30, 2017

Do You Agree to Untested Search Terms?

Introduction

Unless you use a proven key term validation methodology that relies on sampling, you are almost certain to agree to terms that are overly broad. Agreeing to overly broad terms is costly, yet common. The greater risk: what are you missing?

Too many practitioners declare that search terms "look good" without appropriate testing.


Victor Stanley v. Creative Pipe, Inc. (2008)

I recently ran across a Latham & Watkins “Client Alert” from July 2008 - The Importance of Victor Stanley v. Creative Pipe, Inc. for Electronic Discovery. The Victor Stanley opinion was the first prominent opinion involving search terms and ESI. In that case, the context was the use of search terms to screen documents for privilege before production. The terms the defendant used were, well, untested and poorly crafted, and as a result they missed privileged documents that were ultimately produced. Bad things ensued. The Victor Stanley opinion is now almost a decade old, yet many still agree to terms (or use them unilaterally) without a defensible methodology. Judge Grimm gave some sound advice back then that still applies today. The Latham alert summarizes Judge Grimm’s insight thus:

 “Search Methodology In evaluating the reasonableness of the defendants’ methodology, the court was critical of the fact that the keyword search terms were selected by the party and its attorneys and were not disclosed to the court, noting that, “while it is universally acknowledged that keyword searches are useful tools for search and retrieval of ESI, all keyword searches are not created equal.”23 The court suggested that proper selection of keywords involves “technical, if not scientific knowledge.”24 The court further cautioned that the choice of terms is important because “simple keyword searches end up being both over- and under-inclusive in light of the inherent malleability and ambiguity of spoken and written English (as well as all other languages).”25 The court also noted that the defendants did not assert that any sampling was done of the text-searchable ESI files that were determined not to contain privileged information on the basis of the keyword search to see if the search results were reliable.26 The court’s order suggests that sampling is a prudent way to test the reliability of keyword searches and verify that the searches are neither over-inclusive nor under-inclusive.” [emphasis added]
 The Latham "alert" also provided some sound “practice pointers”:
  1. Meet and Confer with the Opposing Party Regarding Search Terms
  2. Determine Whether to Ask the Court for a Discovery Order Compelling Production
  3. Be Prepared to Defend the Search Methodologies.

 Key Term Testing

Figure 1 - Recenseo Search Term Manager

That third Latham practice pointer is where most people struggle. Too many use the wrong tools or the wrong methodology, or get poor technical and workflow advice. Running a search and reviewing the results without testing isn’t the way to go. Properly sampling and testing search terms requires solid search term hit reporting. Figure 1 above shows reporting from the analytics software Recenseo. The Recenseo Search Term Manager interface allows the user to revise terms on the fly and immediately update the hit rates. When used in conjunction with “Predictive Analytics”, one has powerful information about how well search terms are performing across the entire corpus, or within a specific scope. For more on how to combine Search Term validation with Predictive Analytics, see “Combining Predictive Coding and Search Term Classification in 5 Easy Steps”

Automated sampling functionality within Recenseo makes it easy to pull random samples of key term hits, review them, and then analyze the “responsive rate”, because reporting can be narrowed to a specific scope such as those documents tagged, or "predicted", as relevant. This is a critical distinction from some applications on the market where key term hit reporting is either not present at all, or only available across an entire corpus. There are also performance issues with some applications, where getting the critical hit reporting requires search construction expertise, long waits and substantial computing resources. Searches are slow! While the overall hit rate is important, it is also important to know the hit rate on tags, or perhaps on a specific custodian of interest, so you can properly evaluate the effectiveness of key terms before you agree to them. See Figure 2 below.



Figure 2 - Recenseo Hit Report Export to Excel
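For readers who want to see the arithmetic behind sampling key term hits, here is a minimal Python sketch. It is not how Recenseo works internally; the record layout (`doc_id`, `term`, `custodian`), the function names and the example term are all assumptions made up for illustration. It pulls a simple random sample of the documents hit by a term, optionally limited to one custodian, and then turns the reviewers' calls into a responsive rate with an approximate 95% Wilson score interval.

```python
import math
import random


def sample_hits(hits, term, sample_size, custodian=None, seed=0):
    """Draw a simple random sample of doc IDs hit by `term`.

    `hits` is a hypothetical list of dicts with keys "doc_id", "term"
    and "custodian" (an assumed layout, e.g. loaded from a hit report
    export). Pass `custodian` to limit the scope to one custodian.
    """
    scoped = sorted({
        h["doc_id"]
        for h in hits
        if h["term"] == term
        and (custodian is None or h["custodian"] == custodian)
    })
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    return rng.sample(scoped, min(sample_size, len(scoped)))


def responsive_rate(review_calls, z=1.96):
    """Return (rate, low, high) for a list of True/False review calls.

    low/high are a Wilson score interval; z=1.96 gives roughly 95%.
    """
    calls = list(review_calls)
    n = len(calls)
    if n == 0:
        raise ValueError("no reviewed documents in the sample")
    p = sum(1 for c in calls if c) / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return p, max(0.0, centre - half), min(1.0, centre + half)


if __name__ == "__main__":
    # Made-up hit records for one hypothetical term and two custodians.
    hits = [
        {"doc_id": i, "term": "pipe", "custodian": "A" if i % 2 == 0 else "B"}
        for i in range(200)
    ]
    sample = sample_hits(hits, "pipe", sample_size=50, custodian="A")
    # After review, record one True/False call per sampled document.
    calls = [True] * 15 + [False] * 35  # illustrative calls, not real data
    print(len(sample), responsive_rate(calls))
```

The interval matters because a small sample can make a weak term look better, or worse, than it really is; a wide interval is a sign to review a larger sample before relying on the rate.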

Conclusion

Do not agree to terms until you know the relevancy rate for each term. For example, do not agree to a term whose hits are only 30% responsive. Pick the threshold that makes you comfortable. While it’s better to be too broad than too narrow, being overly broad costs time and money. At a 30% responsive rate, 70% of your effort is wasted on documents of no value. Get help. Settle on a provider, methodology and technology that give you a proven workflow, the right tools and proper advice.
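To make the threshold idea concrete, here is a small Python sketch. The term names, hit counts and the 50% threshold are hypothetical; the only figure taken from the discussion above is the 30% responsive rate. For each term it estimates how many hit documents would be reviewed for no value and flags whether the term clears the chosen threshold.

```python
def evaluate_terms(term_stats, threshold):
    """Flag terms whose sampled responsive rate falls below `threshold`.

    `term_stats` maps a term to (hit_count, responsive_rate), where the
    rate is a fraction between 0 and 1 taken from a reviewed sample.
    """
    results = []
    for term, (hit_count, rate) in term_stats.items():
        results.append({
            "term": term,
            "hits": hit_count,
            "responsive_rate": rate,
            # Estimated non-responsive documents that would still be reviewed.
            "wasted_docs": round(hit_count * (1 - rate)),
            "accept": rate >= threshold,
        })
    return results


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    stats = {"term_a": (10000, 0.30), "term_b": (2000, 0.80)}
    for row in evaluate_terms(stats, threshold=0.5):
        print(row)
```

A term that fails the threshold is a candidate for narrowing and retesting rather than automatic rejection, since dropping it outright risks missing responsive documents.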

Be careful with those search terms!
