Tuesday, July 12, 2016

TAR - Not Just For Big Data Volume Cases

The events of the last couple of weeks have given me a great real-life example to share with you regarding Technology Assisted Review (TAR).  These use-case anecdotes dovetail with this month's educational program on TAR.  It is our duty to keep educating ourselves on the technology available and on the risks and benefits of its use, and below are two examples demonstrating that TAR delivers value and ROI not only in big data volume cases, but in small ones as well.

The use of TAR and its work flows is now nearly common practice (and almost mandatory in BIG data volume cases).  Indeed, in our shop, we just completed a large 8.5-million-record case in which the lawyers reviewed only 6,000 documents (less than 0.1%) to achieve technology training stabilization.  What is stabilization?  It is the point at which stability scores tell us the technology has learned all it is likely to learn from a sample review.  Because of how well TAR worked in that case, we measured over $1.4M in ACTUAL review cost savings based solely on what TAR indicated would not be relevant documents.  The vast majority of what the process identified as relevant, over 350,000 documents, was produced without review (a clawback agreement protected any privileged documents produced).  About 30,000 documents for priority custodians had to be reviewed before production.  The legal team chose to review only what TAR determined to be relevant.  Precision was measured at 77%.  What does that mean?  77% of what the TAR process deemed relevant was in fact relevant, as confirmed by human review.  That precision rate is very good, and the savings remarkable, right?
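The precision figure is simple arithmetic. A minimal sketch in Python, assuming a hypothetical validation sample (the sample counts below are illustrative; only the 77% rate comes from the case):

```python
def precision(confirmed_relevant: int, predicted_relevant: int) -> float:
    """Fraction of TAR-flagged documents that human review confirms relevant."""
    return confirmed_relevant / predicted_relevant

# Hypothetical validation sample: reviewers check 1,000 of the documents
# TAR deemed relevant and confirm 770, matching the 77% precision reported.
print(precision(770, 1_000))  # 0.77
```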

Well, that wasn't the only remarkable thing we learned about TAR this week.  I ran into a lawyer at an event a few weeks back and we exchanged greetings.  I gave him my business card and told him, "Call me if you ever need help with eDiscovery."  A week later, my phone rang and the conversation began, "I have your card here, and I remember you said to call if I ever need help with this 'eDiscovery stuff.'"  He needed help indeed, and fast.  He represented a client who had been sued over a trademark issue.  They were on the wrong side of a motion-to-compel ruling that required them to collect, filter, review, and produce in less than two weeks.  The attorney had a three-person staff to get the work done and knew the normal approach would not meet the deadline; an extension was not available.  He asked if I had any idea what he should do.  This was what most shops would consider a small case, with one custodian, which traditionally means not a great number of documents.  The attorney was from a small firm, with limited resources, limited budget, and limited time.  I advised that we treat this matter as if it were the 8.5-million-record case described above and use TAR and its work flows.  I am sharing with you below the steps we took.  Again, this feeds directly back to my opening paragraph: some lawyers today are not familiar with the technology, which is one of the primary drivers behind the amendments to the ABA Model Rules of Professional Conduct.  In those cases, we use a defined step-by-step process to educate and inform how the process works.

The upshot in this "small" case is that the deadline was met.  In fact, we were a day early.  Documents reviewed: 650.  Documents produced: 12,211.

Step 1:  Collect data.  Oops: we discovered the custodian in this small case had far more data than expected, more than 300 GB!  Finding more data was not conducive to meeting the tight deadline with a standard approach!
Step 2:  Filter out all the file types we do not want or need; the lawyer decided to focus on a few very specific file types.  Process and deduplicate, and weed out whatever else we can by other judgmental means.  The result: 210,000 documents remain.  That is better than the original collection, but still far too many to review!
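The deduplication in Step 2 is typically hash-based: keep one copy of each unique file's content. A minimal sketch of the idea, assuming the collected files are available as bytes (the helper name is ours, not any product's API):

```python
import hashlib

def dedupe(docs: dict[str, bytes]) -> dict[str, bytes]:
    """Keep one document per unique content hash, dropping exact duplicates."""
    seen: set[str] = set()
    unique: dict[str, bytes] = {}
    for name, content in docs.items():
        digest = hashlib.sha1(content).hexdigest()  # hash of the file content
        if digest not in seen:
            seen.add(digest)
            unique[name] = content
    return unique

# Toy example: two of the three "emails" have identical content.
docs = {"a.eml": b"hello", "b.eml": b"hello", "c.eml": b"world"}
print(len(dedupe(docs)))  # 2
```

Production tools also dedupe on email metadata and across custodians, but the content-hash idea is the core of it.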

Step 3:  The lawyer wanted to try search terms.  The result?  28,000 documents hit the terms, which surprised the lawyer.  What surprised us even more was that it would take north of 250 hours to review those documents.  There was neither the time nor the money for that process.  What now?  Step 4!
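Where the "north of 250 hours" estimate comes from is straightforward arithmetic; the review rate below is a hypothetical planning figure, not a number from the matter:

```python
docs_hitting_terms = 28_000
docs_per_hour = 100  # assumed linear-review rate, for illustration only
hours = docs_hitting_terms / docs_per_hour
print(f"{hours:.0f} hours")  # 280 hours, i.e., "north of 250"
```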
Step 4:  Enter TAR and Envize™, our machine-learning tool with Active Learning.  We use the initial (completely untested) terms and run analytics on just the 28,000 documents hitting those terms.  We create a few judgmental random samples and launch into review/training.  No control batch, because Envize™ doesn't need one, at least not at this stage.



Step 5:  The terms were not bad: about 30% of the training documents reviewed in the first judgmental random sample were actually responsive.  That is about what we expect from untested terms, and exactly what we hope for when training the technology: a good mix of relevant and not-relevant documents.  After the initial judgmental random samples, we let Active Learning feed the reviewers what Envize™ said it needed to learn.  That is the beauty of active learning.
Step 6:  Stabilization occurred very quickly; Figure 1 above shows the result after 815 documents.  At this point, we switched to Continuous Active Learning (CAL) to feed the reviewers the most relevant content: the documents with the highest relevance scores.
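Steps 4 through 6 can be sketched generically. This is not Envize™ itself, just a minimal sketch of the two batch-selection strategies the workflow describes (uncertainty sampling while training, then highest-score feeding once we switch to CAL), using scikit-learn and a toy feature matrix; all names and numbers here are illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def training_batch(model, X_unlabeled, batch_size=50):
    """Active learning: pick the docs the model is least sure about."""
    probs = model.predict_proba(X_unlabeled)[:, 1]
    return np.argsort(np.abs(probs - 0.5))[:batch_size]  # 0.5 = most unsure

def cal_batch(model, X_unlabeled, batch_size=50):
    """CAL: feed reviewers the docs with the highest relevance scores."""
    probs = model.predict_proba(X_unlabeled)[:, 1]
    return np.argsort(probs)[::-1][:batch_size]  # highest scores first

# Toy demo: 200 random "documents" with 5 features each.
rng = np.random.default_rng(0)
X = rng.random((200, 5))
y = (X[:, 0] > 0.5).astype(int)  # stand-in relevance labels
model = LogisticRegression().fit(X[:100], y[:100])
print(len(training_batch(model, X[100:], 10)), len(cal_batch(model, X[100:], 10)))
```

In a real matter the features would come from text (e.g., TF-IDF), and the loop alternates: review a batch, add the labels, refit, select the next batch.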
Step 7:  After just a few hundred CAL documents were reviewed, the lawyers reported that they were confident the technology had done its job and asked that we run the privilege screen and produce.  We suggested QC and audits.  The lawyer said he was not looking for precision, just looking to make sure nothing was missed, and did not care if the production was a bit over-inclusive.  We ultimately reviewed a random sample of the "left behind" documents, just to make sure we were not missing anything.  We were not.
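The "left behind" check in Step 7 is often called an elusion sample: a simple random sample drawn from the documents TAR set aside, reviewed to confirm nothing material was missed. A minimal sketch, with a hypothetical left-behind count and sample size:

```python
import random

def elusion_sample(doc_ids, sample_size, seed=42):
    """Simple random sample of the docs TAR left behind, for QC review."""
    return random.Random(seed).sample(doc_ids, sample_size)

left_behind = list(range(15_000))  # hypothetical count of unreviewed docs
sample = elusion_sample(left_behind, 400)
print(len(sample))  # 400
# Reviewers then check the sample; zero relevant hits supports the
# conclusion that the left-behind set can safely be excluded.
```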

Step 8: DONE – everybody is happy.

Conclusion?  TAR has utility beyond big-data volume cases.  Almost any case of any size that has ESI can benefit from using machine learning technology and a sound TAR work flow. 

Want to learn more?  See the July webinar replay here:
TAR:  A Peek Inside the Black Box.