Monday, April 20, 2015

Do Lawyers Not Get Technology, or Do Technologists Not Get Lawyers?

The short answer: a little of both.  Here's why the answer is complex...or is it?

The Great Transparency Debate

There is no question that lawyers need to keep up with the technological advances that are rapidly changing the practice of law.  The Comment to ABA Model Rule 1.1 (Competence) reads in part:
"To maintain the requisite knowledge and skill, a lawyer should keep abreast of changes in the law and its practice, including the benefits and risks associated with relevant technology, engage in continuing study and education and comply with all continuing legal education requirements to which the lawyer is subject. [emphasis added]"
It is true that some litigators don't understand technology at the level of the technologist. It seems like just yesterday that we had CRT monitors on our desks and paper documents everywhere.  But should we expect litigators to gain a deep understanding of technology? Have technologists, in the process of peddling their goods, overcomplicated things by trying to make their special sauce too special?
What litigators do understand is this - the vast majority of all ESI preserved, collected, filtered, reviewed and produced will never see the light of day outside some hosted environment.  Most ESI will never even receive a Bates number, much less get used at a deposition, in a court filing or in court during trial.  In fact, in most cases, less than 1% has any value at all.
“I saw one analysis that concluded that .0074% of the documents produced actually made their way onto the trial exhibit list - less than one document in ten thousand. And for all the thousands of appeals I've evaluated, email appears more rarely as relevant evidence.” DCG Sys., Inc. v. Checkpoint Techs., LLC, 2011 WL 5244356, at *1 (N.D. Cal. Nov. 2, 2011) (quoting Chief Judge Rader)
So, lawyers know that they are looking for but a handful of documents in a vast sea of mostly useless information. Throw in a bunch of technologists, many of whom have never darkened the doors of a courtroom in a professional capacity.  Add a desire for complete transparency and "court approval" from the technologists, and some lawyers, and you have a great deal of confusion.  Much of the concern about transparency is misplaced.
John Tredennick, of Catalyst Repository Systems, just this weekend published an article, Using CAL to Solve the Transparency Problem in TAR, that correctly concludes (among other things) that it's not the technology that needs to be defended, although John limited his conclusions to Continuous Active Learning (CAL) (a/k/a TAR 2.0) applications.  Lawyers and technologists are focused on the wrong part of the TAR story.  ESI Advantage has submitted in the past, and continues to submit, that whether you're using TAR 1.0 or 2.0, the technology should not be put on trial, so to speak.  Technology is simply a tool.



The Solution

ESI Advantage presented the solution to the misconception that court approval is needed for litigators to use technology back in February 2012 - "Technology Assisted Review ("TAR") - We have been expecting you!"  What follows is the solution proposed then, still valid today.
_____________________________________________
Some continue to think that courts are going to somehow sign off on “Predictive Coding”. 
Given the significant benefits that technology-assisted review can bring to e-discovery from the dual perspectives of quality and cost, expert commentators have asked a key question: Why isn't everyone using it? Of the 11 e-discovery vendors surveyed by Kershaw & Howie:
“[t]he most mentioned reason cited by respondents was uncertainty or fear about whether judges will accept predictive coding.” Kershaw & Howie, Crash or Soar (supra).
For the love of Pete, please stop thinking of this technology as something that the courts are going to "sign off" on! They will not.  Judge Peck said just that in the Forbes article, What Technology-Assisted Electronic Discovery Teaches Us About The Role Of Humans In Technology:
“Until there is a judicial opinion approving (or even critiquing) the use of predictive coding, counsel will just have to rely on this article as a sign of judicial approval. In my opinion, computer-assisted coding should be used in those cases where it will help “secure the just, speedy, and inexpensive” determination of cases in our e-discovery world.”
The courts have for years encouraged the parties to cooperate and come to agreement on filtering methods.  This attitude of expected cooperation on the discovery process is nothing new.  It is not something that the technologists dreamed up.  When you read the rules of any state or federal court, you will find agreement and cooperation mentioned frequently.

While most "predictive coding" applications are built upon the same basic statistical algorithms and linguistic pattern detection, they do not all achieve the same result.  That is because how you conduct the analysis - the workflow - is critical to the end result.  TAR alone is no silver bullet.  Lawyers SHOULD NOT have to defend the technology - how the algorithms work, or what the technology does or does not do. Instead, we should focus on the workflow and the end result.  The technology being utilized should be transparent.  The workflow and end result should be based upon what a human decides is relevant.  For us at [ESI Advantage], there are really only two types of filters that need be explained [when transparency is desired].


Objective Filters:  Objective filters are essentially filters that can be applied without any "subjective" reasoning.  File type and date range are two common examples (not an exhaustive list) of filters applied in most cases. These filters should not be considered work product. Objective filters are often disclosed and agreed upon between the parties (see the sketch following this list).
Subjective Filters:  Subjective judgment is applied to document decisions in most cases.  When a lawyer reviews a document, the "mental impressions" that lawyer forms are generally considered work product.  In the old days, we did not disclose how many "boxes" we reviewed to arrive at a production that might be only a few documents, or a few boxes out of hundreds.  We did not disclose which file cabinets were opened, or who was interviewed.  Then, as is the case today, production would occur in one of two forms (generally speaking): documents are produced either in their "usual course of business" or "by request."  In the "old days" we avoided producing "by request" as if it were the plague.
Search terms are often used today [and will continue to be used for some time to come...]. A term is indeed an objective filter, but you may arrive at the use of a term using subjective factors.  You may even use "predictive" technology that, in its simplest form, looks for common words, concepts and patterns.  Either a document contains that word (or some variation) or it does not.  How you arrived at the term, however, may well be subjective and protected work product (some lawyer will make that call).  After all, attorneys are officers of the court, and when responses are provided, the attorney for each party is representing that a diligent search has been conducted and that there has been compliance with the requests, or that the appropriate objections have been lodged.
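To make the distinction concrete, here is a minimal sketch of objective filtering in Python. The field names, file types and dates are all hypothetical; the point is simply that these criteria apply mechanically, with no subjective judgment, and can therefore be disclosed and agreed upon.

```python
# A minimal sketch of "objective" filters: criteria that require no
# subjective reasoning to apply. All field names and values here are
# hypothetical stand-ins for a real ESI collection.
from datetime import date

documents = [
    {"id": 1, "file_type": "msg",  "sent": date(2014, 3, 5),  "text": "re: merger terms"},
    {"id": 2, "file_type": "xlsx", "sent": date(2011, 1, 9),  "text": "q3 budget"},
    {"id": 3, "file_type": "docx", "sent": date(2013, 8, 21), "text": "draft merger agreement"},
]

def objective_filter(docs, file_types, start, end):
    """Keep documents of the agreed file types within the agreed date range."""
    return [d for d in docs
            if d["file_type"] in file_types and start <= d["sent"] <= end]

def term_filter(docs, term):
    """A term applies objectively: a document contains it or it does not.
    How the term was chosen may be subjective work product."""
    return [d for d in docs if term.lower() in d["text"].lower()]

culled = objective_filter(documents, {"msg", "eml", "docx"},
                          date(2013, 1, 1), date(2014, 12, 31))
hits = term_filter(culled, "merger")
print([d["id"] for d in hits])  # -> [1, 3]
```

Notice that nothing in the filters reveals why "merger" was chosen; the subjective reasoning behind the inputs stays with the lawyer.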

Conclusion

Nothing has changed since that February 2012 post, except that we would have expected lawyers to have adopted TAR methods in greater numbers than we've seen over the past several years.  In our humble opinion, the pure and simple reason is that technologists and litigators are not always on the same page about what is important.  As outlined above, we need to focus less on how the sausage is made (the math behind TAR) and more on how the sausage tastes. The parties should agree upon filters that they can exchange, such as validated search terms and date and file type filters.  Whether you're using a TAR 1.0, 2.0 or yet-to-come 3.0 method is of little consequence to the courts.  The courts simply want everyone to get along.

Tuesday, April 14, 2015

Why the Cloud is Cool!

Introduction

According to Wikipedia, "Cloud computing is a computing term or metaphor that evolved in the late 2000s, based on utility and consumption of computer resources. Cloud computing involves deploying groups of remote servers and software networks that allow different kinds of data sources be uploaded for real time processing to generate computing results without the need to store processed data on the cloud. Clouds can be classified as public, private or hybrid."

Many people use Cloud computing every day without even realizing it. Examples include web-hosted email (such as Gmail) and social media sites (such as LinkedIn and Facebook). For Cloud-based email, the email application and the actual messages are hosted on servers sometimes across the country from the user. Users log in via a web browser or mobile app, and email is displayed on the device, but the user's reply, send and forward commands are executed on a server - or several servers - across town or across the country.

In my world - the legal field - lawyers are literally being overrun with data - Electronically Stored Information (ESI). Legal software has historically followed the traditional model: lawyers purchase a software license and install the software on individual computers or on servers in the law firm's data center.  Hardware and software under the traditional model can be expensive.

More recently, a new software model has emerged: Software as a Service ("SaaS"). SaaS differs from traditional software in that nothing is installed on individual computers or the firm's servers. The software is accessed via a web browser over the Internet. Data is encrypted in transit and stored securely in the service provider's data center. Updates are rolled out continually by the service provider or software company. SaaS is usually sold on a subscription basis, typically for a flat monthly or annual fee.

Here are just a few reasons why the legal community should give SaaS and Cloud Computing a serious look.

Spend Less and Do More

With no servers to maintain or storage to add, hardware costs are reduced.  Software costs are also reduced because, with most ESI software cost models, you only pay for what you use.  While it is critically important for law firms and corporate legal departments to have some level of support staff, staffing needs are lower because there are fewer things to manage from an IT perspective.  Your technical staff spends more time on the higher-level work of helping you learn and use technology and less time managing hardware, software, security and infrastructure.

You Get the Good Stuff Sooner

Because the software provider only needs to update its own environment, and not thousands of installs across the country, you receive the latest and greatest updates as soon as the provider makes them available.  No more waiting on the firm's IT department to test and then execute a roll-out plan. By the way, roll-out plans are critical, so this is not a knock on IT departments.  Lord knows this old war horse has been through countless upgrades and roll-outs and knows their importance.  Something always goes wrong - 100% of the time.

One Neck to Choke

The last thing you need is to have to choke your own neck. Since it is the SaaS provider conducting roll-outs on its own servers, not yours, it is up to the SaaS provider to make sure that the roll-out goes smoothly. When it doesn't, you have one neck to choke.  Most SaaS contracts also include guaranteed uptime provisions.

You're Compatible

Traditional legal software is often limited by hardware compatibility, operating system, and licensing constraints. For example, traditional ESI software may only be compatible with a PC running Windows XP or later.  There are usually also certain hardware requirements that must be met, and then updated when the provider issues new software.  It is not uncommon for users to have to upgrade RAM, as an example, to take advantage of new features.  Because SaaS solutions are web-based, they are usually compatible across platforms and web browsers.  If the hardware requirements change with a new release, it is the SaaS provider's responsibility - not your IT department's - to make sure you're compatible.

Your Data is Safe - Guaranteed

Lawyers are understandably concerned about security.  However, a legitimate argument can be, and has been, made that files stored on a provider's servers are more secure than those located on a typical attorney's PC or in a law firm's data center.  SaaS providers often employ elaborate security measures and multiple redundant backups in their data centers.  Obviously, ask a lot of questions about a service provider's security and make sure you get assurances in writing.  If there is a breach, and you've done your due diligence, you've got a neck to choke, and it is the provider that is responsible for the cure, whatever that might be.

Conclusion

These are but a few of the considerations that go into deciding whether a SaaS-based model is right for you; there are many more not addressed here.  Get help from an expert.  Ask questions and get comfortable with your choice, but continuing to ignore cloud-based solutions is without question a mistake.  Don't get left behind.


Monday, April 6, 2015

Are your samples random, or are you just getting random results? (Revisited)



This is the second in a series of posts where we will revisit ESI Advantage articles and posts from the past. We will update and critique in places. As always, your input and comments are welcome. This post is from May 13, 2012.



Introduction


There has been much discussion of late about a variety of "emerging" technologies: predictive coding - err, um, Technology Assisted Review (TAR) - statistical random sampling (SRS), what some algorithm does or does not do, complete with panels of experts to explain this, that or the other. Entire days of testimony are being devoted to peeking behind the curtain of “predictive coding” algorithms. [ESI Advantage] continues to ask, why all the fuss? Shouldn't we be focused on the merits, some say? Why all this math? After all, most lawyers will tell you they are lawyers today because of the supposed lack of math. But we digress.
Comment: Since this article, Continuous Active Learning (CAL), or TAR 2.0, has emerged as a sound option over previous methodologies, but only in some cases. Prior to the emergence of CAL, Simple Passive Learning (SPL) and Simple Active Learning (SAL) were the approaches typically associated with early predictive coding, now often referred to as TAR 1.0.[i]

It’s Just Math


Ralph Losey's most recent treatise on random sampling is quite the read.   Ralph predicts (trumpets sound)…

“….in the year 2022 a random sample polling of American lawyers will show that 20% of the lawyers in fact use random sampling in their legal practice. I make this prediction with a 95% confidence interval and an error rate of only 2%. I even predict how the growth will develop on a year by year basis, although my confidence in this detail is lower.” (Random Sample Calculations And My Prediction That 300,000 Lawyers Will Be Using Random Sampling By 2022)
Ralph’s prediction, of course, is tongue-in-cheek. Well, sort of. The math behind sampling is serious. Unlike the broader world of studies and polls for which statistical models were designed, in the document world the all-important baseline changes from one matter to the next. Put another way, statistical modeling is only as good as the information you feed into it and the assumptions that you apply. By way of example, Ralph’s conclusion that 300,000 lawyers will utilize random sampling by 2022 may be flawed.

Assuming that by the year 2022 there are 1.5 Million lawyers (the ABA estimated there were 1,128,729 resident, active lawyers in 2006), I predict that 300,000 lawyers in the U.S. will be using random sampling by 2022. The confidence interval of 2% by which I qualified my prediction means that the range will be between 18% and 22%, which means between 270,000 lawyers and 330,000 lawyers. I have a 95% level of confidence in my prediction, which means there is a 5% chance I could be way wrong, that there could be fewer than 270,000 using random sampling, or more than 330,000. (Id.)
  
Flawed, you say? The equations for determining sample size based upon known and desired factors - population size, tolerable margin of error, and so on - are well settled. No, the math is not questionable. It works. We have seen it over and over.  Trust those math folks.  Be careful with your inputs.


In terms of the numbers you selected above, the sample size n and margin of error E are given by:

x = Z(c/100)² · r(100 − r)
n = Nx / ((N − 1)E² + x)
E = √[(N − n)x / (n(N − 1))]

where N is the population size, r is the fraction of responses that you are interested in, and Z(c/100) is the critical value for the confidence level c.
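As a quick illustration of how those formulas behave, here is a minimal sketch in Python, using only the standard library. Treat it as an illustration, not a validation tool: the critical-value table covers just a few common confidence levels, and the example population is hypothetical.

```python
# A minimal sketch of the sample-size formula quoted above.
# N = population size, E = tolerable margin of error (percent),
# c = confidence level (percent), r = expected response fraction
# (50 is the conservative worst case).
import math

# Two-tailed critical values Z(c/100) for a few common confidence levels.
Z = {90: 1.645, 95: 1.96, 99: 2.576}

def sample_size(N, E, c=95, r=50):
    """Return the sample size n required for population N at margin
    of error E percent and confidence level c percent."""
    x = Z[c] ** 2 * r * (100 - r)
    return math.ceil(N * x / ((N - 1) * E ** 2 + x))

# Hypothetical example: documents hitting a candidate search term.
print(sample_size(N=100_000, E=5))  # ~383 documents to sample
print(sample_size(N=100_000, E=2))  # ~2,345 documents
```

Note that tightening the margin of error from 5% to 2% roughly sextuples the required sample; precision is paid for in review time.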
But the math only works if your inputs and assumptions are sound. Here, not all of the 1.5 million lawyers Ralph assumes will exist in 2022 are litigators. We don’t expect that real estate and tax lawyers will be utilizing random sampling related to document review, and contract lawyers are unlikely to be interested in sampling either. So the population Ralph starts with may be far smaller than 1.5 million. We have not audited Ralph’s results, though, and that is not really the point. Here, the result would be sampling more than you need, which is not necessarily a bad thing; the result would simply be better.

But that simply reinforces Ralph’s overarching point, and ours here: it is not precise statistics that are important. Statistical sampling is a tool among many other reinforcing tools. You don’t have to be a Ralph Losey type of lawyer and gain an understanding of statistical sampling (the underlying math), or hire an expert to explain it to a judge or jury. Sample size matters so that you gather enough inputs, and so that those inputs carry the least risk of missing important information. The process used should measure objective information. Results should be validated and audited, so getting a precise sample size is not as important as using a rule of thumb that is repeatable. Statistical sampling is simply a method by which you organize the documents from which you gather what to tell the machine. When you consider that less than 1% of all documents have any value at trial, reviewing everything is simply not possible, nor necessary, in virtually all cases.

“I saw one analysis that concluded that .0074% of the documents produced actually made their way onto the trial exhibit list - less than one document in ten thousand. And for all the thousands of appeals I’ve evaluated, email appears more rarely as relevant evidence.” DCG Sys., Inc. v. Checkpoint Techs., LLC, 2011 WL 5244356, at *1 (N.D. Cal. Nov. 2, 2011) (quoting Chief Judge Rader)


Follow a Simple Process

Unlike the use cases for which random sampling models were built, in the document review and production world, we are not shooting in the dark. While it is true that in almost all cases the relevant material is very small in proportion to the amount of material available for analysis, we have a pretty good idea what words and phrases appear in relevant documents, providing at least a start. The subject matter is known. Filter parameters can be narrowed by date, authors and recipients and any number of other known factors. In the old days – those days before technology – we just knew where to look. Today is no different except that we now have technology to help us. Technology helps us increase the odds in our favor. Audits will identify new terms, phrases and concepts for the technology to use to find new sources. Sampling is not so random.

It is becoming commonplace to agree upon and use search terms, often without any testing or validation of those terms whatsoever. Wouldn’t it be important to know for certain, say with a 2% chance of error, that the term you chose would return relevant documents? Don’t you want to know what percentage of your effort will be wasted if you review all documents hitting a specific term? Why not take a “statistical” sample of all documents hitting that term and measure the relevancy rate? You don’t need to prove what’s statistically appropriate; there are ample calculators that will “do the math” for you. The math has been proven. See the sample size calculator by Raosoft.com and Ralph Losey’s “worst case scenario” sample predictor. Using statistical sampling calculators inside a well-reasoned process to, as an example, test the validity and recall rates of terms being contemplated is not something that should have to be defended. You are simply using a calculator to help you find the best possible samples upon which to measure a term’s effectiveness. Ultimately, it is the term (along with other objective filter parameters) that is agreed upon, not what constitutes a sound statistical sample. In other words, what matters is the result and the confidence in that result, not necessarily how the sausage was made. Humans, not machines, decide what terms, topics and concepts are relevant. The technology simply finds documents with content similar to that which a human has decided is relevant. That’s why some call this emerging technology “machine learning”.
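As a concrete illustration of measuring a term's relevancy rate, here is a minimal sketch in Python. The hit counts and reviewer judgments are hypothetical, and the margin of error uses the standard normal approximation.

```python
# A sketch of testing one candidate search term: draw a random sample
# of the documents hitting the term, review the sample, then estimate
# the relevancy rate and its margin of error. All counts are hypothetical.
import math
import random

term_hits = list(range(25_000))         # stand-ins for docs hitting the term
sample = random.sample(term_hits, 383)  # n = 383 gives ~5% error at 95% confidence

relevant_in_sample = 61                 # reviewers' judgment calls on the sample
p = relevant_in_sample / len(sample)    # observed relevancy rate, ~16%

# Normal-approximation margin of error at 95% confidence.
moe = 1.96 * math.sqrt(p * (1 - p) / len(sample))
print(f"relevancy rate: {p:.1%} +/- {moe:.1%}")

# If only ~16% of this term's hits are relevant, roughly 84% of the
# review effort spent on the term would be wasted - worth knowing
# before agreeing to the term.
```

The sample size (383) comes straight from the calculator sketched earlier; what the parties ultimately exchange is the validated term, not the statistics behind it.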

Today, an agreed-upon set of terms or phrases remains the only reliable objective filter that can be easily transferred from one technology to the next. Validating terms with a repeatable and quantifiable methodology will make the choice of terms much easier to defend. And, oh by the way, these are not things about which we should be guessing. Don’t guess; get help.

Comment: Since this article appeared in ESI Advantage, two significant studies have been the topic of many commentators. These studies confirm our conclusions in two ways. First, the Rand report found that the vast majority of discovery costs come from document review. Second, among the many conclusions in both the Rand report and the Grossman/Cormack study is that a hybrid approach to document review is the favored approach: judgmental sampling using validated search terms coupled with Technology Assisted Review (TAR).[ii]
____________________________________________________________
[i] Gordon V. Cormack & Maura R. Grossman, "Evaluation of Machine-Learning Protocols for Technology-Assisted Review in Electronic Discovery," Proceedings of SIGIR '14 (2014).

[ii] Nicholas M. Pace & Laura Zakaras, "Where the Money Goes: Understanding Litigant Expenditures for Producing Electronic Discovery" (RAND Institute for Civil Justice, 2012); Maura R. Grossman & Gordon V. Cormack, "Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review," XVII Rich. J.L. & Tech. 11 (2011).