Thursday, March 24, 2011

Predictive Coding: Angel or Demon?

Everywhere you turn, it seems, someone is opining that predictive coding is either a blessing or a potential risk. Like Early Case Assessment (ECA), the definition of predictive coding changes depending upon the commentator or audience. As a result, like ECA, predictive coding is misunderstood and increasingly misused. Some think it pure technology. Others think of it as more of a complex workflow. Predictive coding really should be a combination of technology and workflow. So, what is predictive coding really? A recent Forbes Law & Technology post, “EDiscovery and the Rise of Predictive Coding” by Ben Kerschberg, quotes a recent Law.com webinar:
"According to a highly informative webinar presented by Carpenter and Trenchard hosted by Law.com, predictive coding is defined by at least three defining traits. First, predictive coding leverages small samples to find other relevant documents. Second, it reduces the amount of non-relevant documents that attorneys must review and cull, leaving the reviewer to look at approximately five to 20 percent of any set of documents. And third, the results generated by predictive coding can be validated statistically.”
So, predictive coding is reviewing more substance, getting to the hot documents faster with less junk and not having to review everything? Wait, isn’t that the early case assessment and data reduction that we have been doing for years, you might ask? Yes, it most certainly is. The difference today is that software vendors are beginning to build mathematical algorithms into their applications that identify like content and streamline the organization of like documents, not just duplicates or near duplicates, prior to review. These algorithms can now seek out and tag documents that are similar in content based on concepts as opposed to individual terms or strings of terms; in other words, documents with similar content that are not identical or nearly identical. Different software applications and service providers call this process by different names. A recent “eDiscovery Institute Survey on Predictive coding” surveyed 11 companies, a mix of software providers and eDiscovery service providers, that offered predictive coding. On the topic of what to call this new animal, 8 of the 11 providers thought that their own term describes the process better than predictive coding:

• Prognostic Data Profiling
• Predictive Ranking
• Relevance Assessment
• Suggestive Coding
• Predictive Categorization
• Automatic Categorization
• “Propagated Coding” or “Replicated Coding”
• Automated Document Categorization

Yikes! Just what the market needs, more market confusion. As you read this survey and each provider’s description of their version of “predictive coding”, you soon learn that, like ECA, this term is being broadly applied across what are essentially data reduction techniques. You also begin to realize that “predictive coding”, like ECA, is really NOT new. Rather, it is another marketing buzzword created to describe a new spin on an old process that is now (in some cases) being automated by technology. In fact, one service provider responded that it has been delivering predictive coding services since 2003 via use of the Attenex application (now owned by FTI). Wait, that means FCS (the company I work for) has been providing predictive coding services since 2002, since we were the second Attenex partner! One software company, Recommind, would not describe the basis for their technology, saying:
All software, processes and workflow are the proprietary intellectual property of Recommind and cannot, therefore, be disclosed.
And these guys wonder why lawyers are not falling all over their software. Trust me, they are not.
Interestingly enough, everyone else described their process or technology, some in great detail. What none of them tell you is that those software applications use what are essentially mathematical algorithms for ranking relevance based upon content, algorithms that have been in use for decades in other industries for other purposes. When you conduct a Google search, for example, Google ranks content based upon relevance using its own proprietary ranking algorithm. The secret sauce of these applications that use relevancy ranking, however, is usually based upon open source technology like Lucene. Yes, even Google started with some code written by someone else. Programmers use open source code whenever possible, primarily to avoid having to pay for a license. Equivio explains their secret sauce thus:
Equivio>Relevance enables organization of a document collection by relevance. Based on initial input from an attorney knowledgeable of the case, Equivio>Relevance uses statistical and self-learning techniques to calculate graduated relevance scores for each document in the data collection. As an expert-guided system, Equivio>Relevance works as follows: An expert reviews a sample of documents, ranking them as relevant or not. Based on the results, Equivio learns how to score documents for relevance. In an iterative, self-correcting process, Equivio feeds additional samples to the expert. These statistically generated samples allow Equivio>Relevance to progressively improve the accuracy of its relevance scoring. Once the sampling process has optimized, Equivio scores the entire collection, calculating a graduated relevance score for each document. The product includes a statistical model which monitors the software training process, ensuring validation and optimization of the sampling and training effort. 
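To make the expert-guided loop described above a little more concrete, here is a minimal sketch of the general idea: an expert codes a small sample, the software learns term weights from those decisions, asks the expert about the documents it is least sure of, and then scores everything. This is NOT Equivio's method or anyone else's product. The function names, the simple log-odds term weighting and the "closest to 0.5" sampling rule are all my own assumptions, chosen only to illustrate the workflow.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def train(labeled):
    """labeled: list of (text, is_relevant). Returns per-term log-odds weights."""
    rel, non = Counter(), Counter()
    for text, is_relevant in labeled:
        (rel if is_relevant else non).update(set(tokenize(text)))
    n_rel = sum(1 for _, r in labeled if r)
    n_non = len(labeled) - n_rel
    weights = {}
    for term in set(rel) | set(non):
        # Add-one smoothing so a term seen in only one class never yields log(0).
        p_rel = (rel[term] + 1) / (n_rel + 2)
        p_non = (non[term] + 1) / (n_non + 2)
        weights[term] = math.log(p_rel / p_non)
    return weights

def score(weights, text):
    """Graduated relevance score between 0 and 1 (logistic of summed weights)."""
    total = sum(weights.get(t, 0.0) for t in set(tokenize(text)))
    return 1.0 / (1.0 + math.exp(-total))

def review_loop(docs, expert, seed_ids, rounds=3, batch=2):
    """Hypothetical training loop: the expert codes the seed set, then each
    round codes the documents the current model is least certain about."""
    labels = {i: expert(docs[i]) for i in seed_ids}
    for _ in range(rounds):
        weights = train([(docs[i], lab) for i, lab in labels.items()])
        pending = [i for i in range(len(docs)) if i not in labels]
        pending.sort(key=lambda i: abs(score(weights, docs[i]) - 0.5))
        for i in pending[:batch]:
            labels[i] = expert(docs[i])
    weights = train([(docs[i], lab) for i, lab in labels.items()])
    return [score(weights, d) for d in docs], labels

# Toy collection; the "expert" is a stand-in for a reviewing attorney.
docs = [
    "merger agreement draft attached",
    "lunch on friday",
    "merger pricing terms",
    "fantasy football picks",
    "board approval of merger terms",
    "friday football lunch",
]
expert = lambda text: "merger" in text
scores, labels = review_loop(docs, expert, seed_ids=[0, 1], rounds=1, batch=2)
for i, s in enumerate(scores):
    status = "coded by expert" if i in labels else "scored only"
    print(f"{s:.2f}  {status}  {docs[i]}")
```

Note what the sketch leaves out: privilege review, audit sampling of what the software scored low, and documentation of who decided what. Those are workflow, not software.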
What Equivio has done, as have others, is to take a manual sampling process and automate the workflow. The Devil, however, is in the details. Technology like this in the wrong hands with the wrong workflow can be very dangerous. Proper human-driven audits and documentation must be present, independent of what the software suggests is relevant. Relying upon a software-driven audit trail simply is not enough. You must have a defensible and repeatable workflow that leverages sound technology. If you find yourself in a spot where you have to defend technology in court, you’re using the wrong workflow. As I have written here many times, technology is an organizational tool. How you organize your review is work product. As long as what you produce, or don’t produce, is based upon a process that ultimately arrives at decisions using “objective” criteria, like a transparent term, you should not have to disclose how you arrived at your production. We did not do it in the days before technology and should not place ourselves in the position of having to do so now.

There has been a lot of discussion about technology replacing human review. It has been suggested by some, including recently the New York Times, that one can even produce documents by reviewing samples and then producing like files (identified by Equivio, for example) without the need for review! Interesting concept if you’re a software programmer, eDiscovery salesperson or consultant who has never worked the business end of a lawsuit. However, those of us who have actual trial experience know this is a dispute waiting to happen, or worse, inadvertent production of privileged material. Production of privileged material is happening with greater frequency. You’ve seen the cases. It is no coincidence that the number of instances where privileged material has inadvertently been produced has increased since the use of technology became widespread. You have a clawback agreement, you say? Well, once that skunk is in the jury box, the stink is hard to remove. And what about knowing the facts of your case? You have a privilege screen using email addresses and search terms? So did many of those famous privilege waiver cases that lacked a defensible process that included actual review of what had been identified for production. Setting aside the great risk of inadvertent production of privileged documents, isn’t it important to review the documents you produce if for no other reason than to know the facts of your case? What a novel idea – learning your case by reviewing relevant documents! Remember that old cliché – a lawyer never asks a question to which he/she does not know the answer. Although, I’ve seen trial lawyers ask those questions before, making me squirm in my chair. How can any trial lawyer worth his/her salt produce documents that have never been reviewed, having simply been tagged by a piece of software because they are similar to something that has been reviewed?
The right approach, and fortunately the approach taken by most, is to push to review BEFORE production those documents that, based on content, have the anatomy of documents that have already been reviewed and classified by human review. What is missing from many of these workflows, however, are audits: sampling review rounds that validate that no relevant document is left behind. The goal is to increase the percentage of relevant documents being reviewed and reduce the number of irrelevant documents, thereby reducing the cost of review. In a recent matter, as an example, more than 10 million documents were available for review. Using a simple “predictive coding” workflow, only 20% of those had to be reviewed. Of those reviewed, almost 70% were relevant, compared to the usual 10-15% at best when reviewing everything. All of this was accomplished without the use of software that automatically tags using a predictive coding algorithm. Rather, it was accomplished with a workflow that utilized sampling, analytics, concepts, conversation threads and finally human-driven review audits. The process is understood and driven by lawyers, not a programmer. I don’t want to send the wrong message, however. Predictive coding is not a demon. This commentator is encouraged and excited by the advancements in technology. It is very beneficial to our ECA and data reduction process to have portions of the workflows automated. However, it is discouraging that some are applying this great technology in the wrong way and placing too much reliance upon technology that, frankly, few understand outside of the programmers who are all using the same basic open source algorithms. If you’re using “predictive coding” technology and workflows, be certain you follow a tried and true process that does not place too much reliance on technology you will never understand. Be careful out there!
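For those who like to see the arithmetic, here is a rough sketch of the numbers in the matter described above, followed by one common way to put a statistical range around an audit sample drawn from the documents that were NOT reviewed. The matter figures are rounded (the post says "more than 10 million" and "almost 70%"). The audit sample size and the count of relevant documents found in it are hypothetical, made up purely for illustration, and the Wilson score interval is my choice of method, not something the matter itself used.

```python
import math

def review_yield(total_docs, reviewed_fraction, relevant_fraction):
    """Return (documents reviewed, relevant documents among those reviewed)."""
    reviewed = round(total_docs * reviewed_fraction)
    relevant = round(reviewed * relevant_fraction)
    return reviewed, relevant

def wilson_interval(hits, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95% confidence)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    p = hits / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)

# Rounded figures from the matter: 10 million documents, 20% reviewed, 70% relevant.
reviewed, relevant = review_yield(10_000_000, 0.20, 0.70)
unreviewed = 10_000_000 - reviewed

# HYPOTHETICAL audit: humans review a random sample of 1,500 unreviewed
# documents and find 6 that are relevant.
low, high = wilson_interval(hits=6, n=1500)
print(f"Reviewed {reviewed:,} documents, {relevant:,} relevant")
print(f"Estimated relevant documents left behind: "
      f"{low * unreviewed:,.0f} to {high * unreviewed:,.0f}")
```

The point of an audit round like this is not the formula. It is that a human looked at a documented random sample of what was left behind and can explain the result.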

Thursday, March 10, 2011

Lawyers, Replaced by Cheaper Software?

You know something has hit the big time when it is covered by the New York Times. On March 4, eDiscovery did just that as the topic of the New York Times article by journalist John Markoff - "Smarter than you think: Armies of Expensive Lawyers, Replaced by Cheaper Software". While it is a good thing that eDiscovery is finally getting international attention, it is a bad thing that the Times suggests that software, particularly “cheap” software, is replacing lawyers. Take it from someone on the front lines - nothing could be further from the truth. Ralph Losey, a self-proclaimed eDiscovery lawyer and the face behind the eDiscovery blog - eDiscovery Team - says it best in his article "NY Times Discovers e-Discovery, But Gets the Jobs Report Wrong".
"Apparently the NY Times has fallen for what economists call the Luddite fallacy, the erroneous assumption that new technologies increasing productivity necessarily reduce employment. As economist Alex Tabarrok observed: “If the Luddite fallacy were true, we would all be out of work because productivity has been increasing for two centuries.”"
John Markoff is a well-respected technology journalist, but he clearly knows nothing of lawyers or eDiscovery. Unfortunately, the software companies he chose to quote are those that are currently misreading the current and future need. Anyone who thinks that technology will ever replace lawyers simply has not spent time in the trenches. Mr. Markoff, you have been led astray by marketing, I fear. Technology by itself is not a silver bullet. Any credible software company will tell you just that. As Ralph Losey so eloquently observes:
"The [NY Times] article seems to overlook the fact that the advanced e-discovery search and review technologies all still require lawyers to operate. They still require skilled attorneys to fit the technologies into a larger legal methodology. They still require the ESI to be understood. The software programs do not run themselves. They are only a tool. They are just a hammer, and without a carpenter, they will not build a case on their own."
The biggest misconception today is the notion that lawyers must understand and utilize advanced technology. This, I submit, does nothing but distract lawyers from what they have been hired to do - deliver legal advice. Lawyers should focus on discovering and presenting the facts. That job is hard enough without also keeping up with the latest trend in “predictive coding” algorithms, for example. Don't they have enough to keep up with? That does not mean that lawyers should not be familiar with and utilize technology. They should. It does not mean that advanced technology is not an incredibly useful tool. It most certainly is. Lawyers have historically relied upon non-lawyer professionals such as legal secretaries, paralegals, technologists and administrators to keep up with such things. A great lawyer once said, and I am paraphrasing –
"I don't really know much about technology. I barely know how to use a computer. What I do know, however, is how to win a case. Today winning means using technology - the best that I can find. I don't really need to understand it or keep up with technology because I have surrounded myself with smart folks that do that for me. My focus is in understanding the facts and fitting those facts to the law, not keeping up with the latest technology."

When a trial lawyer walks into court, she wants to be fed the handful of documents that matter. Spending countless hours sorting through a bunch of useless material or spending days learning technology is not a good use of a lawyer's time. Sure, they have to use technology - printing reams of paper simply is not the way things are done today. However, most lawyers simply want to open a folder and browse relevant material - the hot documents. Historically, we got lawyers what they needed by plowing through boxes of paper and making many copies of those documents that were important. Many trees were sacrificed. Today, there are smarter, better and faster ways utilizing technology as tools. Utilization of those tools, however, is less about technology features and more about process and how that technology is applied in specific situations. The feature gap between the various technologies is closing. In a few years, most eDiscovery applications will not be distinguishable from one another in terms of features and functions. They will simply look and present information differently. Most applications, despite what some would have you believe, use the same underlying algorithms and programming. The notion that some artificial intelligence (AI) is going to emerge and replace lawyers is science fiction. The practice of law is based upon precedent – history. We have been here before with technology - many times. This is not the first technology revolution in the legal industry. There have been several. That history is much more instructive than speculation and predictions about an uncertain future.


The Copy Machine
There have been many predictions that photocopiers will eventually become obsolete as information workers continue to increase their digital document creation and distribution, and rely less on distributing actual pieces of paper. The first widely used copy machine for offices was invented by James Watt in 1779. It relied on physically transferring some of the (specially formulated) ink from an original letter or drawing to a moistened thin unsized sheet of paper by means of a press. The copy could then be read from the obverse side. The system was a commercial success and was in use for over a century. The modern copy machine was introduced by Xerox in 1949 when it introduced the first xerographic copier, called the Model A. Xerox became so successful that, in North America, photocopying came to be popularly known as "Xeroxing." Today, photocopy technology has gone digital, and in addition to producing a paper copy of a document, most copiers also create a digital copy in various formats, the most common being PDF and TIFF. While digital technology has indeed changed things for the better, it is safe to say that it will be many decades before copy machines become obsolete, if ever.

The Computer

The most dramatic impact on business as a whole, and certainly the legal profession, has been the personal computer. The PC first began hitting the office place in the mid-80’s, but was not commonplace until the late 80’s and, in the case of legal professionals, the early 90’s. As of June 2008, the number of personal computers in use worldwide hit one billion, while another billion is expected to be reached by 2014. I remember when computers began hitting legal professionals’ desks. Most were intimidated, many were threatened and only a small minority of us embraced technology in those early years. It has taken more than two decades just to get lawyers to accept hosting and reviewing documents digitally. The PC has dramatically changed the practice of law, but it has taken literally decades to creep into the mainstream. It will take decades for lawyers to accept anything that remotely resembles AI. Sure, like everything else, there will be early adopters: those risk takers who don’t mind getting burned by buggy, unproven technology that is neither widely understood nor supported. Technology will not replace a lawyer or her staff anytime soon. Ralph Losey and I share a dream:
“I dream that someday this may happen, in the far distant future. We lawyers can then be freed to focus solely on the law. We can use Artificial Intelligence agents and truth detectors to find the facts, to uncover the whole truth... But as a person who labors in the fields of law and technology every day, I can tell you that we are nowhere close to that day. Turning all discovery over to computers is a Utopian dream that is centuries away, in spite of the New York Times suggestion to the contrary.”



Voice Mail

About the same time as computers started hitting lawyers’ desks, voice mail began to be deployed. Most lawyers hated voice mail at first. Many initially refused to use it at all. They found it impersonal. Secretly, I think some hated voice mail because it made them more accessible. For non-lawyers, voice mail was a godsend. Gone were the days when we stood outside a lawyer’s office waiting for her to get off the phone. Gone were the days when you needed to leave a message with a secretary and hope that your message got top billing, or route a piece of paper to the lawyer through the office snail mail. The old timers will remember those routing slips and inboxes full of paper. Many still have those inboxes full of paper, by the way, despite widespread adoption of digital technology and digital methods of routing. Voice mail therefore made communication more efficient. Did it displace humans? No, but what it did do is allow legal secretaries to spend more time doing actual work and less time answering phones, taking messages and tracking down lawyers.

Email

For better or worse, email has revolutionized the nature of business and personal communication around the world. Email has invaded every aspect of our lives. Email originated in the 1960’s, but in those early days was limited to use primarily by educational and governmental institutions. It was not until the mid-to-late 1990’s, however, that email was in wide use. Like every technology before it, email found lawyers to be late adopters. In fact, some lawyers still do not use email directly today. It is safe to say that email has had a dramatic impact on how we communicate. Some of that impact has not been positive. Many predicted that email would have a dramatic adverse effect on many things, most notably the US Mail. Because people today send more email than they do letters, there has been a perception that there has been a marked decline in the amount of mail delivered. However, in 2009 the United States Postal Service (USPS) delivered over 177 billion pieces of mail with revenues approaching $70 billion. It is estimated that in 2010, that figure exceeded 200 billion pieces of mail. While the USPS no doubt has had profitability issues and email is without a doubt having an impact on the USPS, it does not appear to be replacing the mailman anytime soon.

Conclusion

AI is decades and possibly centuries away from having any real impact on eDiscovery. Predictive coding, machine learning, or whatever buzzword of the day is chosen, these advanced technologies are indeed beginning to emerge and are proving to be very useful tools in the right hands with the right process. However, to claim that “Armies of lawyers are being replaced by cheaper software” is simply preposterous. As Ralph Losey points out, new and better ways of eliminating irrelevant information are in fact changing how we manage, review and produce documents. Technology is demanding that we become more efficient, which in fact means that we are beginning to review a smaller percentage of documents collected. A recent blog post observed – “Bottom line for lawyers is to embrace their inner “shark.” No, not the shark practice, predatory being, but the instinct to “keep moving or you will die.” Lawyers will need to understand technology, embrace it, and continuously update their knowledge to survive.” Becoming more efficient is something that has to happen anyway. Data volumes are going up, not down. In most cases there are far more documents available for review than can be reviewed, so a better way was, and is, inevitable. Technology should not be viewed as a threat to the need for humans. As these technologies mature, the need for a sound human-driven process is going to be of increasingly greater importance. Unfortunately, articles like the one in the NY Times do not help that cause and only serve to place technology in a negative light in an economy where lawyers are under increasingly greater pressure to reduce the costs of discovery.