Friday, June 22, 2012

TAR - Is Your Seed Sound?

Those who are using Technology Assisted Review (“TAR”) already know that the technology is sound. As The Digital Advantage (TDA) has written before, it is not necessarily the technology we should question. Rather, success depends upon how the technology is used and what one does with the results. Most applications deploying TAR use sophisticated “find similar” algorithms to compare the content patterns of one group of documents to another. Using sampling techniques, small groups of documents are reviewed by subject matter experts, and the technology then compares the results to the larger corpus. The corpus is defined by the investigator, whoever that might be. The technology ranks each document in the corpus against the small expert-reviewed sample; some have referred to this as a Google ranking of sorts. The small sample is generally referred to as a “seed,” and large projects may use many seeds. Seeding, by the way, is a technique that has been used for about as long as we have had search and review technology. What’s different today? The technology has gotten much better, both in its sophistication and in our ability to define and implement workflows.
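To make the “find similar” idea concrete, here is a minimal sketch, in Python, of ranking a corpus against an expert-reviewed seed using TF-IDF content vectors and cosine similarity. This is our own illustration, not any vendor’s algorithm, and the documents below are hypothetical placeholders; real TAR platforms are considerably more sophisticated.

```python
# Minimal "find similar" sketch: score each corpus document against an
# expert-reviewed seed set and rank the most seed-like documents first.
# Illustration only; not any vendor's actual algorithm.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

seed_relevant = [  # hypothetical documents the expert marked relevant
    "draft supply agreement pricing terms and rebate schedule",
    "email re: rebate schedule negotiation with distributor",
]
corpus = [  # hypothetical unreviewed documents
    "fantasy football league standings",
    "revised pricing terms for the distributor supply agreement",
    "lunch menu for friday",
]

vectorizer = TfidfVectorizer(stop_words="english")
matrix = vectorizer.fit_transform(seed_relevant + corpus)
seed_vectors = matrix[: len(seed_relevant)]
corpus_vectors = matrix[len(seed_relevant):]

# Each corpus document gets the score of its best match against the seed set.
scores = cosine_similarity(corpus_vectors, seed_vectors).max(axis=1)
for score, doc in sorted(zip(scores, corpus), reverse=True):
    print(f"{score:0.2f}  {doc}")
```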
So, the seed is critical, right? Bad seed, bad result. And when we say the quality of the seed is important, we are not just talking about the quality of the expert’s review of the seed. The make-up of the seed can make or break the result in terms of quality and/or time. TAR technology and sampling methods are based upon randomly sampling a percentage of the target population of documents. Traditional sampling methods assume a relatively high rate of relevant content within the overall population being investigated. Technically, the lower the relevant content, the larger your sample size should be. For example, some sampling calculations assume that 50% of the sampled population is relevant. A relevancy rate that high is seldom the case in our world; relevancy rates below 1% are common and rarely exceed 10%. So, your random sample is a bit like a shotgun blast in the dark.
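One way to put a rough number on that intuition: to estimate a relevancy rate to within a given relative precision, the required sample size grows sharply as the rate shrinks. A minimal sketch, assuming a 20% relative precision target and 95% confidence (both assumptions chosen only for illustration):

```python
import math

def sample_size_for_relative_precision(prevalence, relative_error=0.20, z=1.96):
    """Documents to sample so the estimated relevancy rate lands within
    +/- relative_error of the true rate, at roughly 95% confidence (z = 1.96).
    Standard proportion-estimate formula; finite-population correction omitted."""
    return math.ceil(z**2 * (1 - prevalence) / (relative_error**2 * prevalence))

for p in (0.50, 0.10, 0.01):
    n = sample_size_for_relative_precision(p)
    print(f"relevancy rate {p:>5.0%}: sample about {n:,} documents")
# relevancy rate   50%: sample about 97 documents
# relevancy rate   10%: sample about 865 documents
# relevancy rate    1%: sample about 9,508 documents
```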
Does that mean that we should have less confidence in TAR and sampling methodologies? No, it most certainly does not, in our view. Rather, doesn’t it make sense to create a better seed and increase the accuracy? By the way, even a bad seed is superior to the common practice of throwing terms at the wall and seeing what sticks, but we digress.
As TDA has opined many times, search terms are the only truly transportable objective content filters that the parties can agree upon. Using seeds constructed and reviewed from validated search terms dramatically increases the impact of the seeds and the success of any TAR workflow. Far fewer documents will be reviewed. Do you have sound auditing methodologies? Are you just throwing technology at the problem, or are you using tried and true workflows?

Monday, June 4, 2012

Deleted Email. Are you looking in the right place?

Those of us who live in the forensic world know that when you delete an email, it is not really deleted. Not right away. Most, however, don’t know where to look and may be looking in the wrong places.

Let’s start with where one won’t find deleted email. Individual deleted emails usually don’t reside on the computer’s hard drive as individual emails, that is, as single files. So, if you are conducting a forensic recovery of a hard drive and are looking for individual emails, you might be looking in the wrong place. The operating system keeps track of where files are placed on the hard drive. In the case of email, the hard drive MAY contain a single file containing many emails. Where to look will largely depend upon what email system is being used and how that system is configured. For the purposes of this discussion, we will assume the email platform is Microsoft Outlook, using the example of a laptop computer configured to manage email within an OST file. An Offline Storage Table (.ost) file is an offline folder file utilized by Microsoft Outlook. Offline folders make it possible for the user to work offline and then synchronize changes with the Exchange server the next time they connect.

Another type of file in Outlook is the PST file, which is simply stored on the client or on a server other than the Exchange server. A Personal Storage Table, better known as a “PST” (.pst), is an archive file that Microsoft Outlook uses to store email messages, calendar items and other things normally managed by Outlook. A PST file is usually stored on a user’s hard drive, or on a network share, as opposed to archived email managed by Microsoft Exchange. A PST email archive is usually created by individual users to store email outside of the corporate email environment and circumvent server storage quotas. What is the difference between an OST and a PST file? An OST file starts as a mirror image of a user’s folders on the Exchange Server and works in conjunction with the Exchange Server during synchronization. A set of PST files, on the other hand, is simply a storage location on the hard disk or a server other than the Exchange Server. When a user works offline, that user can add, delete, and modify the contents of an OST file exactly as that user would with a folder on a server. For example, a user can change and move items in the offline Inbox, send messages that are placed in the offline Outbox, and read offline public folders. In the meantime, the server continues processing: the user continues to receive new messages in the mailbox, while other users can add, delete, and change items in public folders. However, the user is not aware of these changes on the server until a connection to the network is reestablished. It should be noted that while both OST and PST files are commonly referred to as "container" files, they are actually tables containing database entries (email text and metadata) and objects (attachments).

When a user who is managing email using an OST or PST deletes an email, the pointer to that email (and any attachments) is first moved to the “deleted items” folder within the tables. The pointer and the email will remain in that folder until the user empties the deleted items folder. That act simply removes the pointer to the email from the “deleted items” folder within that PST or OST file; the pointer to the table entry is removed, not the information itself (the text, metadata and associated objects). The email and the attachments (objects) will remain within the “slack space” of that OST or PST file until the user “compacts” the file, which will physically remove the information for those records from the tables. Most users don’t know how to perform a compact, so it is likely that any and all deleted email will remain in that file until the entire container file is deleted. There is no individual email file managed outside that OST or PST, so an attempt to recover individual emails directly from the hard drive of the device you are examining will not succeed. Rather, what one would recover is, for example, an entire OST or PST file that a user has deleted. It is important to look for and recover entire OST and PST files from the hard drive, by the way, so don’t skip that step. If you do, you will be missing potentially important information.
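A toy model may help picture the pointer-and-slack behavior described above. This is purely conceptual; the real OST/PST internal format is far more complex and is not what is shown here.

```python
# Toy model of the pointer/slack behavior described above. Conceptual only.
class ToyContainer:
    def __init__(self):
        self.records = {}  # record_id -> email text, metadata, attachments
        self.folders = {"Inbox": [], "Deleted Items": []}  # folders hold pointers only

    def receive(self, record_id, email):
        self.records[record_id] = email
        self.folders["Inbox"].append(record_id)

    def delete(self, record_id):
        # "Deleting" just moves the pointer into the Deleted Items folder.
        self.folders["Inbox"].remove(record_id)
        self.folders["Deleted Items"].append(record_id)

    def empty_deleted_items(self):
        # "Dumping" removes the pointers; the records stay behind in slack space.
        self.folders["Deleted Items"].clear()

    def compact(self):
        # Only a compact physically purges records no pointer references.
        referenced = {r for ptrs in self.folders.values() for r in ptrs}
        self.records = {r: e for r, e in self.records.items() if r in referenced}

    def slack_space(self):
        referenced = {r for ptrs in self.folders.values() for r in ptrs}
        return [e for r, e in self.records.items() if r not in referenced]

box = ToyContainer()
box.receive(1, "smoking gun email")
box.delete(1)
box.empty_deleted_items()
print(box.slack_space())   # ['smoking gun email'] -- still recoverable
box.compact()
print(box.slack_space())   # [] -- gone for good
```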

Recovering email from the slack space of an OST or a PST recovers those orphans and is a step that is routinely missed by most in this business. Most tools and processes don’t look in the right place. Are you missing important evidence by not looking in the right place?

Sunday, May 13, 2012

Are your samples Random? Are you just getting random results?



Introduction


There has been much discussion of late about a variety of "emerging" technologies: predictive coding, err um, technology assisted review (TAR), statistical random sampling (SRS), what some algorithm does or does not do, complete with panels of experts to explain this, that, or the other. Entire days of testimony are being devoted to peeking behind the curtain of “predictive coding” algorithms. The Digital Advantage continues to ask, why all the fuss? Shouldn't we be focused on the merits, some say? Why all this math? After all, most lawyers will tell you they are lawyers today because of the supposed lack of math, but we digress.

It’s Just Math


Ralph Losey’s most recent treatise on random sampling is quite the read. Ralph predicts (trumpets sound)…

“….in the year 2022 a random sample polling of American lawyers will show that 20% of the lawyers in fact use random sampling in their legal practice. I make this prediction with an 95% confidence interval and an error rate of only 2%. I even predict how the growth will develop in a year by year basis, although my confidence in this detail is lower.” (Random Sample Calculations And My Prediction That 300,000 Lawyers Will Be Using Random Sampling By 2022)
Ralph’s prediction, of course, is tongue-in-cheek. Well, sort of. The math behind sampling is serious. Unlike the broader world of studies and polls for which statistical models were designed, in the document world the all-important baseline changes from one matter to the next. Put another way, statistical modeling is only as good as the information you feed into it and the assumptions that you apply. By way of example, Ralph’s conclusion that 300,000 lawyers will utilize random sampling by 2022 may be flawed.

Assuming that by the year 2022 there are 1.5 Million lawyers (the ABA estimated there were 1,128,729 resident, active lawyers in 2006), I predict that 300,000 lawyers in the U.S. will be using random sampling by 2022. The confidence interval of 2% by which I qualified my prediction means that the range will be between 18% and 22%, which means between 270,000 lawyers and 330,000 lawyers. I have a 95% level of confidence in my prediction, which means there is a 5% chance I could be way wrong, that there could be fewer than 270,000 using random sampling, or more than 330,000. (Id.)
  
Flawed, you say? The equations for determining sample size based upon known and desired factors, such as population size and tolerable margin of error, are well settled. No, the math is not questionable. It works. We have seen it over and over. Trust those math folks. Be careful with your inputs.


In terms of the chosen inputs, the sample size n and margin of error E are given by:
x = Z(c/100)² · r(100 − r)
n = N·x / ((N − 1)·E² + x)
E = sqrt[ (N − n)·x / (n·(N − 1)) ]
where N is the population size, r is the fraction of responses that you are interested in, and Z(c/100) is the critical value for the confidence level c.
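For those who want to see the arithmetic, here is a minimal Python sketch of that calculator formula; the example population, confidence level and margin of error are ours, chosen only for illustration.

```python
import math

def sample_size(population, confidence=0.95, margin_of_error=5.0, response_rate=50.0):
    """Sample size per the formula above.
    margin_of_error and response_rate are percentages (e.g. 5.0 for 5%)."""
    # Two-tailed critical value for the chosen confidence level.
    z = {0.90: 1.645, 0.95: 1.96, 0.99: 2.576}[confidence]
    x = z ** 2 * response_rate * (100 - response_rate)
    n = population * x / ((population - 1) * margin_of_error ** 2 + x)
    return math.ceil(n)

# Example: a 1,000,000-document population, 95% confidence, +/-2% margin,
# using the conventional worst-case 50% response rate.
print(sample_size(1_000_000, confidence=0.95, margin_of_error=2.0))  # ~2,396
```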
Rather, the math only works if your inputs and assumptions are sound. Here, not all of the 1.5 million lawyers Ralph assumes will exist in 2022 are litigators. We don’t expect that real-estate and tax lawyers will be utilizing random sampling related to document review, and contract lawyers are unlikely to be interested in sampling either. So, the population Ralph starts with may be far smaller than 1.5 million. We have not audited Ralph’s results, though, and that is not really the point. The result of over-estimating the population would be sampling more than you need, which is not necessarily a bad thing; the result would be better. But that simply reinforces Ralph’s overarching point and ours here: it is not precise statistics that are important. Statistical sampling is a tool among many other reinforcing tools. You don’t have to be a Ralph Losey-type lawyer, gain a deep understanding of statistical sampling (the underlying math), or hire an expert to explain it to a judge or jury. Sample size matters so that you gather enough inputs, and so that those inputs carry the least possible risk of missing important information. The process used should measure objective information. Results should be validated and audited, so getting a precise sample size is not as important as using a rule of thumb that is repeatable. Statistical sampling is simply a method by which you organize the documents from which you gather what to tell the machine. When you consider that less than 1% of all documents have any value at trial, reviewing everything is simply not possible, nor necessary, in virtually all cases.

“I saw one analysis that concluded that .0074% of the documents produced actually made their way onto the trial exhibit list – less than one document in ten thousand. And for all the thousands of appeals I’ve evaluated, email appears more rarely as relevant evidence.” DCG Sys., Inc. v. Checkpoint Techs, LLC, 2011 WL 5244356 at *1 (N.D. Cal. Nov. 2, 2011) (quoting Chief Judge Rader)

Follow a Simple Process

Unlike the use cases for which random sampling models were built, in the document review and production world, we are not shooting in the dark. While it is true that in almost all cases the relevant material is very small in proportion to the amount of material available for analysis, we have a pretty good idea what words and phrases appear in relevant documents, providing at least a start. The subject matter is known. Filter parameters can be narrowed by date, authors and recipients and any number of other known factors. In the old days – those days before technology – we just knew where to look. Today is no different except that we now have technology to help us. Technology helps us increase the odds in our favor. Audits will identify new terms, phrases and concepts for the technology to use to find new sources. Sampling is not so random.

It is becoming commonplace to agree upon and use search terms, often without any testing or validation of those terms whatsoever. Wouldn’t it be important to know for certain, say with a 2% chance of error, that a term you chose will return relevant documents? Don’t you want to know what percentage of your effort will be wasted if you review all documents hitting a specific term? Why not take a “statistical” sample of all documents hitting that term and measure the relevancy rate? You don’t need to prove what’s statistically appropriate; there are ample calculators that will “do the math” for you. The math has been proven. See the sample size calculator by RAOsoft.com and Ralph Losey’s “worst case scenario” sample predictor. Using statistical sampling calculators inside a well-reasoned process to, as an example, test the validity and recall rates of terms being contemplated is not something that should have to be defended. You are simply using a calculator to help you find the best possible samples upon which to measure a term’s effectiveness. Ultimately, it is the term (along with other objective filter parameters) that is agreed upon, not what constitutes a sound statistical sample. In other words, what matters is the result and the confidence in that result, not necessarily how the sausage was made. Humans, not machines, are deciding what terms, topics and concepts are relevant. The technology simply finds documents with content similar to that which a human has decided is relevant. That’s why some call this emerging technology “machine learning.”
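As a sketch of what such a term audit might look like in practice, consider the following Python fragment. The term, the hit count, the 384-document sample (a common 95% confidence, plus-or-minus 5% margin rule of thumb) and the reviewer tallies are all hypothetical.

```python
import math
import random

def relevancy_estimate(sampled, relevant, z=1.96):
    """Point estimate and ~95% confidence interval (normal approximation)
    for the relevancy rate of documents hitting a term."""
    p = relevant / sampled
    half_width = z * math.sqrt(p * (1 - p) / sampled)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)

# Hypothetical audit: a candidate term hits 25,000 documents; we pull a random
# sample of 384 of them for expert review.
term_hits = [f"DOC-{i:06d}" for i in range(25_000)]
sample = random.sample(term_hits, 384)

# Suppose the reviewers mark 31 of the 384 sampled documents relevant.
p, lo, hi = relevancy_estimate(len(sample), 31)
print(f"Estimated relevancy rate for the term: {p:.1%} (roughly {lo:.1%} to {hi:.1%})")
```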

Today, a set of terms or phrases remains the only reliable objective filter that can be agreed upon and easily transferred from one technology to the next. Terms that are validated utilizing a repeatable and quantifiable methodology will be much easier to defend. And oh, by the way, these are not things about which we are guessing. Don’t guess, get help.



Friday, April 27, 2012

Vendors – A Dying Breed? Are we Pricing Ourselves Out of Business?

Introduction

The Digital Advantage has been busy, hence the gap in posts. This week’s post, therefore, will highlight two recent posts/articles from eDiscovery Journal. While we would like to see them drop the “e”, “Discovery Journal” is always a good read, BTW.


Vendors – A Dying Breed? Thoughts from IPRO Innovations

The first post “Vendors – A Dying Breed? Thoughts from IPRO Innovations”, by Greg Buckles, has some interesting observations from the recent IPRO conference just down the street (literally) from The Digital Advantage. 
“One interesting statistic from Jim King’s (IPRO CEO) keynote was a 17% decline in the number of IPRO service providers while the actual revenue they generated was up 12%. That’s right, the IPRO channel is shrinking (just like the eDiscovery market) but the volume is still growing. This resonated with the panelist perspectives on the increasing need to find alternatives to volume pricing ($/GB).”
A 12% increase in revenue, assuming year over year, is a healthy increase, especially in light of the economy, climate and commodity pricing. The 17% decrease in the size of the IPRO channel may well support the notion that the market is shrinking. That is certainly one interpretation. Or might it be IPRO channel partners moving to other products? Could it be that products like Viewpoint by Lateral Data, Digital Reef and other processing/conversion engines are supplanting IPRO, LAW PreDiscovery and other more traditional, simpler processing engines? Could it be that there is finally a lot less scanning of paper in the world, and that part of IPRO’s market is shrinking as “vendors” shut down those parts of their business, or in some cases go out of business? The point here is not to beat up on our friends down the street at IPRO. They are indeed a fine company with smart people and fine products. It is clear from IPRO’s product road map that they recognize the shift away from simple “vendor” processing toward feature-rich filtering and reporting functionality. The commodity of “processing” data can be done by anyone on any street corner. You can find a hot dog on any street corner in NYC too. What you do with the hot dog, however, is a different matter altogether. Some hot dog stands are better than others.

So, what do we need to do about this perception of high technology costs? Greg hits the nail right on the head:
"Providers need to differentiate their offerings with standardized processes, project expertise, transparent invoicing and metric-rich reporting to survive the transformation from commodity vendor to managed service provider."
In our business, what separates a “vendor” from a “service provider” or “trusted advisor” is what you do with that commodity product (processed data). While tools and technology are important – you need the right feature sets, reporting and reliability – how you use that technology is vital. Training, workflow, project management, accountability and the long list of other things one needs when dealing with vast amounts of information all have associated costs. Supporting the technology and making sure that technology is available at all times has a cost and requires expertise. Those services can’t be expected to be provided for free. The churn and burn mentality associated with the commodity of processing data is dangerous in the wrong hands. It is indeed a highly competitive market within which we find ourselves, and the competition is not shrinking. Although, as has been pointed out, some portions of the market are indeed consolidating. But for every consolidation, someone new seems to come along, on occasion bringing something new to the table.

In other words, it is more than just about price, which brings us to Mikki’s post.

Are we Pricing Ourselves Out of Business?

As Mikki so astutely observes, the business of collecting, processing, filtering, reviewing and producing ESI is not just about technology price. “This is not a ‘push this button and you are done’ environment.” Mistakes can be costly. Entire careers have been flushed down the toilet and reputations lost forever because of a mistake or neglect during the ESI process. Providers or law firms that think they can provide a service for, using Mikki’s example, $75 per GB should raise a significant red flag. In an exchange with a “Vendor Veteran”:

“[H]ow can other vendors possibly compete with that and not sacrifice profit or quality”? My response? They probably can’t. But read the fine print before you panic. As discussed in the many blogs about pricing, the models can be very confusing and as a result it is difficult to compare apples to apples.” 
Either there are hidden costs that won’t appear until invoice time, or the service being provided is, well, a service you are buying from a “vendor”. Managing ESI is indeed about the cost of people, process and technology. The biggest mistake made today is getting so caught up in technology cost, because that cost is seen first, that in many cases it drives a decision to choose the wrong technology, process or people. As Greg Buckles previously observed, we must move away from the GB consumption model. Technology cost should be flat and more in line with the broader software licensing market. Well-designed managed service models that flatten out the cost and separate the three components Mikki outlines – People, Process and Technology – are emerging and will continue to mature. Technology has in many ways reduced labor costs at all levels. Processing technology is mature, and higher volumes can be processed with fewer people. So, that portion of the labor cost has indeed gone down. We are reviewing fewer documents. If anyone is reviewing 100% of everything collected, well, someone had better take a harder look at that process. So, review cost in a sense has gone down. However, as Mikki observes:
“However, the labor expertise required to handle ESI has not decreased. If anything, it has increased. So, while the amount charged by a service provider goes down, the cost of quality employees does not.”
Yet, we continue to hear mostly about the cost of that “vendor”. In reality, the largest cost, hands down, is not the cost of the technology, or the labor cost associated with processing, hosting or supporting data within an application. The largest single cost in litigation is not the cost of converting a Word document to TIFF, branding the TIFF with a “Bates” number, creating a load file, placing all that on a disk and shipping it off to someone across the country. The largest cost during discovery is not even hosting millions of useless records for years because the right decisions were not made to filter out everything that is not absolutely needed to prosecute or defend a specific set of facts. Rather, the largest single costs associated with litigation today are legal fees within largely inefficient filtering and review methodologies. The act of taking 50, 100 or a thousand search terms, throwing them against millions of files, reviewing the result and calling that an efficient process is at the root of what is driving up cost. Most of the cost of conducting discovery is in managing and using document evidence. Very few cases go to trial and very few documents are actually used during discovery or trial. Far less than 1% of the documents produced during discovery have any value to anyone, except maybe the bottom line of everyone except the ultimate client – the one paying all the bills. So, shouldn’t we be asking the hard questions about the document review part of the cost rather than focusing just on commodity processing pricing? Shouldn’t the 35-document-decisions-per-hour review rate also be a thing of the past? Isn’t it high time for us to stop reviewing millions of useless documents when only a few are needed? Greg and Mikki are correct; the days of the $1,000 per GB processing charge are long gone.

So, maybe now we can focus on the people and process part of the equation and separate that cost from the cost of technology. Maybe, just maybe, it is time for us to turn our ear back to that “Trusted Advisor”, that professional “service provider”, who spends every day in the trenches focused on the entire project ROI and not just the cost of technology. After all, like hot dogs in NYC, you can buy technology from any “vendor”.

Friday, February 10, 2012

Technology Assisted Review (“TAR”) – We have been expecting you!

As predicted in The Digital Advantage’s January post, “LegalTech – Technology Assisted Review will be the theme,” the buzz at LegalTech was indeed TAR. Some will know the technology features as “Predictive Coding”, “Automated Review” or any number of other terms being used to explain what is, at its core, the same basic technology, despite what some would have you believe. A recent Forbes article, “What Technology-Assisted Electronic Discovery Teaches Us About The Role Of Humans In Technology,” really captures the current problem with what should otherwise be quick adoption of this technology.

Some continue to think that courts are going to somehow sign off on “Predictive Coding”. 
Given the significant benefits that technology-assisted review can bring to e-discovery from the dual perspectives of quality and cost, expert commentators have asked a key question: Why isn’t everyone using it? Of the 11 e-discovery vendors surveyed by Kershaw & Howie, “[t]he most mentioned reason cited by respondents was uncertainty or fear about whether judges will accept predictive coding.” Kershaw & Howie, Crash or Soar (supra).
For the love of Pete, please stop thinking of this technology as something that the courts are going to “sign off” on!! They will not. Judge Peck said just that in the aforementioned Forbes article.
“Until there is a judicial opinion approving (or even critiquing) the use of predictive coding, counsel will just have to rely on this article as a sign of judicial approval. In my opinion, computer-assisted coding should be used in those cases where it will help “secure the just, speedy, and inexpensive” determination of cases in our e-discovery world.”
The courts have for years encouraged the parties to cooperate and come to agreement on filtering methods. This attitude of expected cooperation on the discovery process is nothing new; it is not something that the technologists dreamed up. When you read the rules of any state or federal court, you will find agreement and cooperation mentioned frequently. While most “predictive coding” applications are built upon the same basic statistical algorithms and linguistic pattern detection, they don’t always achieve the same result. This is because how you conduct the analysis – the workflow – is critical to the end result. TAR alone is not a silver bullet. Lawyers SHOULD NOT have to defend the technology – how the algorithms work, or what the technology does or does not do. Instead, we should focus on the workflow and the end result. The technology being utilized should be transparent. The workflow and end result should be based upon what a human decides is relevant. For us at The Digital Advantage, there are really only two types of filters that need be explained.
Objective Filters:  Objective filters are essentially filters that can be applied without any “subjective” reasoning.  File type and date range are two common examples (not an exhaustive list) of filters applied in most cases. These filters should not be considered work product.  Objective filters are often disclosed and agreed upon between the parties. 

 Subjective Filters:  Subjective judgment is applied to document decisions in most cases.  When a lawyer reviews a document, the “mental impressions” that lawyer forms are generally considered work product.  In the old days, we did not disclose how many “boxes” we reviewed to arrive at a production that might be only a few documents, or a few boxes out of hundreds.  We did not disclose what file cabinets were opened, or who was interviewed.  Then, as today, production would occur in one of two forms (generally speaking): documents are produced either as kept in the “usual course of business” or “by request.”  In the “old days” we avoided producing “by request” as if it were the plague.  Search terms today are often used, and should often be disclosed as an “objective” filter.  A term is indeed an objective filter, but you may arrive at the use of a term using subjective factors.  You may even use “predictive” technology that, in its simplest form, looks for common words, concepts and patterns.  Either a document contains that word (or some variation) or it does not.  How you arrived at the term, however, may well be subjective and protected work product (some lawyer will make that call).  After all, attorneys are officers of the court, and when responses are provided, the attorney for each party is representing that a diligent search has been conducted and that there has been compliance with the requests, or that the appropriate objections have been lodged.
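To underline how mechanical objective filters are, here is a tiny Python sketch that applies a file type, date range and term filter to a hypothetical document list. The field names and criteria are made up for the example; the point is that anyone can verify whether a document passes, with no judgment call required.

```python
from datetime import date

# Hypothetical document metadata; the field names are made up for this example.
documents = [
    {"name": "agreement_v3.docx", "ext": "docx", "sent": date(2011, 6, 14),
     "text": "revised rebate schedule attached"},
    {"name": "vacation.jpg", "ext": "jpg", "sent": date(2012, 1, 2), "text": ""},
    {"name": "board_minutes.pdf", "ext": "pdf", "sent": date(2010, 3, 9),
     "text": "quarterly results discussed"},
]

# Objective filters: criteria anyone can verify, no subjective reasoning applied.
ALLOWED_TYPES = {"docx", "pdf", "msg"}
DATE_START, DATE_END = date(2011, 1, 1), date(2012, 12, 31)
TERMS = {"rebate", "schedule"}

def passes_objective_filters(doc):
    return (doc["ext"] in ALLOWED_TYPES
            and DATE_START <= doc["sent"] <= DATE_END
            and any(term in doc["text"].lower() for term in TERMS))

for doc in documents:
    verdict = "keep for review" if passes_objective_filters(doc) else "filter out"
    print(doc["name"], "->", verdict)
```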
As Jim Wagner recently wrote in his blog: “So, welcome to 2012, predictive coding. We’ve been expecting you.”  The Digital Advantage adds this warning: don’t expect TAR to be a silver bullet. Without the right workflow, checks, balances and quality control, technology like this in the wrong hands can yield undesired results. So, if you are waiting for this technology to become “defensible,” or for some court to sign off on this technology or that, you will have a long wait. Meanwhile, the world is going to pass you by. If your practice is based upon a lot of document review hours, you may wake up one day and not recognize the world you live in. TAR is here and it is not going anywhere. There is plenty of help available for you.

Wednesday, February 1, 2012

The "Cloud" is NOT new

In the old days, those days before we all had a PC on our desk, there were "mainframes" and "dumb terminals".  Technology was delivered from a central location where all data was kept and delivered to what were essentially monitors and keyboards.  Data was delivered from a “Cloud”.  Then came the PC; computing power and storage were distributed, and the "mainframe" started to go the way of the dinosaur.  Well, that dinosaur is back, it appears.  Some call it the "Cloud" as if the concept is new and novel. It is not.  Even in the modern day of the PC, much of the data we access today is in fact somewhere other than on our PC.  In some cases, we access data sitting on a server in another room, another floor, across the state, across the country or even on the other side of the world.  We, the average users, often don't really know, or in fact care, where our data originates, as long as it is safe and available.  Yes, sales folk, CIOs and those technical types want us to think that the "Cloud" is something magical, new and different, when it’s not.  That is not to say that the concept of keeping data in a "Cloud" is not important.  For a law firm, as an example, not having to deal with managing servers, software and all the associated infrastructure simply to access documents is, frankly, in most cases much more cost effective when managed by a service provider who already has everything needed in terms of technology.  Most service providers in this business have been delivering "Cloud" technology for years, so let’s not keep thinking of this as a new thing. The "Cloud" is old; we are just using it in a different way.  Who knows, we may well return to the days of dumb terminals and mainframes. Wait, we are already there, but we call them "Smart".  And there is this thing called the iCloud, where something called an iPad accesses information in the "Cloud".  We have indeed come full circle, it appears.

Friday, January 20, 2012

LegalTech – Technology Assisted Review will be the theme.

As LegalTech approaches, our software friends madly scramble to throw up their ground-shaking new features.  LegalTech is the platform most in this space use to launch new products and features. Legal service providers and software company employees may outnumber law firm and corporate attendees, but corporations and law firms not in attendance this year will miss what some, including The Digital Advantage, believe will be a year when some truly amazing technology advances are achieved and showcased. 
This year, several companies are announcing some truly ground-shaking technology that, if used properly, will change how ESI is managed, filtered, reviewed and produced.  The overwhelming theme at LegalTech will be the announcement of new "technology assisted review" features in products where those features were not previously present.

The Digital Advantage has been following and writing about “technology assisted review” for some time.[1] Technology assisted review is not new.  Recommind Inc., Orcatec, Equivio and others have had similar features on the market for some time.  The Symantec and Lateral Data products highlighted below, however, in The Digital Advantage's opinion, are taking a new and fresh approach that will allow legal professionals to use and rely on this technology without having to defend a single technology by retaining a PhD to explain how everything works.   Both applications contain workflow-driven features where lawyers need only defend their workflow and review methodology rather than what a technology does, or does not do.   The two applications highlighted below are announcing these new features just before and during LegalTech.   The list of providers releasing technology assisted review features is a long one, and this is by no means an exhaustive list; the discussion is simply limited to those with which The Digital Advantage has firsthand knowledge. 

Symantec Announces “Transparent Predictive Coding”

On January 23, Symantec will officially announce an upcoming feature set for the Review & Production Module of the Clearwell eDiscovery Platform that, The Digital Advantage agrees, when used in conjunction with a proven and well-planned workflow and methodology, will streamline legal review significantly. This new functionality, which Symantec calls “Transparent Predictive Coding”, will deliver the ability to create a flexible and intuitive workflow to help leverage technology during human-driven review. The Clearwell eDiscovery Platform works in much the same way as the “predictive” technology platforms that came before it and can automatically predict document relevance in order to simplify and streamline document review. The difference here, however, is that Clearwell has clearly learned from those early adopters and provided flexibility in how this technology can be used. When coupled with the right methodology, the net result will be a more defensible review process at significantly reduced cost.

Clearwell's Transparent Predictive Coding highlights the following specific features:
  • Smart Sampling – The ability to target specific documents sets
  • Smart Tagging – Automating pre-classified criteria that has been tested
  • Prediction Insight – Greater transparency into what is being “predicted”
  • Prediction Workflow Management – Flexibility
  • Prediction Analytics – Strong reporting and transparency
  • Prediction Templates – Ability to automate and repeat what works
  • Review Quality Control – Customizable QC
Viewpoint by Lateral Data announces “Assisted Review”

Lateral Data’s Viewpoint software is an all-inclusive e-Discovery platform.  They too are announcing a version of predictive coding at LegalTech.  Viewpoint “Assisted Review” applies advanced technology also designed to help review teams reduce costs by reviewing fewer documents in less time. Viewpoint’s Assisted Review is completely integrated into the Viewpoint platform and is not a “tacked-on” product requiring separate installation and training and adding complexity. There is no need for complex, time-consuming imports, exports or duplicative file transfers between disparate software products to get advanced “assisted review” analytics.  This will save considerable time over other products with similar features. When coupled with a proven workflow and methodology, Viewpoint’s “assisted review” features should prove to be a considerable upgrade to an already strong product.  Here are some of the key benefits:
  • No additional cost - Viewpoint Assisted Review is included with every Viewpoint Enterprise Subscription license. 
  • Complete integration - Not a “tacked-on” third-party tool. No additional training or installation required.
  • Strong Analytics - Leverages Viewpoint’s already proven built-in analytical tools to build efficient sample sets for predictive analysis.
  • Advanced Filtering - Viewpoint Assisted Review can be added to traditional culling techniques (e.g. date range or keyword filtering) to ensure the lowest possible number of documents is reviewed, with the highest possible relevancy rate.
  • Early Data Assessment (EDA) - The most highly relevant documents in a dataset can be easily isolated. 
  • Quality Control - Once a standard first level review has been completed, users can compare results to easily find and correct discrepancies.
Conclusion

Technological innovation runs in cycles, and technology assisted review may indeed be an up cycle.   Just because something is bright and shiny does not necessarily mean it’s a good thing.  Don’t get sucked into the mindset that technology solves all problems by itself, or you will be sadly disappointed.  Technology is only as good as the designers who create it and those who use it.  The “rubber meets the road” at the intersection of human and machine – not to be confused with human vs. machine, despite what some would have us think.  Technology assisted review is not going to replace lawyers anytime soon, although fewer will hopefully be needed to do more with less.  But that was inevitable, wasn’t it?  If you are going to invest in technology, complement it with a sound implementation plan, integrate sound technology with a proven process and have smart people marrying it all together.    That is how you can maximize your technological investment and benefit from some truly useful technology in the right hands. 



[1] See "Predictive Coding: Angel or Demon"  The Digital Advantage, March 2011.