Friday, November 3, 2017


Relativity Fest 2017 Review – Part II
Rise of the Machine

Part I of the R Fest 2017 review – "The Popsicle Moment" – can be found here.

Introduction

History in this industry teaches us that we can't place all our eggs in a single basket. As noted below, I've been there and still have the T-shirt to remind me. After all, iControl ESI has been stepping out on our own technology for years. We are agnostic. I didn't attend R Fest simply because we have clients that love Relativity. No, I attended primarily because of history. I personally made a big mistake in 2008, as did many. I try not to make the same mistake twice.



My History without Relativity – When you can’t beat them, join them

Rewind to August 25, 2008, the 31st annual ILTA conference at the Gaylord Texan near Dallas, Texas. I was at FCS at the time, and a crew of us were attending ILTA. FCS, as some might recall, was the 2nd Attenex partner when Attenex built out their channel. In fact, FCS was an award-winning Attenex partner. Yet, we went to ILTA that year looking to replace Attenex. Why would an award-winning Attenex provider be seeking a replacement? Well, in June of that year, FTI announced that it was acquiring Attenex! Despite receiving calls and assurances that "nothing will change" at Attenex, we knew better. FTI had acquired Ringtail in 2005 and, of course, said nothing would change then. Well, as we all know, a lot did change, and those of us at FCS knew there would be changes at Attenex that would have a dramatic negative impact on our business. History did in fact repeat itself, and most of the Attenex crew left, including Joe White, who joined FCS as president in June of the following year (2009).

So, in 2008, the FCS team was on the hunt for new technology. We encountered these folks from "kCura" pitching this product Relativity. By this time, we had heard of Relativity because they had been making some noise, but Relativity wasn't as mainstream as it is today. Relativity was unproven in 2008. We went to this little suite at the Gaylord and sat down with Andy Sieja and others. I can't remember all the players from that side, but it was a crowded suite. Andy made the pitch. BTW, Andy, if you're reading this, your sales skills have improved dramatically since that Gaylord presentation, but I digress. We listened, and I remember walking out of that suite thinking… hmmm… these guys might be on to something. They were talking all about openness, integration, cloud, and a lot of things that have since transpired. Yet, if we went with Relativity, we would be making a bet with folks that we didn't really know. They didn't seem to have a lot of capital, but we really didn't know what was going on with them as a company. Flash in the pan? We had seen so many others with great product ideas come and go.

At the time, we were talking to Lateral Data – makers of Viewpoint. We knew the folks at Lateral Data, some of whom were with me at RLS a few years before. Viewpoint was also new, but we had a client using it because one of the firm partners had a relationship with the Lateral Data owner. One of my close friends at the time was there. It felt comfortable. We were sold on the fact that with Lateral Data, we would get a lot of voice into the roadmap, and we did. We even did some dev work to help. Since we had a strong dev team at that time, we saw Viewpoint as an opportunity to help make an existing product better. FCS became Lateral Data's first channel partner. Life was good, but it didn't last. We made a BIG mistake choosing Viewpoint over Relativity, as history would teach us.

You see, we didn't expect what would happen next to Lateral Data. Lateral Data sold to Xerox in 2012. Of course, Xerox said nothing would change. But dramatic change ensued. Xerox yet again fumbled the future. Yes, some may see that last sentence and recognize a book: Fumbling the Future: How Xerox Invented, then Ignored, the First Personal Computer. That's right, I bet many of you did not know that not only did Xerox invent the first personal computer, they also invented, and mass-produced, the first laptop. The battery got so hot that they didn't call it a laptop, because it would burn your leg. It was called a "battery powered PC". I know because I worked on the resulting litigation when Xerox gave up and cancelled the manufacturing contract with Sunrise Systems. But, again I digress. Bottom line, we at FCS bet on the wrong horse when we bet on Viewpoint.

As previously mentioned, it has only been in the last two years that I've been somewhere that offers Relativity. Before that, I spent my time essentially selling against Relativity.


Place Multiple Bets

So, how does one learn from and avoid those prior mistakes? Well, you spread your risk. You bet on multiple horses and hedge your bets. You make sure one of your horses allows you to add your own secret sauce. iControl ESI, where I am now, offers several technology options, including our own technology, Recenseo, which we've had for well over a decade. We also offer IPRO's Eclipse as well as Relativity. Our advanced technology, Envize, integrates with Relativity, and hopefully one day soon, Eclipse. Neither Relativity nor IPRO can or will devote as much time as we do to the machine learning area, and why should they when companies like iControl are years ahead of them? Products like Relativity and Eclipse are simply platforms that do a lot of the things that the "mainstream" wants and needs. For those of us doing more advanced work, there is no real need to focus on the "mainstream" basic functions of a review tool, because we can integrate into the tools that customers want, including our own review tool. But we are going to date multiple partners, because our clients' tastes vary.
Relativity continued to push the open platform message throughout the conference.

One of our biggest fears about Relativity was that, because of Relativity Assisted Review, Relativity would become closed. Turns out, that is not their vision now, and never has been. Relativity will remain open to those of us who want to use our own secret sauce. APIs for Relativity are very mature and improving with each new release (I saw the roadmap).

Oh, there was another concern resolved at the conference as well. There have been "rumors" that Relativity is looking to sell to Microsoft. Andy shot that notion down. Not happening anytime soon, he says.


Analytics

Conference speakers did a great job of finally explaining the difference between the broader "Analytics" and the more specific "Predictive Analytics". When most people think about analytics, they think of basically three things, all of which leverage a form of Machine Learning:
  • Email threading and visualization
  • Near-dupe identification and visualization
  • Concept clustering and visualization
The machine learning technology that works best for those three forms of "analytics" does not necessarily work best for "Predictive Analytics", which is closer to actual "Artificial Intelligence" than the technology behind the three features above. Those older methods simply require too many computing resources to support true machine learning, where humans teach the machine and the machine learns, in real time, as the review progresses. Grossman and Cormack call this "Continuous Active Learning" (CAL).
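For readers who haven't seen CAL spelled out, here is a minimal sketch of the loop, assuming scikit-learn's LinearSVC as the learner and a stand-in review() function for the human reviewer; the batch size and stopping rule are illustrative assumptions, not anyone's production workflow.

```python
# A minimal Continuous Active Learning (CAL) sketch.
# Assumptions: `docs` is a list of document texts; `review(doc)` stands in
# for a human reviewer and returns 1 (relevant) or 0 (not relevant).
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

def cal_review(docs, review, seed_ids, batch_size=50, max_rounds=20):
    X = TfidfVectorizer(stop_words="english").fit_transform(docs)
    labeled = {i: review(docs[i]) for i in seed_ids}  # human-coded seed set
    for _ in range(max_rounds):
        ids = list(labeled)
        y = [labeled[i] for i in ids]
        if len(set(y)) < 2:  # need both classes before training
            break
        model = LinearSVC().fit(X[ids], y)
        scores = model.decision_function(X)  # rescore the whole population
        # Rank unreviewed documents and send the top batch to the reviewer;
        # the machine learns from each new human decision, round after round.
        ranked = [i for i in np.argsort(-scores) if i not in labeled]
        batch = ranked[:batch_size]
        if not batch:
            break
        for i in batch:
            labeled[i] = review(docs[i])
    return labeled
```

The point of the loop is the "continuous" part: every human decision feeds the next training round, so the machine keeps steering reviewers toward the likely-relevant documents as the review progresses.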


Rise of the Machine

My focus at R Fest this year was advanced analytics. It's the place I live and breathe every day. There is a great deal of new interest in advanced analytics. Those of us in the software world need to know what direction the market is headed, to confirm we are headed in the right direction as both a software and services company. I do not want to bet on the wrong horse again. Machine Learning was indeed a major topic at R Fest, with panels of judges, scientists, legal professionals, and integrators in attendance, talking about advanced technology. If I had to pick two R Fest themes, I'd say they were "Integration" and "Security".

I attended several sessions that discussed the "machine learning" roadmap of several products, not just Relativity. A major topic, and floor discussion among those of us focused on the rise of the machine, was around whose secret sauce is headed in the right direction. You see, there has been for years a raging debate among the various scientists over which "algorithm" or "method" works best. The three dominant camps, contrasted in the short sketch after this list, are:

  • The Latent Semantic Indexing (LSI) camp, focused on "visualization" of concepts, and not as much on the "Predictive Analytics" capabilities of machine learning.
  • The "Logistic Regression" camp, which believes its methods (and custom algorithms) are more "accurate".
  • The Support Vector Machine (SVM) camp – the dominant method (though not the only one) being adopted by most of the integrators at R Fest.
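To make the three camps concrete, here is a minimal sketch using scikit-learn; TruncatedSVD, LogisticRegression, and LinearSVC are common illustrative stand-ins for each camp, not any vendor's actual implementation.

```python
# Three camps, side by side, on the same TF-IDF features.
# Every model choice below is an illustrative stand-in, not vendor code.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD        # LSI camp
from sklearn.linear_model import LogisticRegression   # Logistic Regression camp
from sklearn.svm import LinearSVC                     # SVM camp

docs = ["privileged attorney memo", "lunch order form",
        "draft merger contract", "fantasy football picks"]
labels = [1, 0, 1, 0]  # toy relevance calls

X = TfidfVectorizer().fit_transform(docs)

# LSI projects documents into a latent "concept" space -- well suited to
# clustering and visualization rather than predictive ranking.
concepts = TruncatedSVD(n_components=2).fit_transform(X)

# Logistic Regression and SVM both learn a relevance boundary; in practice
# the camps differ on training cost, score calibration, and workflow fit.
lr_scores = LogisticRegression().fit(X, labels).predict_proba(X)[:, 1]
svm_scores = LinearSVC().fit(X, labels).decision_function(X)
```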

SVM vs. The World


Relativity announced at R Fest 2017 that they, too, have chosen the SVM camp, and did a very good job of explaining why.
The "accuracy" difference between the various methods in our world is, well, not material. That said, the Yang et al. 2017 paper – Effectiveness Results for Popular e-Discovery Algorithms – does seem to suggest that SVM performs better than other methods in many areas; see Tables 1, 2, and 3 in that study.
Again, SVM is the dominant base method that most at R Fest are using. Other methods, such as Logistic Regression, were in the minority at R Fest. The SVM selection is validated by our study – Batch-Mode Active Learning for Technology-Assisted Review* – which was accepted by IEEE in 2015. Despite all this external validation of our chosen method, I wanted to make sure we sought some validation from other integrators and not just the academics. I personally am interested in workflow and the "does it work" test, preferring to let the real scientists duke it out around the back-end science. That's not to say that I don't engage in debates with those PhD types. Those that know me know that I enjoy nothing more than a good debate about machine learning. I frustrate the scientists, however, because I say things like: too many clicks. Too much effort to build classifiers. We must review too many documents with your method. Lawyers must learn math and wait too long for the levers to get pulled. "Looks great," but too much flash, bells and whistles we don't need, and not enough of the things that work practically... and other such "logical" arguments. This kind of discussion often makes the technical types uncomfortable, because they usually have never actually worked the business end of litigation. What works in the lab doesn't always work in reality, I say.

If the academics want to spend their days splitting hairs around precision and recall, I say knock yourselves out, but the "math" is never going to work. Nobody wants to review 20% of anything just so the sampling math works on low-prevalence populations, which is our world, not the laboratory's. My job is to take what the scientists create in the lab and see if it works in the field, so to speak.
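To make the low-prevalence complaint concrete, here is a back-of-the-envelope sketch of why validation sampling gets expensive when relevant documents are rare; the prevalence figures and the 385-document target (roughly a 95% confidence interval of ±5%) are illustrative assumptions, not anyone's protocol.

```python
# Why sampling math hurts at low prevalence (illustrative numbers only).
# To estimate recall, the random sample must contain enough *relevant* docs.
import math

def sample_size_for_recall(prevalence, relevant_needed=385):
    """~385 relevant documents gives roughly a 95% confidence interval of
    +/-5% on recall; at low prevalence the sample must be huge to find them."""
    return math.ceil(relevant_needed / prevalence)

for p in (0.50, 0.10, 0.01):
    print(f"prevalence {p:.0%}: sample ~{sample_size_for_recall(p):,} docs")
# prevalence 50%: sample ~770 docs
# prevalence 10%: sample ~3,850 docs
# prevalence 1%: sample ~38,500 docs
```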

Accuracy is not the primary reason an SVM-based algorithm is the better choice. Flexibility and performance drove our choice. Compared with some other methods, an SVM-based method requires fewer computing resources. Classification and rescoring can be done much faster, making a continuous review workflow possible. Also, with SVM, you don't have to spend a great deal of time manually creating "classifiers" and filtering out noise. In the machine learning environment, SVM-based methods are very good at filtering out the "false positives". "Stabilization" – the point where the machine learning has stabilized and the machine isn't getting any smarter – is achieved much faster than with some other methods. In other words, you review fewer training documents. With SVM, you arrive at the same train station as other methods; you just get there earlier and with less effort.
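As a rough illustration of what "stabilization" can mean in practice, here is one simple way to detect it: stop training once successive rounds barely change the ranking. The overlap metric and threshold are assumptions for the sketch, not an industry standard.

```python
# One illustrative stabilization check: compare rankings between training
# rounds and call it stable once they stop moving (threshold is assumed).
import numpy as np

def is_stabilized(prev_scores, new_scores, top_k=1000, threshold=0.95):
    """Treat the model as stabilized when the top-k ranked documents overlap
    almost completely between two consecutive training rounds."""
    prev_top = set(np.argsort(-np.asarray(prev_scores))[:top_k])
    new_top = set(np.argsort(-np.asarray(new_scores))[:top_k])
    return len(prev_top & new_top) / top_k >= threshold
```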


Conclusion

As Andy Sieja likes to say, we must all figure out how we can work together. Machine Learning in the legal space has arrived. The benefits can no longer be ignored by the mainstream. I know, I know, I and others have been saying that for years. But, to be fair, the industry has spent too many years talking about the underlying science, which has been around for decades, and not enough time talking about how to use the machine in a familiar workflow. Federal and state court judges have for years said "you don't need our blessing" to use advanced analytics. Most lawyers just want to review relevant documents without a bunch of waste. Most don't want to spend time with charts and graphs, math and science, bells and whistles. Those that do, well, there's plenty of that in the eDiscovery software world.

So, if you're not using advanced analytics and are waiting on something else to happen, I'd sure like to know what exactly is keeping you from leveraging the machine. I'd love to hear your objections and be given an opportunity to address all concerns. Trust me, you're unlikely to come up with a new reason not already addressed over the past decade, so we can certainly help with pretty much any concern. If not me, contact somebody expert in this area. If you don't, you are going to wake up one day in a world you don't recognize.
