Thursday, November 4, 2010

Database Discovery - What's all the fuss?

Today I read another great article in LTN - Database Discovery Is Dubious, but Unavoidable. As I read the article and quotes from our friends over at FTI Consulting, the technologist in me could relate. "There are still types of evidence that lawyers prefer to ignore and hope will go away, the way e-mail discovery was ten years ago," says FTI's Rob Brunner.

The article goes on to say:


"Like the mass of ice below an iceberg's waterline, the amount of structured data is often the bulk of corporate data, but is rarely seen. According to the Data Warehousing Institute, a technology research firm, approximately 47 percent of corporate data is structured in nature, compared to 31 percent of unstructured data. (The remaining 22 percent was described as semi-structured data.)"
I too have been through many complex exercises with all manner of structured data. In some cases our team has written complex applications and processes to preserve, then search, review and produce relevant information where it is needed. Like e-mail before it, the discovery of structured data is poised to add yet another billion dollars to the eDiscovery pipeline. Or is it?

Slowly, the years of discovery and trial experience fight back against the technologist interested in all things complex with problems to solve. Wait a second! I revisit the title of the article - Database Discovery Is Dubious, but Unavoidable. Unavoidable? How often is structured data even relevant? If relevant, what value will those data have within the context of the actual facts underlying the dispute? Before we start building fleets of technologists and consultants dedicated to handling petabytes of structured data raining from the sky, we might want to consider what share of that 47 percent of corporate data is actually useful outside of making widgets.

In most cases, large and small, do you really need to preserve, collect, search and review every transaction in an enterprise accounting or procurement system, for example? Preservation can be handled in most cases by simply retaining the backups most enterprises already have in place. If there are none, create one. Is it the entire SharePoint system that needs to be preserved and searched? Or is it just the documents for specific custodians? Do the metadata records associated with each document even contain relevant information? Maybe you simply need to review the reports that most “structured” enterprise systems generate easily, targeted at the specific information of interest. Before embarking upon a big, expensive effort to pull information from a system, whether a simple MS Access database or a complex Oracle-based relational data warehouse, doesn't it first make sense to ask how the witnesses use the information?
After all, we call data "structured" for a reason. Considering the amount of data collected each day for litigation, there is rarely a cost-justified reason to rebuild ways to query and report information outside its intended use, that is, outside the way the information would have been used during whatever conduct gave rise to the litigation or investigation. Isn't the better approach to target those "documents" lawyers will actually use, rather than large sets of mostly useless bits of data that will never see the light of day in a deposition, much less a courtroom?

The recent trend toward having the parties sit down with their respective technology experts is encouraging. Talk to the opposition and decide jointly what approach to take. If you can’t decide, the court will decide for you, and that decision will weigh cost against benefit. Far too much time, money and intellectual capital is spent today on the unimportant.

That is not to say there are never times when complex structured workflows are necessary. We have been there - done that - but in the big scheme of things those instances are rare. Brunner says "Unfortunately, e-discovery vendors have been slow to respond to this issue" and he is right. The reason, however, is not that we have not figured it out; some of us have. The reason there are not more widespread and standardized approaches is a lack of demand. That lack of demand, I submit, is not necessarily because the data is being ignored. It just does not come up during discovery. When it does, the relevance is usually marginal and the cost of obtaining the information is not justified when weighed against the benefit. When the need does arise, hopefully a “trusted advisor” will provide the right advice and ask the right questions before you act.

Database discovery today is largely avoidable. So maybe the more appropriate name for the article would be: Database Discovery Is Dubious, and Sometimes Unavoidable. There may never be a structured data equivalent to the manual "The Simple Solution to the Rubik's Cube", but we can look first at simple solutions to what does not have to be a complex problem.
