Why the Pundits Failed to Predict Trump
Contributing bloggers: Susan Kavanagh and Mark Walker
Democrats and Republicans alike are looking back at this
election and asking, "How did we miss this?"
Democrats took great pride in their "analytics," and much of how Hillary
positioned herself to the public was based upon polling and analysis of
polling. Obviously, the polling was not accurate, so the entire Democratic
strategy built on those analytics was, well, way off base. In the Trump camp,
Donald J. simply ignored the polling and went with his gut. Turns out Trump's
gut was right and the Democratic pundits were wrong, really wrong! Here's what happened.
In order to understand, at a very high level, how this technology works, some
background is necessary. In our world of litigation, we use analytical
algorithms that were originally developed for political polling. In fact, the
FBI used the very analytics we use here every day to analyze email in
connection with the Clinton email investigation. That practice is called
"eDiscovery."
Those of us in the eDiscovery industry have struggled for years with telling
family or friends "not in the business" what it is that we do for a living.
Our elevator pitch for our customers, mostly attorneys, goes something like
this: "We help pull 'Electronically Stored Information' (ESI) from your
client's servers, process it, analyze it, and then use analytics so you can
decide what to produce, or, if you are the receiving party, decipher what is
important." Most lawyers already know what eDiscovery is, albeit at a very
high level, and most don't really want to know how the sausage is made. They
are simply interested in the result.
Explaining to family and friends what eDiscovery is, is an entirely different
matter. Sometimes I just say I do data forensics with stuff like email. "You
do what with email?" they ask. Then the explanation goes something like this:
"We collect electronic data from corporations, process that information, and
then, using 'high-tech' analytics, we help the lawyers find what's important
in what is usually a great sea of information." Most folks don't really have
any point of reference because they almost never deal with lawyers and
certainly don't have anyone looking at their email (or at least they think
they don't).
Enter the Hillary Clinton email scandal. Now we have a point of reference that
pretty much everyone has heard of, and it directly involves eDiscovery
identical to what I do day in and day out. At some point, someone, likely a
service provider like the company I work for, collected Hillary's email from
her private email server. That service provider then processed the email. When
ESI is processed, information about that data, its metadata, is extracted from
the files along with the text, and a great deal of information is extracted
beyond just the text of the email.
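To make that concrete, here is a minimal Python sketch of the kind of metadata extraction a processing step performs. It assumes the collected email sits on disk as standard .eml files in a folder named collected_email; the folder name and field list are assumptions for illustration, not a description of any particular vendor's pipeline.

```python
import email
from email import policy
from pathlib import Path

def extract_metadata(eml_path):
    """Parse one .eml file and pull the fields a processing step typically captures."""
    msg = email.message_from_bytes(Path(eml_path).read_bytes(), policy=policy.default)
    body = msg.get_body(preferencelist=("plain",))
    return {
        "from": msg.get("From"),
        "to": msg.get("To"),
        "cc": msg.get("Cc"),
        "date": msg.get("Date"),
        "subject": msg.get("Subject"),
        "message_id": msg.get("Message-ID"),
        "attachments": [part.get_filename() for part in msg.iter_attachments()],
        "body_text": body.get_content() if body else "",
    }

# Build a simple metadata "load file" for a folder of collected email.
records = [extract_metadata(p) for p in Path("collected_email").glob("*.eml")]
```

Every downstream step, searching, deduplication, analytics, works off records like these rather than off the raw mailbox.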
So, how is this relevant to the email scandal and the new information from the
FBI? Reports vary, but Hillary appears to have produced approximately 30,000
emails to the FBI. The FBI apparently reviewed those and decided not to
recommend prosecution to the DOJ. Then, on Friday, October 28, the FBI
announced that it had found more email (roughly 650,000 messages) that "might
be relevant" to the Clinton email investigation and that it needed time to
sort it all out. This email was found on the laptop of a Hillary Clinton
adviser in connection with an unrelated matter. We won't give that unrelated
matter any new press here, as it is of no consequence. Of particular interest
is whether there are any new emails on that laptop that are relevant to the
Clinton investigation and, of those that are relevant, whether they are new or
just duplicates of what was already produced.
The FBI initially indicated that it might take months to review the data, but
those of us who live in the eDiscovery world knew that this is a small
universe of information and that, with the appropriate technology, it should
take only about 24 hours to analyze and review. As it turns out, it took the
FBI only a few days to determine that there simply wasn't anything there that
changed the recommendation not to prosecute. The FBI didn't review 650,000
emails one by one. They simply applied analytics.
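One obvious analytic step, given the question of duplicates raised above, is deduplication by hash. Here is a minimal Python sketch of that technique; the field choices are typical of eDiscovery protocols generally, not a claim about the FBI's actual workflow.

```python
import hashlib

def dedupe_key(meta, body_text):
    """Hash the fields that typically define a 'duplicate' email.
    Which fields to include is a matter of the review protocol."""
    normalized = "|".join([
        (meta.get("from") or "").strip().lower(),
        (meta.get("to") or "").strip().lower(),
        (meta.get("date") or "").strip(),
        (meta.get("subject") or "").strip().lower(),
        body_text.strip(),
    ])
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def new_documents(laptop_emails, produced_keys):
    """Keep only messages whose hash did not appear in the earlier production.
    laptop_emails: list of (metadata_dict, body_text) tuples.
    produced_keys: set of hashes computed from the emails already produced."""
    return [
        (meta, body) for meta, body in laptop_emails
        if dedupe_key(meta, body) not in produced_keys
    ]
```

Run against a prior production, a pass like this collapses hundreds of thousands of messages down to the handful that are actually new, which is why "months of review" can become days.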
With that oversimplified explanation of how analytics are used to analyze
information in litigation, what does it tell us about how the pundits got it
wrong? The error has to do with how those analytics were used, not with the
underlying algorithms. The algorithms are based upon proven mathematical
science that has been used for decades. If you provide the technology with the
right inputs, you get the right answer. Just as "garbage in equals garbage
out," bad input equals a bad result. When we first began using analytics, as
the FBI did during its email investigation, some of us who grew up in the
legal world (as opposed to the technical world) began asking questions about
how the math was being applied. Specifically, some of our questions were about
sample sizes. Are we getting the appropriate samples that will let the
technology "learn" and model whatever it is we are trying to find?
ESI Advantage wrote about this problem in May 2012: "Are your samples Random?
Are you just getting random results?"
The problem is very simple in both the legal world and the world of political
polling: it is very easy to get your inputs wrong. The problem has to do with
both the sample size and the actual nature of the sample. As Ralph Losey's
work and the many posts on ESIAdvantage explain more fully, the problem is
with how sampling is performed. In political polling, pollsters randomly
select potential voters to call and ask a very short list of questions that
require a "Yes," "No," or "Undecided" answer. Based upon the size of the
population, the math tells the pollster how many potential voters need to be
polled to meet a mathematical margin of error. Pollsters also collect
additional information about the profiles of those interviewed, such as race,
religion, how they have voted in the past, and so on. That information is
input into the technology, and a model is created that should predict who's
leading the polls and which demographics are likely to vote for one candidate
or the other. Those analytics help decide how to message issues directly to
the voters those samples represent.
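To illustrate the demographic modeling step, here is a toy Python sketch of post-stratification weighting, a standard polling technique for correcting a sample whose composition doesn't match the electorate. The groups, shares, and responses are invented for the example and are not actual polling data.

```python
from collections import Counter

# Toy post-stratification: reweight poll responses so each demographic group
# counts in proportion to its (assumed) share of the actual electorate.
# All figures below are invented for illustration.
poll_responses = [
    ("college",     "Clinton"), ("college",     "Clinton"), ("college", "Trump"),
    ("non_college", "Trump"),   ("non_college", "Clinton"),
]
population_share = {"college": 0.35, "non_college": 0.65}

n = len(poll_responses)
sample_counts = Counter(group for group, _ in poll_responses)

# Weight each respondent by (population share) / (sample share) of their group.
weights = {g: population_share[g] / (sample_counts[g] / n) for g in sample_counts}

# Weighted estimate of each candidate's support.
totals = Counter()
for group, answer in poll_responses:
    totals[answer] += weights[group]
support = {cand: w / sum(totals.values()) for cand, w in totals.items()}
print(support)  # the over-represented group no longer skews the estimate
```

The catch, as this election showed, is that the correction is only as good as the assumed population shares and the honesty of the answers.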
The sample size needed to meet a specified margin of error, say +/-3%, assumes
that a large percentage of those sampled, roughly 20% to 50%, will answer Yes
or No, with the remainder categorized as "Undecided." The composition of the
sample is critically important. The sample size in political polling is
usually a few thousand out of many millions, so the nature of those sampled
and the answers they give drive everything. All downstream analytics are based
upon those answers. Again, a bad sample equals a bad result.
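For reference, here is the textbook sample size calculation for estimating a proportion at a given margin of error; the +/-3% figure mirrors the example above, and the 95% confidence level is an assumption.

```python
import math

def sample_size(margin_of_error, z=1.96, expected_proportion=0.5):
    """Classic sample size for estimating a proportion.
    z=1.96 corresponds to 95% confidence; expected_proportion=0.5 is the
    most conservative choice (it yields the largest sample size)."""
    p = expected_proportion
    return math.ceil(z ** 2 * p * (1 - p) / margin_of_error ** 2)

print(sample_size(0.03))  # about 1,068 respondents for a +/-3% margin of error
```

Notice that the formula says nothing about who those 1,068 people are; it only controls statistical noise, which is exactly why the composition of the sample matters so much.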
In the litigation world, we are also usually sampling from many millions of
"documents." In the early days, some of us immediately saw the problem: the
sampling math doesn't work. Why? A document drawn in a purely random sample
might have only about a 1% chance of being relevant, while the sample size
math, and the algorithm it feeds, assumes a much higher "richness" of positive
answers. In litigation, richness is very low, so the sample size calculation
breaks down and a random sample contains very few relevant documents. The
technology therefore has a much harder time determining what is relevant in
the population as a whole, because the algorithm makes those predictions based
upon the textual content and concepts contained within the documents being
sampled. The legal experts have spent years debating this problem with the
technology experts who design the tools and tweak the algorithms to fit our
needs. Many of us argued early on that we need to "stack the deck" and raise
the number of relevant documents in our sample so that the math will work.
Many technologists countered that this does not fit how the technology is
designed. The approach we recommended is now referred to as "judgmental"
sampling: we select known relevant and irrelevant examples, and very often we
use search terms to raise the proportion of relevant documents in the sample.
This lets us sample the right number of documents to "train" the technology on
what a relevant document looks like, so that the analytics work. Today, the
debate over "judgmental" versus "statistically random" sampling is largely
over, with just a few technologists still holding out, mostly because they are
still peddling outdated technology and methods.
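A toy simulation makes the richness problem concrete: with 1% prevalence, a statistically sized random sample surfaces only a handful of relevant documents, while a keyword-seeded "judgmental" sample surfaces hundreds. The prevalence, hit rates, and corpus size below are invented for illustration.

```python
import random

random.seed(42)

# Toy corpus: 1,000,000 documents, about 1% of them relevant ("low richness").
# Relevant documents are assumed to hit the search terms far more often.
PREVALENCE = 0.01
corpus = []
for i in range(1_000_000):
    relevant = random.random() < PREVALENCE
    hits_terms = random.random() < (0.60 if relevant else 0.02)
    corpus.append({"id": i, "relevant": relevant, "hits_terms": hits_terms})

SAMPLE_SIZE = 1_068  # the +/-3% sample size from the polling example

# Purely random sample: expect only ~1% relevant, far too few to train on.
random_sample = random.sample(corpus, SAMPLE_SIZE)
print(sum(d["relevant"] for d in random_sample))      # around 10 relevant documents

# "Judgmental" sample seeded with search-term hits: much richer training data.
keyword_pool = [d for d in corpus if d["hits_terms"]]
judgmental_sample = random.sample(keyword_pool, SAMPLE_SIZE)
print(sum(d["relevant"] for d in judgmental_sample))  # a few hundred relevant documents
```

Same sample size, same corpus; the only difference is how the sample was drawn, which is the whole point of the methodology debate.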
Of course, you rely on the answers you get during sampling to be truthful and
accurate, and you assume they won't change until you take a new sample. In
practice, the person reviewing the sample, usually a lawyer at a very high
hourly billing rate, changes their mind about what is relevant as they learn
from reviewing. What is relevant can and does sometimes change dramatically.
With what's called an "active learning" approach, the technology adjusts the
kinds of documents it presents to the lawyers with each new sample reviewed,
picking what it most needs to learn from as the definition of relevance
shifts.
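Here is a minimal sketch of an active learning review loop using uncertainty sampling, one common way such tools choose the next batch. It assumes scikit-learn is available; the function name, batch size, and the ask_lawyer callback are hypothetical stand-ins for a real review platform.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def active_learning_review(documents, ask_lawyer, seed_labels, rounds=5, batch=50):
    """documents: list of document texts.
    ask_lawyer: callable(doc_index) -> 1 (relevant) or 0 (not relevant).
    seed_labels: dict {doc_index: label} from an initial judgmental sample
    (it must contain at least one relevant and one irrelevant example)."""
    vectors = TfidfVectorizer().fit_transform(documents)
    labeled = dict(seed_labels)
    model = LogisticRegression(max_iter=1000)

    for _ in range(rounds):
        idx = list(labeled)
        model.fit(vectors[idx], [labeled[i] for i in idx])

        # Score every unlabeled document and surface the ones the model is
        # least sure about; each answer refines the next round's model.
        unlabeled = [i for i in range(len(documents)) if i not in labeled]
        probs = model.predict_proba(vectors[unlabeled])[:, 1]
        uncertainty = np.abs(probs - 0.5)
        for pos in np.argsort(uncertainty)[:batch]:
            doc = unlabeled[pos]
            labeled[doc] = ask_lawyer(doc)

    return model, labeled
```

Each pass retrains on everything reviewed so far, which is how the tool keeps up when the lawyers' sense of relevance drifts mid-review.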
So, what went wrong with how the pundits projected the election? As it turns
out, there were many more "undecided" voters than the pollsters predicted, and
the wrong profiles were being sampled. Many of those polled either didn't
reveal that they were going to vote for Trump or simply changed their minds.
Like we did in the early days of using analytics in litigation, the pollsters
used the wrong methodology, not the wrong technology.
In litigation, most of us have resolved the problem. We were driven to do so
sooner rather than later: a miss costs our clients millions of dollars in
review costs, because too many documents deemed relevant end up in front of
lawyers who are billing by the hour. We have to use the latest technology, and
we do not have the luxury of being wrong in litigation. More importantly, we
have to use the right methodology.
In summary, the Democrats were using the wrong
consultants, and the wrong approach. In
the months and years to come, we hope to see vast improvements in how political
polling is conducted.