“It’s Not the Data, It’s the Models, Stupid” - Big Data Helps But It Is Not the Answer
Posted May 21 2013 5:19pm
This is a great article from RWIO Science, I felt, and it explains what a “Black Swan” is: an event that nobody predicted and that surprises everyone. We have a lot of them floating around today; just yesterday, the tornadoes in Oklahoma were an example. Basically, the article is about modeling and how data is used in a model, not just the data itself. In addition, you hear about the bigger exposure to errors and noise as you look at more and more data, and that is entirely true. Everyone wants to apply big data to Black Swan events to see if there’s a way to predict the next one. Sometimes there might be data indicators that would help, and other times you have nothing. Those not in data are more easily misled here, as they don’t know data mechanics and how all of this works. We see it all the time. We do develop too much confidence in our data in more ways than one; see the example below in my opinion post.
It does all come back to the interpretation of data, and context is everything. Scroll down to the footer of this blog and watch the first video for more on that topic, and see what Charles Seife has to say. He wrote the book “Proofiness: The Dark Arts of Mathematical Deception”. After we chatted on Twitter, Tom Peters went out and bought it; it’s a good book. Basically, when data is used out of context, the fun begins with high levels of noise and errors. We have it in healthcare, as insurers and other healthcare entities use models. Watch the video at the link below to hear some big companies and entities, including NASA, question value and models.
One of the best on the web at speaking about models is Cathy O’Neil, a quant and a modeler who tells it like it is. Don’t lie with models. Further down you can also find some comments from Microsoft on data interpretation and biases.
We are all still very human, and data and math don’t always play out the same on computers and screens as they do in the real world. “With software you can do anything”... keep that thought as you watch the video below. BD
In the Gold Rush to accumulate and put to use Big Data, we may actually be making it harder to glean insights from that data. Yet the prospect of data solving all of our problems is so tempting that even the brightest minds of our generation seem confused.
Take, for example, Irving Wladawsky-Berger, a strategic advisor to Citigroup and former IBM executive. Wladawsky-Berger is exceptionally bright, someone whose insights into open source helped me a great deal while he was still at IBM. But writing in The Wall Street Journal ("Spotting Black Swans with Data Science"), he "[gets] the Black Swan idea backwards," as Nassim Taleb, professor at New York University’s Polytechnic Institute and author of The Black Swan, points out. Completely. Backwards.
Black Swan events are major events that take us by surprise, but afterwards yield clear explanations as to why they happened. Examples include the 9/11 attacks, the rise of the Internet and World War I.
But they can also apply to business, and so there is a temptation to apply Big Data to spot such Black Swans before they happen.
The more frequently you look at data, the more noise you are disproportionally likely to get (rather than the valuable part called the signal); hence the higher the noise to signal ratio.
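For readers who like to see the numbers, here is a minimal sketch of why looking more often brings more noise. It is not taken from the article. It assumes a simple textbook model in which the expected change (the signal) grows in proportion to the time between looks, while the random wobble (the noise) grows only with the square root of that time. The 15% yearly trend and 10% yearly volatility are made-up figures, and the function names are hypothetical.

```python
import math

def noise_to_signal(mu, sigma, dt):
    """Noise (standard deviation) divided by signal (expected change)
    for one observation covering dt years, under the assumed model."""
    signal = mu * dt                 # the trend grows linearly with time
    noise = sigma * math.sqrt(dt)    # random wobble grows with sqrt(time)
    return noise / signal

def prob_observed_gain(mu, sigma, dt):
    """Chance that a single look shows a gain, assuming normally
    distributed changes (an assumption, not a fact about real data)."""
    z = (mu * dt) / (sigma * math.sqrt(dt))
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical figures chosen only for illustration.
MU, SIGMA = 0.15, 0.10
INTERVALS = {"year": 1.0, "quarter": 0.25, "month": 1 / 12, "day": 1 / 365}

if __name__ == "__main__":
    for name, dt in INTERVALS.items():
        ratio = noise_to_signal(MU, SIGMA, dt)
        p_gain = prob_observed_gain(MU, SIGMA, dt)
        print(f"{name:8s} noise/signal = {ratio:6.2f}   P(gain) = {p_gain:.2f}")
```

Under these assumptions the noise-to-signal ratio climbs as the interval shrinks, and a single daily look is only slightly better than a coin flip at revealing the underlying trend, even though the trend itself never changed.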
Whether in our models, our collection of certain kinds of data, or our interpretation of that data, we bring personal biases to the analysis, as Microsoft Research's Kate Crawford argues in Harvard Business Review. We cannot avoid this bias, and the attempt to look for correlation rather than causation in our data solves nothing.
In fact, it arguably makes the problem worse, because it gives us too much confidence in our data.
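One way to see how correlation hunting can breed false confidence is a small simulation, offered here as a rough sketch and not as anything from the article. It generates a random "target" series and then checks a growing pile of equally random "candidate" series against it, keeping the strongest correlation found. Every series is pure noise by construction, so any correlation is spurious. The function names, the 30 data points, and the seed are all arbitrary choices for illustration.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def strongest_spurious_correlation(n_candidates, n_points=30, seed=0):
    """Largest |correlation| between a random target series and
    n_candidates unrelated random series (all pure noise)."""
    rng = random.Random(seed)  # fixed seed so reruns give the same series
    target = [rng.gauss(0, 1) for _ in range(n_points)]
    best = 0.0
    for _ in range(n_candidates):
        candidate = [rng.gauss(0, 1) for _ in range(n_points)]
        best = max(best, abs(pearson(target, candidate)))
    return best

if __name__ == "__main__":
    for k in (5, 50, 500, 5000):
        print(f"{k:5d} random series -> strongest correlation "
              f"{strongest_spurious_correlation(k):.2f}")
```

Because the same seed reproduces the same series, a larger pile always contains the smaller one, so the strongest correlation can only stay the same or grow as more data is thrown in. Nothing in the pile has any real relationship to the target, yet the "best match" looks more and more convincing.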
We're still early in Big Data, and enterprises rightly suspect that Big Data isn't some magic pixie dust that immediately yields insights into how much to charge, where to market, etc. Big Data can help, but it's not The Answer.