Data Exhaust & Data Waves

Is your team reacting or predicting?

Last week, I heard Paul Kedrosky, Senior Fellow at the Kauffman Foundation and Bloomberg contributor, present “Data Exhaust: What We Know About Everything By What No One Tells Us” at the PARC Forum. “Data Exhaust” is his term for the “unintended information we throw off in our daily activities”.

His primary example was an analysis of the road debris reported in real time by the California Highway Patrol (CHP). He found patterns that were temporal (Christmas trees in early December and late January) and geographic (mattresses near a discount mattress store that sits immediately adjacent to an on-ramp, so drivers have no chance to check that the mattress is secure before reaching freeway speed). More strikingly, he discovered that the number of ladders dropped on Southern California freeways tracked the real estate bubble. His thesis is that as the market heated up, so did the demand for “marginal providers of labor / ladder capital”. In other words, those hired to perform construction or remodeling work were less professional about transporting their ladders.

Dr. Kedrosky went on to note that we now have the capacity to measure almost anything, which increases the volume of data. In fact, we have so much data that it can tell us whatever we want to hear. We may also become so focused on the data that we forget what is important. In his blog he poses the question “what are the consequences of an instrumented planet?” As the volume of data increases, we are likely to see more extremes in the data, even if the data is purely random. What are the consequences?
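The point about extremes can be illustrated with a few lines of Python. This is only a sketch of the statistical idea, not anything Dr. Kedrosky presented: the function name, the choice of a standard normal distribution, and the sample sizes are all assumptions made for illustration. It draws purely random numbers and reports the largest value seen as the number of draws grows.

```python
import random


def largest_value_seen(sample_count, seed=0):
    """Return the largest of `sample_count` draws from a standard normal distribution.

    Hypothetical helper for illustration; the distribution and seed are assumptions.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs give the same draws
    return max(rng.gauss(0.0, 1.0) for _ in range(sample_count))


# Nothing about the underlying process changes between runs; only the
# number of measurements does. The most extreme value seen still grows.
for sample_count in (100, 10_000, 100_000):
    print(sample_count, round(largest_value_seen(sample_count), 2))
```

Because every run uses the same seed, each larger sample contains the smaller one, so the reported maximum can only stay the same or rise. The more we measure, the more "record" values we should expect to find by chance alone.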

There is already so much indexed data (Google, etc.) that it is sometimes hard to find the valuable items. Therefore, he says, the pendulum has started swinging back to lists and resources edited by “curators”. In a way, he is saying that simply finding the patterns or indexing the data is not enough. You need people to put things in context to make sense of them.

But beyond human curators, the magnitude of the problem calls for a different systems approach, suggests Bernard S. Meyerson, IBM Vice President, Innovation, in his Semicon West 2010 keynote “From Gigahertz Systems to Solutions: Our Industry in Transition”. From 2006 to 2011, the volume of stored data is growing 10x. He projects that the “Internet of Things” will soon grow 2000x over a similar period, with an accompanying explosion in data.

IBM’s suggestion is that these demands require us to change how we build systems. One of their solutions is “Stream Computing”: real-time continuous queries that look for patterns and immediately take action based upon them. We also need to build “Systems of Systems”, integrating multi-faceted systems into cohesive networks that extract intelligence from the data. As the data volume grows, systems will need to predict rather than react. An example provided by Dr. Meyerson is an IBM project for the Singapore traffic network. By the time the network reacted to “real time” data, conditions had already changed, so the team needed to build predictive models to achieve the required performance.
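The difference between reacting and predicting can be sketched in a few lines of Python. This is a toy illustration, not IBM's method and not Singapore data: the readings, the two-step delay, and the function names are all invented for the example, and the "prediction" is a deliberately naive linear extrapolation. It compares a controller that acts on the latest reading against one that extrapolates the recent trend to the moment the action actually takes effect.

```python
def reactive_estimate(history, delay):
    """React: use the latest reading, which will be `delay` steps stale when we act."""
    return history[-1]


def predictive_estimate(history, delay):
    """Predict: extend the most recent change `delay` steps ahead (naive linear forecast)."""
    slope = history[-1] - history[-2]
    return history[-1] + slope * delay


def mean_abs_error(estimator, readings, delay):
    """Average gap between the estimate and what actually happens `delay` steps later."""
    errors = []
    for i in range(1, len(readings) - delay):
        history = readings[: i + 1]
        actual = readings[i + delay]
        errors.append(abs(estimator(history, delay) - actual))
    return sum(errors) / len(errors)


# Hypothetical vehicles-per-minute counts during a building rush hour.
READINGS = [40, 43, 47, 52, 58, 65, 73, 82]
DELAY = 2  # assumed number of steps before any action takes effect

print("reactive error:  ", mean_abs_error(reactive_estimate, READINGS, DELAY))
print("predictive error:", mean_abs_error(predictive_estimate, READINGS, DELAY))
```

On a steadily building trend like this one, the reactive approach is always behind, while even a crude forecast closes most of the gap. On noisy or suddenly reversing data a naive extrapolation can do worse than reacting, which is why real predictive systems need far better models than this sketch.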

Both Dr. Kedrosky and Dr. Meyerson offer interesting insights into the challenge of an ever-increasing volume of data, whether intentional or unintentional. It is clear that to be successful you need to find the signal in the noise and process it faster and more efficiently than your competitors. But what isn’t discussed is that dynamic organizations which are extremely flexible, apply continuous learning, and have sufficient resources will be the ones to surf these waves of data and win. If you wait to reorganize until after the problem becomes a crisis, you will miss the window of opportunity and your competitors will pass you by.