Tuesday, 28 May 2019

Data, Data everywhere

I'm using a reference to the Rime of the Ancient Mariner for the title of today's post, because there's rarely been a more interesting dichotomy facing most management teams.  As a colleague of mine is fond of pointing out, data is the new oil.  For those old enough to remember the Beverly Hillbillies, Jed struck oil, became rich and moved to the big city.  In many ways digital transformation, and the value of the data it creates, will enable many new digital hillbillies to strike it rich on gushers of data.

At some level the mere availability of data is valuable.  When little or no data is available, any data is precious.  When someone strikes a gusher of data (Facebook, Google, etc.), that data and the access to it can become profitable.  But what happens in large organizations when data becomes ubiquitous?  What happens when we have thousands of sensors and IoT devices submitting data, along with consumer data, reviews and social media feeds?  What happens when there's data everywhere, of every type, velocity and validity, plus all the other V words experts use (variability, veracity and so on)?  Increasingly we aren't sitting on gushers of data, we are swimming in rivers and lakes of data, soon to be oceans of data.  And here's where we circle back to the Ancient Mariner.  We may be afloat in a sea of data, but an awful lot of it isn't valuable or useful, because we can't be certain which of it is recent, valid and, most importantly, normalized.

ETL - Phone home
If you had real-time access to all the data in the world, and it was all verifiable, accurate and without inherent bias, you'd still face an enormous challenge.  Gaining insight from mechanisms like machine learning and AI will only happen when the machines can read and make sense of the data.  Right now we are generating gushers of sentiment data, quantitative and qualitative data and other kinds of data that aren't normalized and require human intervention.  In fact most honest brokers who deal in machine learning and AI will tell you that the "long pole" in gaining value from AI and ML is another acronym: ETL - Extract, Transform and Load.  There are a couple of important activities in that acronym.

Extract - find the data and get it out of the originating system.  This could be data from sensors, or from devices like your iPhone or Alexa.  Where the data is generated is often very different from where it is stored, thanks to the cloud.  We've got to find that data, consolidate everything we know about a certain individual, segment or product, and get all of that data into one place, as in the sketch below.
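To make the extract step a little more concrete, here's a minimal sketch in Python, under some loud assumptions: the file names, the customer_id key and the record shapes are all invented for illustration, and a real pipeline would pull from device APIs, cloud storage and operational databases rather than local files.

```python
import csv
import json
from collections import defaultdict

def extract_sensor_events(path):
    """Yield device/sensor events exported as JSON lines (hypothetical format)."""
    with open(path) as f:
        for line in f:
            yield json.loads(line)  # e.g. {"customer_id": "c42", "reading": 0.7, "ts": "..."}

def extract_crm_records(path):
    """Yield customer records exported from a CRM as CSV (hypothetical format)."""
    with open(path, newline="") as f:
        yield from csv.DictReader(f)  # e.g. {"customer_id": "c42", "segment": "retail"}

def consolidate(*sources):
    """Pull everything we know about each individual into one place, keyed by id."""
    profiles = defaultdict(list)
    for source in sources:
        for record in source:
            profiles[record["customer_id"]].append(record)
    return profiles

profiles = consolidate(
    extract_sensor_events("sensor_events.jsonl"),  # invented file names
    extract_crm_records("crm_export.csv"),
)
```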

Transform - Your data can take hundreds of forms - binary, quantitative, analog, digital, hexadecimal, images, voice, text and so on.  Machines can be taught to read and recognize any type of data, but they can't easily determine the validity and value of different types of data.  Thus, we must normalize the data to some degree - help machines understand why a picture is worth a thousand words - as sketched below.  And it's this work that will be the biggest barrier to full adoption and use of machine learning and artificial intelligence. In fact we may need machine learning and artificial intelligence just to find, clean, evaluate and normalize all the data we are generating.
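And a similarly hedged sketch of the transform step, normalizing a couple of invented raw record shapes (a sensor reading, a product review) into one common schema; a real pipeline would add validation, deduplication and far richer handling of images, voice and other data types.

```python
from datetime import datetime, timezone

def normalize(record):
    """Map a raw record (sensor reading, CRM row, product review) to one common shape.

    Field names (customer_id, reading, review_text, ts) are illustrative only.
    """
    out = {
        "customer_id": str(record.get("customer_id", "")).strip().lower(),
        "observed_at": None,
        "kind": "unknown",
        "value": None,
    }
    if "ts" in record:  # standardize timestamps to UTC (naive times treated as local)
        out["observed_at"] = datetime.fromisoformat(record["ts"]).astimezone(timezone.utc).isoformat()
    if "reading" in record:        # quantitative sensor data -> float
        out["kind"], out["value"] = "sensor", float(record["reading"])
    elif "review_text" in record:  # qualitative text data -> trimmed string
        out["kind"], out["value"] = "review", record["review_text"].strip()
    return out

raw_records = [
    {"customer_id": "C42", "reading": "0.7", "ts": "2019-05-28T09:00:00"},
    {"customer_id": " c42 ", "review_text": "  Great product!  "},
]
clean = [normalize(r) for r in raw_records]
```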

Thus, as strategists and innovators we are left in an interesting predicament - the more data we have, the more potential value we have, and the more the problem of actually finding and using the data that matters increases.  This is in fact an exponential problem, because the data is increasing at a rate faster than we can determine how to make sense of it.  At some point only machines will be able to interpret and understand all of the data we generate, so it behooves us either to standardize data formats, sources or streams (which isn't viable due to competitive differentiation) or to improve our ability to find, clean, standardize, rank and normalize the data we have.  Otherwise we'll sit on oceans of potentially valuable data, unable to extract the value, as new oceans of data are created.

Having more data than a competitor doesn't convey an advantage unless you can make sense of the data and use it more effectively.  In fact in many cases having more data may make it more difficult to make good decisions, and as the volume of data accelerates and the range of data types increases, it will become ever more difficult simply to keep pace with the data.  Like the Ancient Mariner you'll be awash in data, floating in data, but without the insight to drive your business.

Who is responsible for managing this data?

Here's another interesting challenge - thinking about who is responsible for managing this data.  The traditional IT team has been overwhelmed simply keeping the operational systems running.  Your email, core systems, financials and other operational systems require constant attention, and upgrading to the latest releases while protecting the data from hackers is a never-ending struggle.  Does your IT team have the bandwidth and skills to capture, manage and make sense of the data?  Should a data scientist report to your Chief Information Officer?  If not, then where should the people who are good at managing data and making sense of it reside?  What should they do?  Who directs their work?  I'm not sure many companies have a good answer to this important question.
