I have often been accused of putting everything through the lens of data and analytics. Many of my long-time followers know of my fondness for finding quotes or pithy sayings from movies, books, and everyday life that reinforce the message of leveraging data to drive a better outcome.
One such idea is from the book Robopocalypse by Daniel H. Wilson. The gist of the story is that a computer achieves intelligence and quickly determines humans are the problem with the world and, as you may have guessed, fun and hilarity do not ensue. The part I love is the idea that things do not matter; it is the relationship between things that is fascinating. This is a tailor-made line for my career, as I have worked with customers for decades on the importance of the relationships between data to drive outcomes.
Schrödinger's Data?
I have often heard how data is “bad”, data is “late”, or we have “dirty data”. What I constantly have to ask is “how do you know that?” Much like the cat in Schrödinger's famous thought experiment, data has no state until it is observed with other context. The cat in the box is neither dead nor alive until you observe it. This is the big point that must be understood in our world – data by itself is neither good nor bad, timely nor late, clean nor dirty.
Data just “is” until an external point of view is applied to give it context. Even then, data has different characteristics depending on whose point of view is being considered. For example, Mario may think getting hourly data is good enough for his process, but Daphne needs to have the data every minute, which would make the hourly data “too late” for her.
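The Mario and Daphne example can be sketched in a few lines of Python. This is only an illustration: the `is_timely` helper, the timestamps, and the tolerance values are assumptions made up for the sketch. The point it shows is that the same load time is judged "timely" or "late" only once a consumer's requirement is applied.

```python
from datetime import datetime, timedelta


def is_timely(last_updated: datetime, now: datetime, max_age: timedelta) -> bool:
    """Return True if the data is fresh enough for a consumer who tolerates max_age."""
    return now - last_updated <= max_age


# Hypothetical clock values: one data set, loaded 30 minutes before "now".
now = datetime(2024, 1, 1, 12, 0)
last_updated = datetime(2024, 1, 1, 11, 30)

# Each consumer brings a different point of view (assumed tolerances).
requirements = {
    "Mario": timedelta(hours=1),     # hourly data is good enough
    "Daphne": timedelta(minutes=1),  # needs data every minute
}

verdicts = {
    name: is_timely(last_updated, now, max_age)
    for name, max_age in requirements.items()
}
print(verdicts)  # {'Mario': True, 'Daphne': False}
```

Nothing about the data changed between the two verdicts; only the observer did.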
This is not to imply data is not important and should be ignored. It underscores that just collecting data for the sake of collecting data is not a winning strategy.
Giving Context to Content
What brings value to the data is the relationship it has with other data elements or to the situation at hand. People shop at stores that have items supplied by vendors. Patients go to hospitals to be treated by doctors for diseases. Companies ship customer packages on planes or trucks. These very simple examples of how things relate to each other open up a vast array of questions that can be asked so companies can get answers on how their business is running, and how it can be improved.
Using the first retailing example, an analyst can start asking:
- “Which customers spend the most at our stores on a per visit basis?”
- “Which vendors have the highest rate of return for their products?”
- “Which items attract the high-end clientele and will they spend more if the product is in stock?”
As one can see, there is no end to the types and number of questions that can be asked, yet one thing remains consistent: the data itself. The trick is making sure that the relationships are preserved in the data models in a way that allows these questions to be asked.
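As a rough sketch of what "preserving the relationships" can look like, the snippet below uses Python's built-in `sqlite3` module with a tiny, invented retail model. The table names, column names, and sample rows are all assumptions for illustration. Because each visit keeps its link to a customer and a store, the same rows can answer more than one question.

```python
import sqlite3

# In-memory database with a hypothetical, minimal retail model.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE visits (
    visit_id    INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(customer_id),
    store       TEXT,
    amount      REAL
);
INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben');
INSERT INTO visits VALUES
    (1, 1, 'Downtown', 40.0),
    (2, 1, 'Downtown', 60.0),
    (3, 2, 'Airport',  90.0);
""")

# Question 1: which customers spend the most per visit?
spend_per_visit = conn.execute("""
    SELECT c.name, SUM(v.amount) / COUNT(*) AS avg_spend
    FROM customers c
    JOIN visits v ON v.customer_id = c.customer_id
    GROUP BY c.customer_id, c.name
    ORDER BY avg_spend DESC
""").fetchall()

# Question 2: same data, different question: revenue by store.
revenue_by_store = conn.execute("""
    SELECT store, SUM(amount) AS total
    FROM visits
    GROUP BY store
    ORDER BY total DESC
""").fetchall()

print(spend_per_visit)   # [('Ben', 90.0), ('Ana', 50.0)]
print(revenue_by_store)  # [('Downtown', 100.0), ('Airport', 90.0)]
```

If the `customer_id` link were dropped from `visits`, the first question could no longer be asked at all, even though every purchase amount would still be stored.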
The best part is that the more relationships you can find and associate into your analytics, the more value you get from your data. As Michael Porter states in Harvard Business Review, “This new product data is valuable by itself, yet its value increases exponentially when it is integrated with other data, such as service histories, inventory locations, commodity prices, and traffic patterns.”
Unfortunately, too often people take a single view of data and build assumed questions, and therefore assumed answers, into their solutions. Another common mistake is to think that just because the data is in a “data lake” it does not need to be modeled or managed, and so the data relationships are not preserved. These are just two ways companies ignore data relationships and end up with complex and costly silos. These silos only end up creating confusion and stagnating innovation.
Vantage - The Total Picture
A cautionary point here: do not confuse data relationship with data placement. For example, an insurance company may store car and driver behavior data in Hadoop or S3 for low cost, but it still maintains the Vehicle Identification Number (VIN) with that data. The VIN provides the relationship that identifies which insured client owns the car, while the insured's information resides in the relational database. Without the VIN, properly kept in both locations, you cannot connect the car data to the insured during your analytic processes.
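A minimal Python sketch of the VIN example follows. The two in-memory collections merely stand in for the low-cost store and the relational database, and the VIN values, field names, and policyholder labels are invented for illustration. What matters is that the shared key, not the storage location, makes the connection possible.

```python
# Hypothetical telemetry rows, standing in for data kept in a low-cost store.
telemetry = [
    {"vin": "VIN-0001", "hard_brakes": 3},
    {"vin": "VIN-0002", "hard_brakes": 0},
    {"vin": "VIN-0001", "hard_brakes": 2},
    {"vin": None, "hard_brakes": 7},  # VIN lost: cannot be related to anyone
]

# Hypothetical policy rows, standing in for the relational database.
policies = {
    "VIN-0001": "Policyholder A",
    "VIN-0002": "Policyholder B",
}

hard_brakes_by_client = {}
unmatched_rows = 0
for row in telemetry:
    client = policies.get(row["vin"])
    if client is None:
        unmatched_rows += 1  # no shared key, so no relationship to follow
        continue
    hard_brakes_by_client[client] = (
        hard_brakes_by_client.get(client, 0) + row["hard_brakes"]
    )

print(hard_brakes_by_client)  # {'Policyholder A': 5, 'Policyholder B': 0}
print(unmatched_rows)         # 1
```

The row without a VIN is still stored, yet it contributes nothing to the analysis because it can no longer be tied to an insured client.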
The power of Teradata Vantage is the ability to store and manage data as the value dictates and still access and integrate the data as the business needs demand. This powerful combination allows companies to quickly adapt and adopt their analytics to drive meaningful answers that can truly improve their own relationships with their customers!
Starting with Teradata in 1987, Rob Armstrong has contributed to virtually every aspect of the data warehouse and analytical processing arenas. Rob’s work in the computer industry has been dedicated to data-driven business improvement and more effective business decisions and execution. His roles have encompassed the design, justification, implementation and evolution of enterprise data warehouses.
In his current role, Rob continues the Teradata tradition of integrating data and enabling end-user access for true self-driven analysis and data-driven actions. Increasingly, he incorporates the world of non-traditional “big data” into the analytical process. He also has expanded the technology environment beyond the on-premises data center to include the world of public and private clouds to create a total analytic ecosystem.
Rob earned a B.A. degree in Management Science with an emphasis in mathematics and relational theory at the University of California, San Diego. He resides and works from San Diego.