Why Artificial Intelligence, Machine Learning, Data Lakes, Big Data and Analytics initiatives are so FRUSTRATING
AI & Machine Learning is not the answer…yet. Maybe? Maybe never.
Data Lakes are not the answer…yet. Maybe? Maybe never.
Data Visualization tools are not the answer…yet. Maybe? Maybe never.
Shall I keep going? I don’t think it’s necessary…you see the pattern. Gartner says 85% of AI projects will “not deliver”…and that 87% of organizations have low BI and analytics maturity. VentureBeat is even less optimistic…they say 87% of data science projects never make it into production. Something is wrong. Something is missing. For all the money spent…for all the consultants…for all the organizational effort…for all the new technology…why are organizations seeing such frustrating results? There must be a missing ingredient.
After spending years helping clients formulate analytics strategies, identify new technologies, architect new platforms and deliver new analytics…I have finally realized there’s a key INGREDIENT that has been largely overlooked in recent years. An important ingredient. One that makes all data, reporting and analytics projects come together…like a perfect meal. And if I can expand the metaphor…something that makes them all simply taste better, maybe even taste MUCH better.
As it turns out, the missing ingredient is DATA HARMONIZATION. Really? That’s it? Sorry to disappoint, but yes. So what exactly is Data Harmonization?…
Data Harmonization is a blend of two important “spices”. The first spice is a standardized data model and the second is master data management. Both help “harmonize” data in different, but important ways. The standardized data model helps you “organize” the data in a way that provides reliable, repeatable analysis. This is why good old-fashioned data warehouses actually worked…they were just too slow to build. The second spice, master data management, helps you “classify and define” the data. This makes it easier to find the right data and ensures it is used properly and more consistently. Master data management takes time, and meetings, and a plan…which means most organizations don’t like to do it. Here’s why Data Harmonization is so important…
Artificial Intelligence and machine learning need better data. A standardized data model provides a consistent way of organizing important data that is often used for the most important AI and ML projects. It ensures a set of higher quality, more reliable data. Data scientists keep getting thrown under the bus for under-delivering on all the hype of AI/ML. The analyst firms are confirming that the AI/ML technology projects are also under-performing. Why?
If you chat with the folks in the trenches, most of them will tell you that getting quality data that can be used to build statistically reliable models is far more complex and time-consuming than most people understand. It is not uncommon for the data clean-up efforts for a new AI project to cost more than the AI project itself…if it ever happens at all. A better approach is to get your most important data into a harmonized data model, and then set the data scientists free. Sure, there are all sorts of interesting data mining projects you can run on loads of unstructured, mostly messy data. And those projects may yield a return every once in a while. But most companies are just not there yet. They need to start smart…with core data that is already known to be key to their business. And that core data is still best kept in a centralized, governed, standard data model. And the associated master data needs to be defined, applied and managed. In short, that core data needs to be harmonized to make these projects successful and cost effective.
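To make the two “spices” concrete, here is a minimal, hypothetical sketch of harmonized core data: a standard data model (the first spice) plus master data cross-references and shared vocabularies (the second spice) that let two systems’ versions of the same customer resolve to one conformed record. All system names, IDs and codes below are invented for illustration.

```python
# Standard data model (hypothetical): the agreed shape of a customer record.
STANDARD_FIELDS = ("customer_id", "customer_name", "region")

# Master data management artifacts (hypothetical): a cross-reference of each
# system's local ID to a single "golden" ID, and one agreed region vocabulary.
ID_XREF = {("crm", "C-001"): "CUST-1001", ("erp", "9001"): "CUST-1001"}
REGION_CODES = {"US-MW": "Midwest", "MIDWEST": "Midwest"}

def harmonize(system, record):
    """Map a raw source record into the standard data model."""
    return {
        "customer_id": ID_XREF[(system, record["id"])],
        "customer_name": record["name"].strip().title(),
        "region": REGION_CODES[record["region"].upper()],
    }

crm_row = {"id": "C-001", "name": "ACME corp", "region": "us-mw"}
erp_row = {"id": "9001", "name": "Acme Corp", "region": "Midwest"}

# Two messy, system-specific rows resolve to the same conformed record.
harmonize("crm", crm_row) == harmonize("erp", erp_row)
```

The point of the sketch: once the model and the master data exist, a data scientist pulls one reliable record per customer instead of reconciling two systems from scratch on every project.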
Data Lakes are basically the data dumping grounds of the early 2000’s. We are all generating so much data that we simply got lazy and just started dumping the data into a lake. Seriously, when has it ever been a good idea to dump messy stuff in a lake? It’s like we all forgot about the environmental improvements we’ve made with our natural resources over the last 40 years. Did the “data people” go take some “stupid pills” and think we can just dump data into a lake and it will all work itself out? C’mon man! We’ve forgotten the key INGREDIENT that made data warehouses of the early 2000’s work so well…harmonized data living in a centralized, standard data model with master data management in place. Data Lakes are a tremendously useful tool, if deployed and managed properly. Otherwise, they are just what some people have started calling them…data swamps.
But wait…there is a useful new tool in the battle to clean up the Data Swamps of the world. Here come the Data Catalogs to the rescue! Data Catalogs appear to be gaining in popularity…it feels like you can hardly swing a golf club and not hit a company looking at a data catalog solution. But are they really helping? I contend that many data catalog projects are under-achieving their intended value as well. Not because the technology is bad…quite the contrary…the technology is needed and effective, but only if the organization is committed to using them as part of an explicit Data Harmonization effort.
Back to the story…so now that you have a data swamp, all you have to do is go buy a data catalog to tell you about all the junk that lurks in your data swamp. Most data catalogs can even tell you what data junk is “most popular”. Unfortunately, the only thing worse than having messy data…is having popular, messy data! By using social media techniques, like “liking” some data sets, we can essentially “crowd-source” unreliable, messy data. Perfect, right?…not so much.
Now, on the bright side, many data catalog solutions will tell you, in advance, just how messy your data is by running data quality profiles and some other analytical tools against the data in the lake. But is the data that has high quality scores still the “right” data? I’m not sure. Is anyone sure? Instead, what if we used harmonized data as a filter by which we brought the data into the data lake? We could try to map core data from multiple sources to a standard data model…where possible. What if we set some data governance rules in the data catalog and applied those rules to the data as it is being brought into the lake? And then applied master data management to the catalog, instead of just letting the catalog tell us what it finds? Too many data catalogs are like the tail wagging the dog. If you think about it…maybe harmonized data IS the INGREDIENT missing from the data lake.
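As a hypothetical sketch of setting governance rules and applying them at ingest (every rule name and field here is invented), the idea is simply to check records against the standard model before they land in the lake, and quarantine failures with the reason recorded, rather than cataloging the chaos after the fact:

```python
# Governance rules (hypothetical): each rule is a name plus a check drawn
# from the standard data model and master data definitions.
RULES = [
    ("customer_id is present", lambda r: bool(r.get("customer_id"))),
    ("amount is numeric", lambda r: isinstance(r.get("amount"), (int, float))),
]

def ingest(records):
    """Apply the governance rules as data flows into the lake.

    Clean records are accepted; failing records go to a quarantine zone
    with their violations listed, instead of quietly joining the swamp.
    """
    accepted, quarantined = [], []
    for rec in records:
        violations = [name for name, check in RULES if not check(rec)]
        if violations:
            quarantined.append({"record": rec, "violations": violations})
        else:
            accepted.append(rec)
    return accepted, quarantined

good = {"customer_id": "CUST-1001", "amount": 250.0}
bad = {"customer_id": "", "amount": "250"}  # empty ID, amount is a string
accepted, quarantined = ingest([good, bad])
```

The catalog then documents data that already passed the rules, which is a very different proposition from documenting whatever happened to get dumped in.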
Bear in mind, that not everyone is getting this wrong. I attended a conference earlier this year and heard a case study of how the banking division of a large investment firm recently did a data catalog project. They shared with the group how they used a standard financial services industry model as the “legislation” for categorizing the data in their new data catalog. This worked beautifully! Data from different systems, brought into a data lake, properly cataloged, mapped and harmonized to a standard data model that is well understood by the business…and even understood by the regulatory authorities in the industry. Brilliant!
Surprisingly, not all data catalogs have the ability to accept legislative rules on how to harmonize the data in the data lake. Many of the data catalogs appear to simply catalog the chaos. Sure, organizing the metadata about the data in the lake has some usefulness. But to not give me the ability to harmonize the data to a standard model, or define my own master data? Shall I say “weak sauce”? Some data catalog solutions are much better at supporting Data Harmonization.
Bear in mind that not all data in the data lake needs to be added to the standard data model. Some data should be perfectly fine staying out of the standard data model and living in the data lake…as long as it is fairly well-defined and understood…not just junk data.
A bad data model makes a dangerous dashboard. I am a HUGE proponent of using data visualization solutions. I am convinced by first-hand client success that it is possible to implement a data visualization platform and successfully improve the way organizations make decisions. However, I have seen data visualization deployments spiral out of control. Some companies use too little governance and users create all kinds of pretty charts and graphs and then spend hours arguing over who is the best “data wrangler”. Then they argue over whether each chart is even correct. If we want this kind of chaos, let’s all just keep using Excel! We can let users “wrangle” their own data, and then squirrel that data away into their own little data silos. Sigh…
Here is a better idea…why not use DATA HARMONIZATION to store and manage the data used to supply the data visualization platform? Use the standard data model to power a series of reliable reports and dashboards that still serve a useful role in most organizations. Use master data management techniques to weed out poor quality data, and to standardize on key calculations and definitions. This helps users to not only use better data, but better use the data. I’ve observed over the years that some of the data visualization platforms are better suited to support Data Harmonization. Some platforms are able to easily ingest the data but provide weak governance and master data management once it hits the platform. Others provide better master data management capabilities, but may not be as easy to use for the end users. There always seem to be trade-offs to consider, but the core benefit of using harmonized data is that the key data is more consistently reliable. This results in TRUST from the users…and trust leads to BETTER USER ADOPTION…and better user adoption leads to BETTER DECISIONS.
Voice and Chat analytics are most frustrating. Can we just tell it like it is? Asking your Amazon Alexa or Google Home or chat-based analytics solution a question that is even slightly abstract results in wildly strange and mostly useless answers. Don’t get me wrong…there is an Echo Dot less than 2 feet away from my laptop right now. I still use it…for a weather forecast, or to tell me the news, or to listen to a podcast. But to think about asking my Echo Dot to tell me who my Top 10 customers are is simply not going to happen. A client of mine recently asked that question…to me…their consultant version of Alexa…except that I have a deeper, more sultry late-night FM DJ kind of voice. My response was…”Well, that depends. Do you want the top 10 customers from system A and system B or just system A? Do you want me to combine sales amounts for parent-child company relationships? Do you want it for the last 12 months, or for the fiscal year?” Do you see the issue? Such a simple question like “Who are my Top 10 Customers?” is not so easy to answer without well-structured, harmonized data. Harmonizing your data to a centralized data model, along with proper master data management, at the foundation of a voice or chat-prompted analytics solution will provide more accurate answers that might actually drive better adoption of the technology. Until then, stick with “Alexa, tell me a cheesy joke.” Remember…I’m from Wisconsin.
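Here is a sketch of what answering “Who are my Top 10 Customers?” actually takes once the definitions are harmonized: which systems feed the sales data, how parent-child companies roll up (master data), and what time window applies. Everything below is illustrative, including the rough 30-day-month approximation of the trailing window.

```python
from datetime import date, timedelta

def top_customers(sales, parent_of, as_of, n=10, months=12):
    """Top-n customers by revenue over a trailing window, rolled up to parents.

    sales: (customer_id, sale_date, amount) tuples, already harmonized from
    systems A and B into one standard model.
    parent_of: master-data map of child company -> parent company.
    """
    cutoff = as_of - timedelta(days=months * 30)  # ~trailing "months" window
    totals = {}
    for cust, day, amount in sales:
        if day < cutoff:
            continue  # outside the agreed window
        while cust in parent_of:  # roll child revenue up to its parent
            cust = parent_of[cust]
        totals[cust] = totals.get(cust, 0.0) + amount
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]

sales = [
    ("acme-sub", date(2019, 5, 1), 500.0),   # child of Acme: rolls up
    ("acme", date(2019, 6, 1), 300.0),
    ("globex", date(2019, 6, 1), 600.0),
    ("globex", date(2017, 1, 1), 9999.0),    # outside the window: ignored
]
parent_of = {"acme-sub": "acme"}
top_customers(sales, parent_of, as_of=date(2019, 7, 1), n=2)
# -> [("acme", 800.0), ("globex", 600.0)]
```

Every one of those decisions…systems included, roll-up rules, time window…lives in the harmonized model and its master data, which is exactly why Alexa can’t answer the question for you today.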
When and why did we lose the ingredient? I think most organizations who had it, lost the ingredient of Data Harmonization over the last 10 years. But why? I believe there are several reasons. First, data warehouses, as a vehicle to drive a standard data model, simply took too long to build and were (and still are) really expensive. Organizations lost patience and got cheap… and, at times, became lazy. I even admit I am partly to blame. I can spin up a data visualization solution in minutes or hours or days, and start pulling information and analyzing it. I have helped many clients do this. It is faster, and easier, and with the right platform, it does not need a proper data warehouse or a clean data lake. Heck, for some of us, it is even FUN. Gasp!
Creating a standardized, central data model and then implementing master data management constructs can be very laborious. We may need to have meetings to agree on business definitions. We may need to think about how the data will likely be used to ensure that the proper data is available. In some cases, you might even need to structure that data differently. It can be complicated and geeky. Are we talking about a logical data model, a physical data model, or an analytical data model…someone please staple my head to the carpet! Somehow, we’ve lost the motivation to do things right. We’ve become overwhelmed by the velocity, volume and variety of big data. We fell behind in our ability to process data to make decisions and we looked too often to shortcuts. Well guess what…we’re paying the price with poorly executed data-related initiatives…all around us.
Here is a metaphor to consider…Proper data harmonization is kind of like getting a new mini-van. Once you have a few kids, you kind of know you really should have one. So you go to buy one and realize they are more expensive than you thought. But you still buy one and once you have it for a little while, you realize it’s a lot more useful than you thought. Unfortunately, the other thing you realize is that when you haul the kids around each day, it gets really messy a lot faster than you thought. So at the end of the day, you know you need a mini-van…and you know you need to take care of it or it will be a mess. But deep down, you still want a Ferrari. Makes sense, doesn’t it?
Are there any bright spots? YES! Data Harmonization has not been completely forgotten. I have found there are several industries in the world that absolutely rely on, even depend on, the use of a standard industry data model. Industries like Insurance, Financial Services and Transportation. These industries have standard models that can be used to structure and conform (harmonize) data for consistent global analysis. They need data structured a certain way to adhere to compliance reporting. And this has led to the ability to create more reliable and repeatable data analysis projects based on the harmonized data.
Sales and Marketing as functional areas have also been a bright spot in the world of Data Harmonization…and most of them really don’t even know it. In fact, master data management has some of its strongest supporters in Sales and Marketing. They know just how important it is to establish a single version of the truth when it comes to “the customer list”. Sales wants commissions to be correct and Marketing wants lists to be correct, so having a reliable, single version of the truth is paramount.
How do you get a standard data model? Truth be told…it depends. Many industries have standard data models popping up all over. For some industries, you can simply buy a data model. Heck, IBM has a bunch of them I am sure they would be happy to sell you. One company I am working with has just developed a standard model for structuring data to enable Anti-Money Laundering analysis. Microsoft even has a relatively new initiative underway called the Common Data Model…fancy, right? Not a creative name, but I give them props for recognizing that a lack of a standard model is minimizing the adoption of all the other data-related initiatives.
For some companies, you just have to take some time to build a standard data model. I am in talks with a professional services firm that needs a standard model. I am even talking with a large dental services company that could really use one too. Standard models are definitely emerging, but I have not seen many folks really talking about the lack of a central, standard data model being the root cause for so much frustration.
The 800 Pound Gorilla in the Data Model room. So if you don’t have a standard data model, and it is not immediately feasible to buy a pre-built model, then you need to build one. Here is the good news…with the technology available today, you can do it a lot more easily than in the past. You can use data visualization and data profiling tools and data wrangling tools and even data catalogs to help understand your data and help you build a model faster and easier. I have even found a new tool recently that helps the implementation process by “ingesting” the new standard data model and then helps you MAP all the source systems to your standard model. These maps can be used to generate code to speed up the ETL process to populate your new model. Or…you can take the model and ingest it into some of the better data catalog solutions, so you can keep your data lake, but query it according to a standard model. I have even seen companies start small by using a good data visualization tool first, and then slowly build out their model over time…eventually using the standard model instead of pulling data from the source systems directly. It can be done.
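The mapping step described above can be sketched in a few lines: a declarative map from each source system’s column names to the standard model, which a few lines of code (or a mapping tool) can apply directly, or use to generate ETL code, instead of hand-writing each feed. The system and column names here are hypothetical.

```python
# Declarative source-to-standard mappings (hypothetical systems and columns):
# each entry maps a source column name to its standard data model name.
MAPPINGS = {
    "crm": {"cust_no": "customer_id", "cust_nm": "customer_name"},
    "erp": {"acct_id": "customer_id", "acct_desc": "customer_name"},
}

def to_standard_model(system, rows):
    """Rename each source row's columns to the standard model's names."""
    colmap = MAPPINGS[system]
    return [{std: row[src] for src, std in colmap.items()} for row in rows]

to_standard_model("erp", [{"acct_id": "9001", "acct_desc": "Acme Corp"}])
# -> [{"customer_id": "9001", "customer_name": "Acme Corp"}]
```

Adding a new source system then becomes a mapping exercise…one new dictionary entry…rather than a new hand-built ETL pipeline, which is a big part of why this approach is faster than the old data warehouse build-out.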
Let’s be real…not every organization has completely lost the ingredient of Data Harmonization. Some are performing admirably and using data incredibly well. But even if the analyst firms are only partly correct in their assessments, far too many resources are being spent on data-related initiatives that are producing lackluster results. As a data-minded community, we can and must do things better. Give Data Harmonization some thought and make it part of your data management and analytics strategy today…you’ll be glad you did.
Shawn Helwig is the Managing Partner of Total View Analytics and serves as the Director of Product Strategy for Pomerol Partners. Shawn is also the author of the soon-to-be-released book, titled, “Analytics to Win® – A Practical Method for Building a Data Management & Analytics Strategy”. He has been a business and technology consultant for over 20 years and his passion is to help organizations use data and analytics to solve problems and improve decision-making. Shawn also serves on the giving committee for the Madison Christian Giving Fund, serves as a volunteer for Compassion International, and has been a member at High Point Church for 15 years. You can reach Shawn at firstname.lastname@example.org or at email@example.com.
Copyright 2019 – Total View Analytics