Ep. 9 Why Enterprises Using Generative AI Are Hitting a Data Wall | AI Insights and Innovation

[Dave Linthicum]

Introduction

Welcome back to AI Insights and Innovation, where we talk about the realities of AI, including generative AI and agentic AI, and how to make all this stuff work for your enterprise. I’m your host, Dave Linthicum, author, speaker, B-list geek. Let’s start the conversation.

Challenges Enterprises Face with Data in AI Systems

What we’re going to do here is get into the meat of the challenges that enterprises are facing with their data as they transition to generative AI systems, or any AI systems for that matter. These can be agentic, traditional AI, deep learning, or machine learning systems. Really, we’re looking at where they are now, or their as-is state, where they’re looking to be or where they need to be in order to make AI work, and the delta between the two. The fact of the matter is that many enterprises have a huge chasm between where their data is now and where their data needs to be in order for them to build these AI systems, specifically the generative AI systems.

[00:01:00]

Everybody wants to build right now. We’re going to look at some of the issues there.

Survey Data on AI Readiness Gaps

There’s lots of survey data out there. You can Google away for it. I do reference an article I wrote in InfoWorld, which references a survey from the Enterprise Strategy Group. Their survey of 800 IT decision-makers revealed that more than three in five organizations have notable gaps in their AI readiness, particularly in infrastructure and data ecosystem. This is something I’m seeing as well. One of the things I do when I teach my generative AI architecture class is get into the realities of what it takes to design and build one of these systems.

Data Issues as the Biggest Challenge for AI Adoption

At the end of the day, these aren’t AI engineering issues. Those are the easier problems to solve. You’re dealing with data issues, first and foremost, and getting around some of the limitations of the data that you need to use as training data to make these AI systems work is the largest challenge that I see facing us in the next five years as we’re trying to move to AI.

[00:02:01]

CIOs Tasked with Building Strategic AI Systems

What’s occurring now is CIOs are tasked by the boards of directors and CEOs to build these strategic AI systems. They’re going to bring this additional value to the company. I understand why they’re doing it, because everything is AI these days with the hype. The problem, certainly for companies that have been around for many, many years, is that they have legacy data stores, they have huge data complexity problems, they have huge data silo problems, and leveraging that data as training data, to train their AI systems to know what they need to know to carry out the processing that they want them to carry out, is not going to be the easy part. It’s going to be a very hard task to do. So they have to overcome a very ugly problem to get to the reality of what AI has promised to bring them.

Core Challenges Enterprises Face with Data in AI

And that’s it in a nutshell.

[00:03:01]

So enterprises face several core challenges related to data when transitioning to AI systems.

Data Challenges in Generative AI

And this is specific to generative AI, because obviously many enterprises are looking to build large language models and small language models that are going to use a massive amount of data. So in some cases, lots of data, almost all the data that exists within their enterprise. So they’re trying to make use of their inventory data, logistics data, their sales data, their marketing data, all these sorts of things that can be brought to bear inside an AI system, specifically a generative AI system, which will get them to a level of value. They’re able to make core decisions based on analysis of the data that they can’t see. In other words, it’s trying to see the forest for the trees, the ability to look at this data complexity and abstract it down into something that the stakeholders can take action on. And even automate those systems by embedding those decisions, embedding that knowledge, back into the core business processing.

[00:04:02]

So there’s a lot at stake here.

Potential Benefits of Successful AI Implementation

You think about this: for businesses that are able to pull this off, it’s the ability to get to an event-driven state where all of the core decisions and all the core processes, whether it’s inventory control, sales order automation, all the processes that make businesses work, can function with almost perfect information. In other words, a complete understanding of who the customer is, a complete understanding of the product they’re selling, a complete understanding of the logistics system that they’re leveraging, the transportation system they’re leveraging, and the ability to optimize those things in real time and continuously improve them in the domain of the AI system. That’s huge. We’re gonna see companies that are able to produce things faster, produce things in better quality, provide a much better customer experience, ship things to you extremely quickly, and monitor and manage all these things through this automated infrastructure, which is driven by the AI system.

[00:05:01]

So that’s the goal here. And I think everybody understands that. We’ve talked about that enough. And by the way, that’s nothing new. We’ve been talking about this for the last 30 years, the event-driven enterprise, data integration, all these sorts of things.

The Obstacle Enterprises Need to Overcome

Now we have an opportunity to use technology that really truly is able to change the game, but enterprises have an obstacle in front of them that they have to figure out how to get around. And that’s what we’re talking about here.

Siloed Data as a Challenge

So the challenges enterprises face include siloed data. Data often exists in isolated silos across organizations and is typically owned by different groups. What seems to be more of a problem than the technology fragmentation and the inability to get these data silos to talk is getting around the political infrastructure within these organizations, because some data is owned by marketing, some data is owned by logistics, some data is owned by inventory control, some data is owned by manufacturing.

[00:06:01]

And they’re run by different people in different departments, sometimes different profit centers. And people don’t wanna share their toys. They’re not exchanging information well across these silos. And in order for these AI systems to have a complete understanding of what the business is, those silos have to get broken down. So the siloed data has to go. We have to figure out some way to put abstraction layers on top of them, integration layers, so we can read all the data that we need in a timely manner to train these AI systems to do what these AI systems need to do.
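As a rough illustration of the abstraction-layer idea, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `InMemorySilo` class stands in for a real departmental system (a database or an API), and the names `AbstractionLayer` and `customer_view` are hypothetical, not any product’s API. The point is only that each department keeps owning its data while one layer reads across all of it.

```python
from dataclasses import dataclass


@dataclass
class InMemorySilo:
    """Hypothetical stand-in for a departmental system such as marketing or logistics."""
    name: str
    rows: dict  # maps customer_id -> that department's fields for the customer

    def fetch(self, customer_id):
        # Return an empty record when this silo knows nothing about the customer.
        return self.rows.get(customer_id, {})


class AbstractionLayer:
    """Reads across silos without moving or changing ownership of the data."""

    def __init__(self, silos):
        self.silos = list(silos)

    def customer_view(self, customer_id):
        # Prefix each field with the owning silo so its origin stays visible.
        view = {"customer_id": customer_id}
        for silo in self.silos:
            for field_name, value in silo.fetch(customer_id).items():
                view[f"{silo.name}.{field_name}"] = value
        return view


marketing = InMemorySilo("marketing", {"c1": {"segment": "smb"}})
logistics = InMemorySilo("logistics", {"c1": {"last_shipment": "2024-05-01"}})
layer = AbstractionLayer([marketing, logistics])
print(layer.customer_view("c1"))
```

Prefixing fields with the silo name is one possible design choice; it sidesteps naming collisions, but it does not resolve what the fields mean, which is the semantic problem discussed later.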

Poor Data Hygiene as a Challenge

Second would be poor data hygiene. Companies suffer from bad data hygiene. And this is a huge issue as well. The accuracy of the data, the reliability of data within many of these organizations, even key data that they live upon, inventory control, sales order entry, customer information, things like that, may not be in a good state. And I’m finding this a ton. So they haven’t fixed it. And so much so that even the employees in the companies don’t rely on the data each and every time.

[00:07:04]

And the customers who are using the data or consuming the data through websites or mobile applications may be finding the same thing. Well, that’s a problem unto itself. But if you feed that erroneous data, that poor-hygiene data, into AI systems and they’re training from that data, you’re just gonna get bad knowledge that pops out of it. Because unless you have accurate and reliable data, the AI system isn’t able to train itself to a level of understanding where it’s gonna be of use to you. So I always say, bad data in, bad data out. It’s been a mantra ever since I started my career: garbage in, garbage out. And this is no different. Now you put crappy data into your training data, you’re gonna get a crappy knowledge engine that’s gonna provide no value. And by the way, these things are gonna be hugely expensive to build and operate. So without that value coming back from utilization of that training data, you’re going to be left in a bit of a lurch.
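To make "garbage in, garbage out" concrete, here is a minimal Python sketch of the kind of hygiene check that could run before records are used as training data. The function names `hygiene_report` and `clean_for_training`, the field names, and the sample records are all hypothetical, and real hygiene work covers far more than duplicates and blanks (accuracy against the real world, for one).

```python
from collections import Counter


def hygiene_report(records, key, required):
    """Count duplicate keys and missing required fields in a list of dict records."""
    key_counts = Counter(r.get(key) for r in records)
    duplicates = sorted(k for k, n in key_counts.items() if k is not None and n > 1)
    missing = {
        f: sum(1 for r in records if r.get(f) in (None, ""))
        for f in required
    }
    return {"rows": len(records), "duplicate_keys": duplicates, "missing": missing}


def clean_for_training(records, key, required):
    """Keep the first complete record per key; drop incomplete rows and repeats."""
    seen = set()
    kept = []
    for r in records:
        if any(r.get(f) in (None, "") for f in required):
            continue  # incomplete record: would teach the model gaps
        if r.get(key) in seen:
            continue  # repeated key: would over-weight this entity
        seen.add(r.get(key))
        kept.append(r)
    return kept


# Hypothetical sample data for illustration only.
records = [
    {"customer_id": "c1", "email": "a@example.com"},
    {"customer_id": "c1", "email": "a@example.com"},
    {"customer_id": "c2", "email": ""},
    {"customer_id": "c3", "email": "c@example.com"},
]
print(hygiene_report(records, "customer_id", ["customer_id", "email"]))
print(len(clean_for_training(records, "customer_id", ["customer_id", "email"])))
```

Running the report first, before any cleaning, gives stakeholders a number to react to; silently dropping rows hides how bad the source data actually is.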

[00:08:07]

Lack of Semantic Understanding as a Challenge

So the next would be lack of semantic understanding. And this is a big one. Certainly I saw this in the integration days. And if you read my EAI book back then, and that’s like 30 years old now, I talked about having a semantic understanding of the data within the enterprise so we can create the integration flows between the source and target systems. And this is kind of no different. So companies frequently lack a common semantic understanding of what their data is. And this is easy to prove. Go into a room of data managers for large enterprises and ask them to raise their hand if they have a single source of truth for customer, a single source of truth for product, or a single source of truth for anything else that should be important to them. And if they’re honest with you, their hands aren’t gonna go up because they don’t have a single source of truth. The data’s all over the place and has different meanings. We don’t have a common metadata layer. All these things that are gonna be table stakes to make these AI systems work, in terms of your ability to leverage these data sets as training data, don’t exist.
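One small piece of a common metadata layer can be sketched in Python as a mapping from each system’s field names to shared canonical names. The systems (`crm`, `billing`), the field names, and the `to_canonical` function below are hypothetical examples, not a description of any particular tool. A real semantic layer would also have to reconcile units, codes, and conflicting values, which this sketch does not attempt.

```python
# Hypothetical mapping: system name -> {local field name: canonical field name}.
CANONICAL_FIELDS = {
    "crm": {"cust_no": "customer_id", "full_name": "customer_name"},
    "billing": {"CustomerID": "customer_id", "Name": "customer_name"},
}


def to_canonical(system, record):
    """Translate one system's record into canonical field names.

    Returns the translated record plus the list of fields that have no agreed
    meaning yet, so gaps in the shared vocabulary are surfaced, not hidden.
    """
    mapping = CANONICAL_FIELDS[system]
    unmapped = sorted(set(record) - set(mapping))
    canonical = {mapping[f]: v for f, v in record.items() if f in mapping}
    return canonical, unmapped


crm_row, crm_gaps = to_canonical(
    "crm", {"cust_no": "42", "full_name": "Ada", "fax": "n/a"}
)
billing_row, billing_gaps = to_canonical(
    "billing", {"CustomerID": "42", "Name": "Ada"}
)
print(crm_row, crm_gaps)
print(billing_row, billing_gaps)
```

Once both systems speak the same field names, the two rows for customer 42 can at least be compared, which is the precondition for deciding which one is the source of truth.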

[00:09:11]

Technical Debt and Architecture as a Challenge

So the next category of a problem would be technical debt and architecture. Technical debt basically is enterprises kicking the can down the road and doing so on purpose. So in other words, they put in a net new system that they felt they needed. That could be an ERP system. It could be a business analytics system, what have you. And they understood that they’re gonna have to loop back and fix the data integration problem, fix the data hygiene problem, fix the data semantic problem, but never do that. And so the technical debt remains and it becomes more technical debt as time goes on. So scattered data storage, inefficient processing. And so the debt complicates the efforts to implement all these AI systems because we have to solve these issues first. They’re normally gonna be very expensive to solve.

[00:10:02]

That’s why they were kicked down the road. So the enterprise is gonna have to invest in that. And that’s a tough investment to get. You have to look at the stakeholders and people who are gonna fund those projects and let them know that we’re really not gonna get any immediate value out of fixing these systems, removing this technical debt, but we will get to the AI value later on down the line. So in other words, we have to spend a tranche of money to get to spending another tranche of money to actually get to the state where we need to be.

Need for Transformative Data Architecture

Need for transformative data architecture. We need data architects here to transform the data, to integrate various data sources into a coherent and unified system. I always talked, back in the EAI days in the ’90s, about having the enterprise data model, the unified enterprise data model, having a semantic understanding of every piece of information that exists in an enterprise. We never got anywhere close to that. In fact, it’s gotten worse. So someone’s gonna have to take the helm. Normally that’s gonna be the chief data officer working for the CIO, or maybe it’s a special project that the CTO is gonna take on, to look at how bad the information is and take steps to repair it, take steps to redesign the system, and take steps to look at core enabling technology, data abstraction technology, cloud-based data systems.

[00:11:21]

All these things that are available to us now that may have the ability to make things better.

Avoiding Throwing Money and Tools at the Problem

And one of the things I’m gonna warn you, people have a tendency to throw money and tools at this. That’s almost never gonna be successful. You have to have the ability to deal with this as a transformative problem and you’re gonna need to hire some talented people and you’re gonna need to take some time to fix it. And I think it’s gonna be worth fixing it because eventually you’re gonna have to fix this one way or the other.

Enterprises Falling Behind Due to Data Problems

I don’t think we can kick the ball down the road anymore now that we have all these value systems in front of us. And indeed, I think we’re gonna see a lot of enterprises that are gonna kind of fall by the wayside

[00:12:00]

because they’re unable to leverage their data for AI and other purposes because they have a data problem that they couldn’t get around or couldn’t afford to get around or were unwilling to get around, which means they’re unable to get to the more advanced data systems, things like AI, generative AI, agentic AI, and they can’t take advantage of these things in the marketplace. And I think their competitors, many of them, will be able to take advantage of them, certainly the younger players that may not have as much of a data problem, and they’ll eat their lunch in the marketplace. And we’re gonna see a lot of that occur over the next five years. And I’ll tell you about that here.

Strategic Use of Data and Available Data Sets

So other things to consider would be strategic use of data and available data sets. Enterprises often have access to only a limited number of the data sets that they really need. So in other words, they can’t get at the data they need. Maybe the data wasn’t stored. You see that a great deal, where we’re not tracking the data and therefore the data’s missing. You can’t go back and recreate that data. Those events are gone forever if you’re not recording data about them.

[00:13:02]

So lack of data could be a concern for many enterprises.

Security Challenges with Data Complexity

Security. Again, we’re dealing with data complexity. Complexity leads to vulnerabilities because we have a heterogeneous stack of data running on the mainframe, running on private clouds, running on public clouds, running on cores, running on edge computing systems, running on mobile systems. All well and good. I understand we’re stepping up to solve tactical issues. That complexity actually causes a security issue, because if we have the data scattered everywhere, we have to think about how we’re gonna secure it, how we’re gonna encrypt it, how we’re gonna track it. And the complexity of data just makes vulnerability and risk go up. It certainly makes the cost of securing those systems much higher. I don’t think that means we steer away from heterogeneity, and we can’t avoid complexity, but certainly we can think about better techniques to go off and mediate that complexity. And I talked a lot six years ago, in terms of the rise of multi-cloud, about cloud complexity management or complexity mediation: your ability to look at platforms and data and processes and then put them into a configurable domain, in essence, put volatility into a domain.

[00:14:17]

And that’s an easy way to look at some very hard work that needs to occur within these enterprises.

Small vs. Large Models for AI Applications

Small versus large models. Everybody wants to build large models. And as I’ve mentioned here before, not all enterprises need to build LLMs. They’re hugely expensive to build. They’re gonna take a huge amount of data. And I think most of the AI applications, whether they’re agentic or generative or traditional machine learning, are going to be tactical uses of AI. So in other words, it’s gonna be inventory control. It’s gonna be sales order entry systems that are based on AI, but they’re not solving these huge problems within the enterprise.

[00:15:00]

They’re solving very tactical problems. And these are things we can normally build within three to six months. And that’s kind of a doable domain. Normally it doesn’t take a lot of money to get there. These things don’t cost a lot to operate. They don’t necessarily need GPUs or big honking processors to do the inference processing, things like that. So I think your ability to think a bit smaller and a bit leaner in terms of how you’re gonna leverage AI will go a long way toward making AI more successful for you. And I think in doing that, we’re gonna find we’re able to take small domains of data, and not necessarily the holistic data, which is the whole lot of the enterprise. That’s gonna be very hard, if not impossible, to do. And I think that a lot of success is gonna be driven within enterprises by building these tactical systems and using these smaller data sets.

Solving the Data Problem

So where do we go from here? Is all hope lost? No, it can be solved.

[00:16:00]

It’s gonna take some spending. It’s gonna take some strategic thinking by the leaders in the enterprises. And I think you have to have folks who kind of think out of the box in terms of how you’re going to make this work, your ability to get the talent you need. You may not be able to do this with the people existing in your organization. You’re gonna need data architects, data scientists, middleware engineers, cloud engineers, AI engineers, all the smart people that are able to look at where your as-is state is, figure out where your to-be state needs to be in terms of your data and map the path to get there. I’m sure you may not like that path because it’s gonna be very expensive and it’s gonna be very long, but I don’t think you have an option.

Data Problem as a Non-Optional Challenge

You’re gonna have to solve the data problem in order to get to the value of other systems, inclusive of AI, and so it’s not optional. It’s something you’re gonna have to do. So make the architectural decisions, make the strategic decisions, and address the challenges. That’s what it is at the end of the day. And by the way, this is nothing new. We’ve been talking about data issues for years.

[00:17:02]

And certainly, as new technology starts to come in there, the data issue starts to raise its head a bit more.

AI’s Dependence on Data

The difference is AI is a different beast, a different animal. It’s completely dependent on data. It’s not like cloud computing, where we’re just moving a dysfunctional data set from on-premises into the cloud, and we’re just transferring a problem, not necessarily solving it. This is the point where we’re gonna be unable to build these AI systems unless the data problem’s solved.

Solving the Data Problem is Not Optional

So this is something that is not optional. Enterprises are gonna have to do it. So best of luck. So that’s all I have for this week.

Conclusion and Call to Action

Don’t forget to like and subscribe. Also, check out my work on theCUBE Research. We’re doing a lot of great stuff out there. Check out my colleagues, their podcasts, their YouTube channels. Certainly lots of great analysis, and I love their analysis because they’re not just spouting off opinions. They actually have data to back up the opinions, which is important.

[00:18:02]

So until next time, you guys stay safe. Cheers, bye.