Suresh Andani, AMD | Google Cloud Next ’24
[Rebecca Knight]
Introduction
Hello everyone, and welcome back to day one of theCUBE’s live coverage of Google Cloud Next here in Las Vegas, Nevada. I’m your host, Rebecca Knight, along with my co-host, Rob Stretchay. Rob, there has been a dizzying number of new product announcements, new cloud announcements, updates.
Hardware is Having a Moment
It’s really quite incredible.
[Rob Stretchay]
I think what’s awesome about it is that certain things that are old are new again. And I think one of those is what we’re going to talk about today, and really the key components, because, funny enough, even though it’s called cloud, it doesn’t really run in air. It runs on hardware. And I think, again, getting back to those fundamentals and the use cases is really key.
[Rebecca Knight]
Introducing Suresh Andani
The perfect segue to introduce our next guest, and that is Suresh Andani. He is the Senior Director of Cloud Product Management at AMD. Thank you so much for coming on the show.
[Suresh Andani]
Thank you, Rebecca. Thanks for having me.
[Rebecca Knight]
So, as Rob was just saying, hardware is really having a moment, shall we say.
[00:01:01]
The Semiconductor Business is Exciting
So, what do you make of this moment? What are you seeing out there? Just riff a little bit on that for our viewers.
[Suresh Andani]
Yeah, no, absolutely. It’s a very exciting time to be in the semiconductor business, like we are here at AMD. There are a lot of new use cases, and, of course, I would be remiss if I didn’t bring up generative AI and large language models. The algorithms were there before, but we didn’t have the compute power in the past, collectively as an industry. But with some of the latest advancements that we, and our competitors as well, have made in the semiconductor space, chips pack in more and more compute, whether it’s your server CPU or your large, LLM-focused AI GPUs. It’s a very exciting time, for sure.
[Rob Stretchay]
Silicon is Sexy Again
Yeah, and I think what’s interesting about silicon being sexy again is that it’s not only back, it continues to grow.
[00:02:00]
I mean, the density that AMD is getting on the chips, like 160 plus cores on a chip, and being able to bring that to the right place. But it’s not only about CPUs, it’s GPUs and the right workloads, right?
CPUs vs. GPUs for AI Workloads
I mean, what are the differences and what are you seeing when you’re talking to customers about that?
[Suresh Andani]
Yeah, so that’s a very good question, Rob. There’s a lot of buzz around AI being run on large language models, LLMs, gen AI, and everything having to be processed on a GPU. But the reality is, there’s a set of workloads where the data set sizes are so large that you literally do need offload accelerators, like our Instinct series of GPUs, the MI300, to really go do the training and inference with the lowest latency possible. But then there are a lot of traditional, foundational use cases which need to be infused with AI, such as fraud detection, for example, for FSI, right?
[00:03:04]
Or a fintech company, or recommendation engines if you were a retailer in e-commerce, right? And a lot of that can actually be run very efficiently on a CPU, using CPU inferencing. Especially for edge use cases, where the environment does not allow you to put in servers that draw multiple kilowatts each, CPU inferencing really is the right choice. So, at AMD we have a customer-first approach: depending on your situation, what you are optimizing for, and what environments you are deploying your AI hardware in, we try to provide the right guidance and the right infrastructure, whether it’s a server CPU, like the EPYC series, or an Instinct GPU.
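As a rough, hypothetical sketch of the CPU-only inferencing described here, the snippet below scores synthetic transactions with a scikit-learn model running entirely on the CPU; the features, labels, and model choice are invented for illustration and are not from AMD or any customer pipeline.

```python
# Hypothetical sketch: fraud scoring with CPU-only inference.
# The data, features, and labels are synthetic; a real deployment would
# train on actual transaction history.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

# Synthetic transactions: [amount, seconds_since_last_txn, is_foreign_ip]
X_train = rng.random((10_000, 3))
y_train = (X_train[:, 0] > 0.9).astype(int)  # toy "fraud" label

model = GradientBoostingClassifier().fit(X_train, y_train)

# Inference runs entirely on the CPU; no accelerator offload is needed
# for small per-request batches like this.
incoming = np.array([[0.95, 0.01, 1.0]])
fraud_probability = model.predict_proba(incoming)[0, 1]
print(f"fraud probability: {fraud_probability:.2f}")
```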
[Rebecca Knight]
Customer First Approach
So let’s dive into that a little bit, because that’s interesting, this customer first approach, where you actually talk to the customer, what are your pain points, what are you trying to solve here? Do you have any examples of working with customers, and then the solutions that you opted for?
[00:04:03]
[Suresh Andani]
Yeah, so, I don’t know if I can say the names publicly, but I’ll allude to it: we have the biggest video streaming and content generation company that has adopted AMD instances in the cloud, and they were looking to use our CPU instances, which they’re already using for general video transcoding and encoding, your standard workloads, but now also to use that same infrastructure in the cloud to do personalization and recommendation engines for content. Another example I can give is a big global payments company that is using the same AMD infrastructure in the cloud to do fraud detection. So, again, there are multiple customers who are leveraging just CPU inferencing to really achieve the end results they need to achieve. Now, as they are growing, the data sets are growing, they’re adding more subscribers, they’re getting more feeds from Twitter and everywhere, right?
[00:05:05]
The data sets are increasing every day. There will be a point where, to really get the right insights, you will have to do some offload to the GPU accelerators, and we’re going to be ready for that because we have, again, what we believe is the best hardware on the GPU side to meet that. So, yeah, there are a lot of examples, real customer examples, that are benefiting from our AI investments.
[Rob Stretchay]
Google Distributed Cloud
Yeah, and I think part of it is we’ve been talking to Google themselves, and we had a number of people on, Sachin was on earlier, talking about Google Distributed Cloud, and a lot of people are going to be running that on CPUs, not on GPUs, because they can’t get GPUs in a lot of cases. Do you see that, where, hey, as people bring AI to the data, that is a really good use case for this?
[Suresh Andani]
Yeah, so think about a large enterprise or a SaaS provider who has all of its data already in the cloud for doing your standard workloads.
[00:06:05]
You have your databases, you have all your data, all your data is already in the cloud. Now you want to infuse AI and gain some insights from the data that you are using to run your core foundational workloads. So it’s much easier to bring AI to the data through that same CPU inferencing and really get the right insights from it. Bringing AI to the data is a motion that we are seeing because your data is already in the cloud, you’re already invested, and you’re already using the same CPU-based hardware to run your foundational workloads. Just infuse AI into it: all you need to do is take your data, curate it, move your regular databases to vector databases, and feed it to the AI engines. That’s a lot easier than taking the data to the AI.
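As a rough sketch of the “regular databases to vector databases” motion described above, the snippet below embeds existing records and answers a query with a brute-force cosine-similarity search, entirely on the CPU. The embed() function is a hypothetical stand-in; a real pipeline would use an actual embedding model and a proper vector store.

```python
# Minimal sketch of "bringing AI to the data": rows that already live in a
# database are embedded and searched in place, all on the CPU.
# embed() is a toy placeholder and will not capture real semantics;
# substitute an actual embedding model in practice.
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Deterministic toy embedding, normalized to unit length."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

# Records pulled from an existing relational table (synthetic examples).
rows = [
    "customer disputed a duplicate charge",
    "subscriber asked for a plan upgrade",
    "payment declined due to expired card",
]
index = np.stack([embed(r) for r in rows])  # acts as the "vector database"

query = embed("which transactions were disputed?")
scores = index @ query                      # cosine similarity (unit vectors)
print("closest record:", rows[int(np.argmax(scores))])
```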
[Rob Stretchay]
Yeah, I mean data has gravity.
[Suresh Andani]
Data has a lot of gravity.
[Rob Stretchay]
Data Has Gravity
It definitely has gravity.
[Rebecca Knight]
And how aware are companies, would you say, of what this can do for them?
[00:07:05]
In the sense of the content company you were just talking about: they needed it for the streaming, but then also to run the recommendation engines. Were they aware that they could extract these insights from the data?
Companies’ Awareness of AI Capabilities
Or are you also alerting them, notifying them that they can do more?
[Suresh Andani]
Yeah, so that’s a good question, Rebecca. A lot of these companies, like the big ones, have very smart engineers. They’ve really been doing this for many years now. What’s different now is that, as we innovate at AMD and bring more and more compute power to the cloud, what you can do with that data is a lot more than ever before. So if you take any content streaming app, one of our favorite ones, and you take the same title, based on your history of what you have watched, it shows you a different presentation of the title than it shows me, based on my history, because I would connect more to that presentation.
[00:08:05]
[Rebecca Knight]
That is wild. Wow.
[Suresh Andani]
That’s what we call personalization. So new use cases are being enabled with advancements in compute power. And you want to do it at very low latency. The number of cores we pack in and the amount of I/O and memory bandwidth we provide, which our cloud partners leverage, make it possible for those smart engineers in these end-customer companies to take advantage of it. They know exactly how to do that. Our job is to just provide them with the tools.
[Rob Stretchay]
Sustainability and Power Efficiency
And it’s not only about how powerful they are or how much faster they can go. There’s the aspect of sustainability, because they’re packing more into a smaller space. Talk about how somebody like a Google would look to you guys and say, hey, we want to use 60% less power. I’m just pulling that number out of the air, but something to that effect.
[Suresh Andani]
Yeah, Rob. So you alluded earlier to AMD now pushing 160 cores. It’s actually 192 cores per chip, right?
[00:09:00]
And soon we’re going to 256 cores per chip. When you have so many cores, and each core performs 25 to 30% better every generation, you’re packing a lot of compute power within a rack, which enables you to consolidate not just the real estate in the data center but, more importantly, the wattage and the power available in the data centers. A lot of geos now are limited in how much power they can pump into a data center and how much they can cool. So the innovations we are doing on the CPU side and also on the GPU side enable much denser racks, where you can consolidate a lot more processing power than you were able to before. So it’s not just solving the, hey, we are giving you more compute power. It’s solving the bigger problem of data center space and power limitations.
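A back-of-the-envelope illustration of the consolidation argument, using hypothetical numbers rather than AMD or Google figures: for a fixed pool of cores on dual-socket servers, moving from 64-core to 192-core parts cuts the server count by 3x.

```latex
% Hypothetical sizing: 12{,}288 cores of capacity on dual-socket servers.
\[
\frac{12288}{2 \times 64} = 96 \ \mbox{servers (64-core parts)}
\qquad
\frac{12288}{2 \times 192} = 32 \ \mbox{servers (192-core parts)}
\]
% The same core count fits in a third of the servers, so rack space and the
% power needed per unit of compute both shrink, provided per-server wattage
% grows more slowly than core count.
```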
[Rebecca Knight]
Business and Technology Alignment
We just had a guest on from McKinsey who has a new book out that is really a playbook for helping companies figure out how and when to do their pilots, what will work in a pilot, what the incentives are, and how people need to be aligned.
[00:10:10]
I’m curious what you’re finding when you are talking to your customers. You say you take a customer first approach, but are the business and the technologists aligned? And how are companies making space for AI within their technology budgets?
[Suresh Andani]
Making Space for AI in Technology Budgets
Yeah, so that’s, I don’t know if I should say, a million-dollar or a billion-dollar question.
[Rob Stretchay]
It’s a lot.
[Suresh Andani]
But yeah, a lot of us think about AI as this matrix multiplication where you are doing SIMD instructions very fast, in parallel, to really go do your inferencing and training. But if you think about the whole AI pipeline, it starts from data ingestion, data curation, data serving, ETL and ELT for AI, and then feeding it to the AI engine. And really, making space for that includes not just the AI, but also all the database work, data curation, data pipelining, and all that ETL and ELT stuff.
[00:11:09]
And the more efficiently you can do that preprocessing of the data that you feed to the AI engines, the more space and power you can allocate and make space for AI, so to speak. So a lot of where we focus, if you look at our general-purpose CPUs, is that we don’t pack in a lot of hardware accelerators just for AI, because it comes at the cost of not being able to do all the data pipelining and all that stuff, right? The AI would be as slow as the slowest part of that pipeline, it’s Amdahl’s Law, right? So what we are trying to optimize is the entire pipeline, and packing in more cores, with each core being more performant than the previous gen and the competition, really is, from the feedback we are getting from customers, the right way to go.
[00:12:01]
So we make space for AI by helping on the foundational workloads and optimizing the whole pipeline that feeds the AI, not just one piece of it through some special accelerators.
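The Amdahl’s Law point can be made concrete with a small worked example, using illustrative numbers only: if just the AI kernel is accelerated, the un-accelerated data preparation caps the end-to-end speedup.

```latex
% Amdahl's Law: if a fraction p of the pipeline is accelerated by a factor s,
% the overall speedup is
\[
S = \frac{1}{(1 - p) + p/s}.
\]
% Illustrative numbers: with p = 0.6 (the AI kernel) and s = 10,
\[
S = \frac{1}{0.4 + 0.06} \approx 2.2,
\]
% so the un-accelerated ingestion and ETL stages dominate, which is why the
% whole pipeline, not just the matrix math, has to be optimized.
```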
[Rob Stretchay]
Working with Google Cloud
So what is it like to work with, say, Google from a cloud perspective? It’s very different from working with the OEMs and things of that nature, where those systems are going off into data centers and being managed locally. How do you have to think differently about working with a hyperscaler like Google?
[Suresh Andani]
Yeah, so obviously multi-tenancy is a big, not a concern, but a feature, rather. So, first of all, they are renting it out; it’s like a timeshare that they rent out to the end customers, right? Customers would come in, use it, and get out, right? And then the same hardware is used by some other customer in a different month or a different time slice.
[00:13:00]
So really, we work very closely with Google on the architecture, so that with every new generation we launch, they can take advantage of the hundreds of cores we are now packing within one chip, feeding the beast with the right amount of memory bandwidth and I/O bandwidth, and setting the TDPs at a rack level to a point where you can still cool the infrastructure without compromising. There’s a lot of sophisticated engineering that goes on behind it, and we work very closely. And that’s just general compute. Add to that high-performance computing, where your networking now has to be very fast and efficient, with low jitter and low latency. And then layered on top of that are many other features, like the confidential computing we provide for very regulated industries. Very soon it becomes a pretty sophisticated infrastructure where one size does not fit all.
[00:14:05]
You have a different infrastructure for high-performance compute, a different infrastructure for general-purpose compute, and now some special features like confidential computing for regulated industries. So we start as much as three years before the launch of an instance with partners like Google Cloud, and it’s a journey that we take to get the best-performing, best perf-per-dollar instances and the best features out there.
[Rob Stretchay]
CPUs and GPUs vs. Purpose-Built Components
Yeah, and we were talking beforehand. I mean, there are some companies that are going down the path of building special-purpose ASICs and things of that nature. I was at a company where we used FPGAs because you could reprogram them and so on, but that came at a cost, and they were more expensive. What is it, do you think? I mean, to me, it seems like CPUs and GPUs make way more sense than purpose-built components, especially with AI.
[00:15:00]
What are you seeing?
[Suresh Andani]
Yeah, so generally, programmable solutions like FPGAs, CPUs, and GPUs give you the best dynamic elasticity. Take AI, right? That’s the talk of the town. The models are changing. The frameworks are changing. It’s such an evolving area that every time you wake up in the morning, something has changed. Now, if you are in that dynamic world, you want to have an infrastructure that you can program from scratch and really optimize, and that’s where FPGAs, CPUs, and GPUs are amazing, right? But once your models have stabilized and you’re into cost optimization, then in certain use cases where things are fixed, you can get into more fixed, purpose-built kinds of options, which have a space of their own. But today, in the world of AI, so much is changing on a daily basis that that’s why everybody’s using CPUs for inferencing and GPUs for training and somewhere in between.
[00:16:07]
[Rebecca Knight]
Rapid Change in the AI Industry
I’m interested in what you just said, and we know that this is an industry where there is such rapid change, really on a daily basis. We had Jamie Dimon yesterday saying that this is really going to be more disruptive than the internet. You’re a technology veteran. You’ve been in this industry for a long time. Is that true? Because part of me, as a future-of-work journalist, wonders, is it a little overhyped? I mean, here we are, but at the same time, you’re really in this. I’m curious.
[Suresh Andani]
Yeah, and I get this question a lot. There were hype cycles in the past. We all know about the dot-com era, and AI itself is not the first time we’re hearing it. It came and went, and it came and went, right? Right now, why I feel it’s not just hype is that we have the compute infrastructure to support the vision. There are use cases.
[00:17:01]
You guys have all seen GPT, GPT-4. There’s always a little bit of hype in everything, right? And we need some of that, but I think it’s a lot more real than it is hype. As you said, I’m in the middle of talking to a lot of customers. I gave a couple of examples of how quickly these customers have leveraged AI in their day-to-day use cases and enhanced their end-customer experience. And really, the demand that we are seeing is just unprecedented. And we are at the starting point of that, because we provide the GPUs and the CPUs that go into OEMs and cloud providers, who sell into enterprises and digital-native SaaS companies. So we are seeing the first wave of real demand, which is continuing to grow.
Real Demand for AI Hardware
We are very excited. I’ll stick to public numbers, right? When we announced our GPUs in December, we estimated about two billion dollars in sales in the first year.
[00:18:01]
The fastest product, a GPU product, to reach a billion dollars for AMD, right? Three weeks after that, when we did our first quarter announcement, Lisa, our CEO, Dr. Lisa Su, upped that to three and a half billion dollars. So it’s real. It’s as real as I’ve seen, at least.
[Rob Stretchay]
Open Sourcing and Collaboration
Yeah, and I think what’s neat, and I think you kind of hit on this, is that you’re part of the AI Alliance as well, which is looking at it more from an open perspective. And I think, to me, open tends to win in these things. Absolutely. And I’m sure that’s what you’re seeing with Google and others: hey, really, because we’re open and we can support these different workloads, and we listen to the customer and work backwards in that way, that is a big piece of why they want to work with you as well.
[Suresh Andani]
Yeah, open sourcing of our software is a huge, huge advantage, at least that’s what we believe and what we’re hearing from our top tier-one customers, because it’s impossible for one company in the world to have all the AI experts under one roof.
[00:19:03]
So if all the innovation is happening under one roof, you’re not innovating a lot. Once you open up the whole ecosystem like we have, and it’s not just our ROCm software, it’s everything, right? Our libraries, our software, our distros of all the, what do you call it, the plug-ins, everything is all open source. And that’s a huge tailwind for us that our customers really appreciate. So yeah, that’s a key differentiator for us.
Monetizing AI Investments
And just to answer the previous question, I think it’s less about how real it is. What we need to figure out as an ecosystem, as an industry, is how the end enterprises will monetize their AI investments with their end customers, which are likely either other companies or consumers, right? So that part has to be figured out. People are seeing the vision, but translating that into business and monetization is what we need to work out. Did the content company, through investing in AI, get more subscribers just because their interface was much more personalized?
[00:20:15]
Those data points need to be created to really calculate the ROI on these investments. We all believe it and can feel it’s there; it just needs to be quantified.
[Rebecca Knight]
Quantifying the ROI of AI
Right, needs to be quantified.
Conclusion
That’s a great note to end on. Thank you so much for joining us, Suresh.
[Suresh Andani]
This was a really great conversation. My pleasure, Rebecca. Thank you, Rob.
[Rebecca Knight]
I’m Rebecca Knight, for Rob Stretchay. Stay tuned for more of theCUBE’s live coverage of Google Cloud Next. You are watching theCUBE, the leading source of enterprise news.