I sat down with Pantera Digital's Vice President of Engineered Solutions, Darren Livingston, and Senior Director of AI and AppDev, Dr. Jonathan Gough, to discuss how they've bridged the gap between infrastructure and artificial intelligence (AI) when designing and implementing AI solutions for Pantera Digital's clients. The wide-ranging discussion begins with a brief back-and-forth on running AI workloads in the cloud versus on premises and then moves into how they bring infrastructure and data science perspectives together.
High-value workloads are increasingly common, and the trend points toward AI being embedded in everything. Many of those workloads are more cost-effective to run on premises, where there are tighter controls and stronger security. Organizations also remain concerned about their data, and as AI legislation and governance continue to evolve, keeping workloads on prem, where they can maintain control, see what's happening, and monitor it directly, is the direction many enterprises are taking.
It's also worth noting how teams are leveraging VMware and various hyperconverged infrastructures. The principle is simple: use only the right amount of hardware to get the job done. The goal is to run workloads that are sustainable and cost-effective and that deliver real business value. If your workload costs more than the dollars it saves or generates, you're doing it wrong.
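To make that rule of thumb concrete, here is a minimal back-of-envelope sketch. The figures and the helper function are illustrative assumptions for the example, not Pantera Digital numbers; the point is simply that the workload's monthly cost, wherever it runs, has to stay below the value it generates.

```python
# Illustrative break-even check: all numbers below are made up for the example.
def workload_roi(monthly_infra_cost, monthly_value_generated):
    """Return net monthly value and whether the workload pays for itself."""
    net = monthly_value_generated - monthly_infra_cost
    return net, net > 0

# Hypothetical on-prem vs. cloud costs for the same AI workload.
on_prem_cost = 4_200   # amortized hardware + power + ops, per month
cloud_cost = 7_500     # GPU instance hours, per month
value = 6_000          # revenue saved or generated by the workload, per month

for label, cost in [("on-prem", on_prem_cost), ("cloud", cloud_cost)]:
    net, worth_it = workload_roi(cost, value)
    print(f"{label}: net ${net:+,}/month -> {'keep' if worth_it else 'rethink'}")
```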
The first step in bridging infrastructure and AI is understanding each other's needs and finding a common solution. For infrastructure leaders, this means learning the language of data science and finding ways to optimize performance, whether on-prem or in the cloud. The decisions teams make for performance, security, or ease of operations are influenced by a multitude of factors, and without a collaborative approach, the two sides end up speaking different languages.
The way data scientists and machine learning experts use compute is fundamentally different from how traditional business applications use it. This is a rapidly evolving landscape, and it really does take a team of people working together — data scientists, machine learning engineers, and the digital infrastructure team partnering, listening, and collaborating to put together the right solution for the problem at hand.
A useful analogy: somebody says "dig a hole." I can do it, but do I need a shovel, a spade, or a backhoe? How much equipment do I need, and for how long? Answering those questions up front is what lets the infrastructure team carve up and serve up exactly what the workload requires, and it takes effort for the two teams to learn to speak the same language.
When ChatGPT was released a year and a half ago, open-source models of similar size appeared alongside it. Initially, those models were difficult to power and required significant compute, but things changed within weeks. What once took four GPUs to run a single model could, just four to six weeks later, run on one or two GPUs, because people got inventive and creative when they saw the problem. That's not how traditional infrastructure works: you don't see changes month over month or week over week; changes typically happen quarter over quarter or year over year. In the infrastructure world that pace is an enormous shift, and it makes the partnership between AI and infrastructure teams more critical than ever to design and implement the right solution.
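The discussion doesn't name the specific techniques behind that drop in GPU count, but weight quantization is one common example of the kind of inventiveness described: loading a model at lower precision so it fits on fewer GPUs. Below is a minimal sketch, assuming the Hugging Face transformers, accelerate, and bitsandbytes libraries; the model ID is a hypothetical placeholder, not a model from the conversation.

```python
# Sketch: load a large model with 4-bit quantized weights so it fits on fewer GPUs.
# Assumes `transformers`, `accelerate`, and `bitsandbytes` are installed.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "your-org/your-large-model"  # hypothetical model ID

# 4-bit weights take roughly a quarter of the memory of fp16 weights.
quant_config = BitsAndBytesConfig(load_in_4bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # spread layers across whatever GPUs are available
)
```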
This article was originally published on the Pantera Digital blog.