2024 Compute Performance Considerations and Expectations

The compute loads of large language models (LLMs) capture a great deal of attention. Built on an "AI on top of HPC" architecture, LLMs have become feasible and are now, arguably, the world's foremost technology and business competency arena.

A typical HPC cluster used for LLM training has tens to thousands of compute nodes, which is what makes long training jobs feasible. Most of the attention in this arena goes to the processing units, primarily GPUs, which provide on the order of 60 TFLOPS of FP64 each. Such clusters are usually also configured with dual-socket CPUs providing around 8 TFLOPS, high-speed RAM, high-speed networking (100-400 Gbps) with multiple NICs per node, and high-performance storage (usually NAS).
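
As a rough illustration of how such a cluster's compute budget adds up, the sketch below multiplies the per-device figures quoted above across an assumed node count and GPU count per node; the node count and GPUs-per-node values are placeholders chosen for the example, not measurements of any particular system.

```python
# Back-of-the-envelope peak-FLOPS estimate for an LLM training cluster.
# The per-device figures come from the text above; the cluster shape
# (node count, GPUs per node) is an assumption for illustration only.

GPU_PEAK_TFLOPS = 60.0    # assumed FP64 peak per GPU, as quoted above
CPU_PEAK_TFLOPS = 8.0     # assumed FP64 peak per dual-socket CPU pair
GPUS_PER_NODE = 4         # hypothetical accelerator count per node
NUM_NODES = 256           # hypothetical cluster size (tens to thousands in practice)

node_peak = GPUS_PER_NODE * GPU_PEAK_TFLOPS + CPU_PEAK_TFLOPS
cluster_peak_pflops = NUM_NODES * node_peak / 1000.0

print(f"Per-node peak: {node_peak:.0f} TFLOPS")
print(f"Cluster peak:  {cluster_peak_pflops:.1f} PFLOPS")
```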

The inevitable question of how efficiently these modern "super" systems are actually used reveals some surprises, highlighted in the following lines.

A study by Google suggests that the actual load of BERT¹ on a GPU is around 10-20% of the GPU's peak FLOPS capability.

Another study, by Facebook AI, suggests that the actual load of RoBERTa² on a GPU is around 20-30% of the GPU's peak FLOPS capability.
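
To make such utilization figures concrete, the sketch below computes sustained throughput as a fraction of peak, in the spirit of a FLOPS-utilization calculation. The throughput and peak numbers are illustrative assumptions, not values taken from the cited studies.

```python
# Illustrative FLOPS-utilization calculation (all inputs are assumptions,
# not figures from the Google or Facebook AI studies cited above).

def flops_utilization(achieved_tflops: float, peak_tflops: float) -> float:
    """Fraction of the GPU's peak FLOPS actually sustained by the workload."""
    return achieved_tflops / peak_tflops

peak_tflops = 60.0       # assumed per-GPU peak, as quoted earlier
achieved_tflops = 9.0    # hypothetical sustained throughput during training

util = flops_utilization(achieved_tflops, peak_tflops)
print(f"Utilization: {util:.0%}")  # -> 15%, inside the 10-20% range reported for BERT
```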

A closer look at LLM training and inference shows that these are data-centric workloads. The poor GPU utilization of these models implies a need for faster data access and transfer technologies rather than more compute capability: the GPUs spend much of their time waiting on data rather than computing. In other words, a compute facility running data-centric workloads can achieve a higher ROI by increasing its investment in data access and transfer capabilities, specifically memory and computer architecture, instead of investing in additional, effectively unusable, compute capability.
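
One way to see why faster data movement pays off more than extra FLOPS is a simple roofline-style estimate: when a kernel's arithmetic intensity (FLOPs per byte moved) is low, its attainable throughput is capped by memory bandwidth, not by the processor's peak. The sketch below is a generic illustration with assumed hardware figures, not a model of any specific GPU.

```python
# Roofline-style bound: attainable FLOPS = min(peak_flops, bandwidth * intensity).
# The hardware figures below are assumptions chosen for illustration only.

PEAK_TFLOPS = 60.0        # assumed compute peak (TFLOPS)
MEM_BANDWIDTH_TBS = 2.0   # assumed memory bandwidth (TB/s)

def attainable_tflops(arithmetic_intensity: float) -> float:
    """Upper bound on throughput for a kernel with the given FLOPs-per-byte ratio."""
    return min(PEAK_TFLOPS, MEM_BANDWIDTH_TBS * arithmetic_intensity)

for intensity in (1, 5, 10, 30, 100):  # FLOPs per byte moved
    print(f"intensity {intensity:>3}: {attainable_tflops(intensity):5.1f} TFLOPS attainable")

# Low-intensity (data-centric) kernels stay far below the 60 TFLOPS peak, so extra
# memory bandwidth raises their ceiling more than extra compute capability does.
```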

Surprisingly, this means:

  1. The industry should expect the development of memory systems that are an order of magnitude faster in the near future.
  2. The modified Harvard architecture (a slight variation of the von Neumann architecture), which is today's most widely adopted computer architecture, does not suit modern data-centric workloads; modern compute requirements call for a shift towards dataflow and in-memory computing as alternative architectures for these workloads.
  3. Development of new hardware processing platforms, along with the software and programming models that enable them, will accelerate as newer workloads continue to evolve.
  4. The chipmaker market landscape is about to change: the dominance of GPUs as AI processing units will not last long, even though the GPU is currently the best fit.
  5. In the short term, the differences in compute capability between the GPU platforms currently on the market, specifically Nvidia, AMD, and Intel datacenter GPUs, may soon prove insignificant.

Although the GPU market expectations above might be considered bold, and may even suggest a bubble, they are worth preparing for.

———————————————————————————-

1: BERT (Bidirectional Encoder Representations from Transformers) is a popular LLM that has achieved state-of-the-art results on a wide range of NLP tasks.

2: RoBERTa (Robustly Optimized BERT Pretraining Approach) is a variant of BERT with an improved pretraining procedure, trained longer on more data with larger batches.

Author:
Tamer Assad Hassan Mahmoud
HPC & Media Streaming Consultant
CEO of PHOTON COMPUTING LLC
LinkedIn: https://www.linkedin.com/in/tamerassad
https://www.photon-computing.com
