MLCommons has published results of its industry AI performance benchmark, MLPerf Training 3.0, in which both the Habana Gaudi 2 deep-learning accelerator and the 4th Gen Intel Xeon Scalable processor delivered impressive training results.
“The latest MLPerf results published by MLCommons validates the TCO value Intel Xeon processors and Intel Gaudi deep learning accelerators provide to customers in the area of AI,” said Sandra Rivera, Intel executive VP and GM of the Data Center and AI Group.
She added: “Xeon’s built-in accelerators make it an ideal solution to run volume AI workloads on general-purpose processors, while Gaudi delivers competitive performance for large language models and generative AI. Intel’s scalable systems with optimized, easy-to-program open software lower the barrier for customers and partners to deploy a broad array of AI-based solutions in the data center from the cloud to the intelligent edge.”
Why it matters
The current industry narrative is that generative AI and large language models (LLMs) can run only on Nvidia GPUs. New data shows that Intel’s portfolio of AI solutions provides competitive and compelling options for customers looking to break free from closed ecosystems that limit efficiency and scale.
The latest MLPerf Training 3.0 results underscore the performance of Intel’s products on an array of deep learning models. The maturity of Gaudi2-based software and systems for training was demonstrated at scale on the large language model, GPT-3. Gaudi2 is one of only two semiconductor solutions to submit performance results to the benchmark for LLM training of GPT-3.
Gaudi2 also provides substantially competitive cost advantages to customers, both in server and system costs. The accelerator’s MLPerf-validated performance on GPT-3, computer vision and natural language models, plus upcoming software advances make Gaudi2 an extremely compelling price/performance alternative to Nvidia’s H100.
On the CPU front, the deep learning training performance of 4th Gen Xeon processors with Intel AI engines demonstrated that customers can build with Xeon-based servers a single universal AI system for data pre-processing, model training and deployment to deliver the right combination of AI performance, efficiency, accuracy and scalability.
The Habana Gaudi2 results
Training generative AI and large language models requires clusters of servers to meet massive compute requirements at scale. These MLPerf results provide tangible validation of Habana Gaudi2’s outstanding performance and efficient scalability on the most demanding model tested, the 175 billion parameter GPT-3.
Results highlights:
- Gaudi2 delivered impressive time-to-train on GPT-31: 311 minutes on 384 accelerators.
- Near-linear 95% scaling from 256 to 384 accelerators on GPT-3 model.
- Excellent training results on computer vision — ResNet-50 8 accelerators and Unet3D 8 accelerators — and natural language processing models — BERT 8 and 64 accelerators.
- Performance increases of 10% and 4%, respectively, for BERT and ResNet models as compared to the November submission, evidence of growing Gaudi2 software maturity.
- Gaudi2 results were submitted “out of the box,” meaning customers can achieve comparable performance results when implementing Gaudi2 on premise or in the cloud.
You may also like:
Filed Under: AI, Components, Microprocessors, News
Questions related to this article?
👉Ask and discuss on Electro-Tech-Online.com and EDAboard.com forums.
Tell Us What You Think!!
You must be logged in to post a comment.