Intel’s Habana Labs announces turnkey AI training solution

Habana Labs, an Intel Company, has announced the availability of an enterprise-class turnkey artificial intelligence (AI) training solution featuring the Supermicro X12 Gaudi AI training server and DDN AI400X2 storage system. With eight Habana Gaudi purpose-built AI processors, the Supermicro X12 Gaudi AI server provides customers with highly cost-efficient AI training, ease of use and system scalability.

“We are pleased to support our customers with this new turnkey solution that brings the efficiency of the Supermicro X12 Gaudi AI server together with the data management and storage performance of the DDN AI400X2 system to augment utilization of AI-compute capacity and enable us to address this growing need in training deep-learning models,” said Eitan Medina, CBO with Habana Labs.

As datasets become larger and AI models grow in complexity, demand for AI training increases. Having AI training systems that incorporate cost-efficient data management and storage is key to ensuring customers can optimize AI workload productivity and efficiency, enabling them to achieve desired insights and accuracy.

“The Habana team is committed to bringing Gaudi’s price performance, usability and scalability to enterprise AI customers who need more cost-effective AI training solutions,” added Medina.

The technology
The Supermicro X12 Gaudi AI server, designed to address the high costs associated with implementing AI/machine learning (ML), is integrated with the DDN AI400X2, a performance-driven storage appliance. With the integrated solution, customers requiring enterprise-class, cost-effective AI training systems with enhanced data management and storage can train more and spend less.

The turnkey AI solution comes pre-configured in one-, two- and four-server options to address AI training capacity requirements. The Supermicro X12 Gaudi AI server features eight Gaudi HL-205 mezzanine cards, dual 3rd Gen Intel Xeon Scalable processors, two PCIe Gen 4 switches, four hot-swappable NVMe/SATA drives, fully redundant power supplies, and 24 x 100GbE RDMA ports (6 QSFP-DDs), resulting in near-linear system scale-out.
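To put the networking figure in perspective, the short Python sketch below works out the aggregate scale-out bandwidth implied by the port counts quoted above. It uses raw line rates only and assumes the 24 ports are spread evenly across the six QSFP-DD connectors; the function and constant names are illustrative, and real-world throughput will depend on topology and protocol overhead.

```python
# Figures taken from the server description above; names are illustrative.
PORTS_PER_SERVER = 24       # 100GbE RDMA ports per server
GBIT_PER_PORT = 100         # nominal line rate of each port, in Gbit/s
QSFP_DD_CONNECTORS = 6      # physical connectors carrying those ports


def scale_out_bandwidth_gbit(ports=PORTS_PER_SERVER, gbit_per_port=GBIT_PER_PORT):
    """Return the nominal aggregate scale-out bandwidth in Gbit/s."""
    return ports * gbit_per_port


total_gbit = scale_out_bandwidth_gbit()
# Assumption: ports are divided evenly among the connectors.
per_connector_gbit = total_gbit // QSFP_DD_CONNECTORS

print(f"Aggregate line rate: {total_gbit} Gbit/s ({total_gbit / 1000} Tbit/s)")
print(f"Per QSFP-DD connector: {per_connector_gbit} Gbit/s")
```

These are nominal numbers, not measured results.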

The system contains up to 8TB of DDR4-3200MHz memory, unlocking the Gaudi AI processors’ full potential. The HL-205 is OCP-OAM (Open Compute Project Accelerator Module) specification-compliant. Each Gaudi incorporates 32GB HBM2 on-chip memory.
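As a rough sizing aid, the following Python sketch tallies the accelerator count and total HBM2 capacity for each of the pre-configured options, using only the per-server and per-processor figures given in this article. The helper name is hypothetical, and the totals are simple multiplication, not vendor-published cluster specifications.

```python
# Figures taken from the article; names are illustrative.
GAUDI_PER_SERVER = 8        # Gaudi HL-205 cards per Supermicro X12 server
HBM_GB_PER_GAUDI = 32       # HBM2 capacity per Gaudi processor, in GB
SERVER_OPTIONS = (1, 2, 4)  # pre-configured server counts


def cluster_totals(servers):
    """Return (accelerator count, total HBM2 in GB) for a given server count."""
    accelerators = servers * GAUDI_PER_SERVER
    return accelerators, accelerators * HBM_GB_PER_GAUDI


for servers in SERVER_OPTIONS:
    accelerators, hbm_gb = cluster_totals(servers)
    print(f"{servers} server(s): {accelerators} Gaudi processors, {hbm_gb} GB HBM2")
```

Whether a given model fits in that memory depends on batch size, precision and parallelism strategy, which this sketch does not attempt to capture.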

The scalable architectures of the Supermicro X12 Gaudi AI server and DDN AI400X2 appliance make it easy to expand to larger clusters so customers can scale AI training infrastructure as capacity requirements increase.

The new training solution is validated with the Habana Gaudi SynapseAI software platform and with workloads running on Habana’s optimized TensorFlow and PyTorch Docker container images from the Habana Software Vault. Data scientists and developers can start building new models or migrating existing models to Gaudi using the Habana developer site and the reference models in Habana’s GitHub repositories.
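For developers wondering what a first migration step might look like, here is a minimal Python sketch that chooses a PyTorch device name depending on whether Habana's software stack appears to be installed. It assumes the SynapseAI PyTorch integration is exposed as the `habana_frameworks` package and that Gaudi is addressed with the `"hpu"` device string; the `pick_device_name` helper is hypothetical and is not part of Habana's software. Habana's own documentation and reference models remain the authoritative guide.

```python
import importlib.util


def pick_device_name():
    """Return "hpu" if the Habana Python package is importable, else "cpu".

    Assumption: the package name is `habana_frameworks` and Gaudi devices
    are exposed to PyTorch under the "hpu" device string.
    """
    if importlib.util.find_spec("habana_frameworks") is not None:
        return "hpu"
    return "cpu"


device_name = pick_device_name()
print(f"Selected device: {device_name}")
# A training script could then pass this string to torch.device(...)
# and move its model and tensors to that device.
```

On a machine without the Habana stack, the sketch simply falls back to the CPU, so the same script can be developed on a laptop before moving to a Gaudi server.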