New open source libraries from Nvidia provide GPU acceleration of data analytics and machine learning. The company claims 50x speed-ups over CPU-only implementations.
By Andrew Brust for Big on Data | October 10, 2018
At a keynote at the GPU Technology Conference in Munich today, Nvidia, the video/graphics company turned Artificial Intelligence (AI) juggernaut, took another step forward in the AI direction.
This time, though, Nvidia isn’t announcing a new Graphics Processing Unit (GPU) platform or a new proprietary SDK for deep learning. Instead, it is announcing a new set of open source libraries for GPU-accelerated analytics and machine learning (ML).
Rapid AI movement
Dubbed RAPIDS, the new library set will offer Python interfaces similar to those provided by Scikit-learn and Pandas, but will leverage the company’s CUDA platform for acceleration across one or multiple GPUs.
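To give a sense of what a "Pandas-like interface" means in practice, here is a minimal sketch written against Pandas itself. RAPIDS's cuDF library mirrors this DataFrame API, so on a GPU machine with RAPIDS installed the same code would run CUDA-accelerated by swapping the import (`import cudf as pd`); the column names and data here are invented for illustration.

```python
import pandas as pd  # with RAPIDS installed, `import cudf as pd` would target the GPU

# A small, made-up table of measurements.
df = pd.DataFrame({
    "device": ["gpu", "cpu", "gpu", "cpu"],
    "latency_ms": [1.2, 50.0, 0.9, 48.5],
})

# Familiar Pandas-style groupby/aggregate; in cuDF the same call
# executes on the GPU rather than the CPU.
mean_latency = df.groupby("device")["latency_ms"].mean()
print(mean_latency["gpu"])  # 1.05
```

The appeal Nvidia is pitching is exactly this: existing data-wrangling code keeps its shape, and only the backing implementation moves to the GPU.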
According to Jeff Tseng, Nvidia’s Head of AI Infrastructure, who briefed a number of technology journalists by phone on Tuesday, Nvidia has seen a 50x speed-up in training times when using RAPIDS versus a CPU-only implementation. (This speed-up was measured in scenarios involving the XGBoost ML algorithm on an Nvidia DGX-2 system, though the CPU hardware configuration was not explicitly discussed.)
Integrations and partners
RAPIDS apparently incorporates the in-memory columnar data technology Apache Arrow, and is designed to run on Apache Spark. With the latter in mind, the company has logically garnered support from Databricks, which will integrate RAPIDS into its own analytics and AI platform.
RAPIDS should be available by the time you read this, in both source code and Docker container form, from the RAPIDS Web site and the Nvidia GPU Cloud container registry, respectively.