Kubeflow Pipelines With GPUs

Compute-intensive DL and ML workloads, from fraud detection in banking to video recommendation on streaming services, require frequent training and inference at scale. Kubeflow is an end-to-end platform, built on top of Kubernetes, for training and deploying machine learning and deep learning models. Launched in 2018 as an open source project by Google, Kubeflow quickly became one of the most popular hybrid-cloud ML toolkits across industries, with an actively growing contributor base.
Kubeflow seeks to accomplish the following goals:
- Extend the underlying Kubernetes infrastructure to make ML/DL deployments easy, repeatable, and portable.
- Allow researchers to focus on rapid experimentation with shared notebooks and data without worrying about the underlying infrastructure.
- Provide the ability to run training and inference either on-prem or in the cloud, with minimal changes to the code base.

Kubeflow Pipelines
A typical data science workflow includes stages such as data verification, feature engineering, model training, and deployment, all in a scalable fashion. Containers make deployment easier because data scientists and ML engineers can package their code and port it across compute environments without worrying about dependencies. Kubernetes allows repeatable batch job creation, efficient tracking and monitoring of nodes, and intuitive ways to scale out.
But how can we make the workflow modular, repeatable, portable, and shareable with other members in a data science team? Enter Kubeflow Pipelines.
Containers form the basic building blocks of a Kubeflow pipeline. Each container holds the script for one step of the pipeline, such as preprocess, train, or serve, with input and output paths specified as command-line arguments. A pipeline definition file, written either as a Python script or as a Jupyter notebook, glues the individual components together into a graph. Specifically, it defines the pipeline's parameters and each component's inputs, outputs, and relationship to the other components.
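As a rough illustration, here is what such a definition might look like with the Kubeflow Pipelines SDK (kfp v1). The image names, script paths, and argument names below are placeholders rather than the tutorial's actual code.

```python
# Minimal pipeline definition sketch (kfp v1). Images, commands, and argument
# names are illustrative placeholders, not the exact code from the tutorial.
import kfp.dsl as dsl


def preprocess_op(output_dir):
    return dsl.ContainerOp(
        name='preprocess',
        image='example-registry/cifar10-preprocess:latest',  # hypothetical image
        command=['python', '/scripts/preprocess.py'],
        arguments=['--output_dir', output_dir],
    )


def train_op(data_dir, model_dir):
    return dsl.ContainerOp(
        name='train',
        image='example-registry/cifar10-train:latest',  # hypothetical image
        command=['python', '/scripts/train.py'],
        arguments=['--data_dir', data_dir, '--model_dir', model_dir],
    )


@dsl.pipeline(
    name='resnet_cifar10_pipeline',
    description='Preprocess CIFAR-10, train ResNet-50, and serve the model.'
)
def resnet_pipeline(processed_dir='/mnt/workspace/processed',
                    model_dir='/mnt/workspace/model'):
    preprocess = preprocess_op(processed_dir)
    train = train_op(processed_dir, model_dir)
    train.after(preprocess)  # train only after preprocessing finishes
```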
The Pipelines UI is a dashboard where you upload a compiled pipeline in order to visualize and run the graph. Different runs can be configured by altering the pipeline parameters, allowing in-depth analysis of training trials and efficient versioning of the models being served.
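Compiling and submitting a pipeline takes only a couple of SDK calls. A minimal sketch, assuming the pipeline function from the previous snippet and a placeholder endpoint:

```python
# Compile the pipeline function into an archive that can be uploaded through the
# Pipelines UI, or submit a run directly with the SDK client.
import kfp

kfp.compiler.Compiler().compile(resnet_pipeline, 'resnet_pipeline.tar.gz')

# Alternatively, submit a run programmatically (the endpoint below is a placeholder).
client = kfp.Client(host='http://<kubeflow-pipelines-endpoint>')
client.create_run_from_pipeline_func(resnet_pipeline, arguments={})
```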
GPU-accelerated Kubeflow Pipeline
Since Kubeflow has become a popular framework for these workloads across enterprises, NVIDIA works closely with the community to integrate GPU technologies into the ecosystem.
Running GPU-accelerated Kubeflow Pipelines isn’t hard. Check out this tutorial on how to train a ResNet-50 model in Keras on the CIFAR-10 dataset. It should help you get started. Clone a copy and start experimenting today!
Here are some components integrated into the pipeline to make using GPUs easier and more effective:
- TensorFlow containers from NGC: We have all struggled with getting the environment right for development. Assembling the required DL frameworks, libraries, and GPU drivers can be challenging and time-consuming. NGC provides, among other solutions, highly optimized, purpose-built containers that run on your PC, a DGX Station, or in the cloud; the sketch after this list shows a pipeline step pulling one of these images and requesting a GPU.
- TensorRT: After training, it is common to optimize the model for inference in order to reduce latency and increase throughput, especially in production environments. TensorRT brings several optimizations such as layer and tensor fusion, precision calibration, and kernel auto-tuning.
- TensorRT Inference Server: Deploying models at scale with GPUs requires distributing the workload flexibly and evenly among processing units. TensorRT Inference Server is containerized, production-ready serving software for data center deployment with multi-model, multi-framework support. Through efficient dynamic batching of client requests, it can handle massive amounts of incoming requests and intelligently balance the load between GPUs.
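As a rough sketch of how a pipeline step plugs into these components, the snippet below points an operation at the NGC TensorFlow image and requests a single GPU with set_gpu_limit, which adds an nvidia.com/gpu resource limit so Kubernetes schedules the step onto a GPU node. The entrypoint and arguments are placeholders.

```python
# Sketch: a pipeline step using the NGC TensorFlow container and one GPU.
# The script path and arguments are hypothetical placeholders.
import kfp.dsl as dsl

train = dsl.ContainerOp(
    name='train',
    image='nvcr.io/nvidia/tensorflow:19.03-py3',  # NGC TensorFlow container
    command=['python', '/scripts/train.py'],
    arguments=['--data_dir', '/mnt/workspace/processed'],
)
train.set_gpu_limit(1)  # request one nvidia.com/gpu for this step
```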
Kubeflow Pipeline tutorial overview
Let’s now look at how the technologies above integrate with our Kubeflow Pipeline example.
Preprocessing container:
- Built on top of the NGC TensorFlow container (19.03-py3) as the base image.
- The entrypoint script loads the CIFAR-10 dataset that comes with the Keras library.
- The images are rotated, scaled, and cropped, then saved to the output directory (a minimal sketch of such an entrypoint follows this list).
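As a rough illustration only, and not the tutorial's actual code, a preprocessing entrypoint along these lines might load CIFAR-10 through Keras, apply simple augmentation, and write the result to the output directory passed on the command line. The augmentation settings and the .npz output format are assumptions.

```python
# Hypothetical preprocessing entrypoint: load CIFAR-10, augment, and save.
import argparse
import os

import numpy as np
import tensorflow as tf

parser = argparse.ArgumentParser()
parser.add_argument('--output_dir', required=True)
args = parser.parse_args()

# CIFAR-10 ships with Keras, so no external download step is needed here.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()

# Random rotations, shifts, and flips as a stand-in for the rotate/scale/crop step.
datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    rotation_range=15, width_shift_range=0.1, height_shift_range=0.1,
    horizontal_flip=True)
x_aug, y_aug = next(datagen.flow(x_train, y_train,
                                 batch_size=len(x_train), shuffle=False))

os.makedirs(args.output_dir, exist_ok=True)
np.savez(os.path.join(args.output_dir, 'cifar10_preprocessed.npz'),
         x_train=x_aug, y_train=y_aug, x_test=x_test, y_test=y_test)
```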
Training container:
- Also built on top of the TensorFlow container (19.03-py3) as the base image.
- The entrypoint script trains a ResNet-50 model on the preprocessed data, which is mounted from the volume specified as a pipeline parameter.
- The resulting Keras h5 model file is then converted into a SavedModel and optimized with the TensorRT integration in TensorFlow (TF-TRT) to leverage Tensor Cores for accelerated inference (sketched below).
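A minimal sketch of that conversion step, assuming the trained model has already been exported as a SavedModel directory; the exact TF-TRT module path and converter class depend on the TensorFlow version (the API below is the TF 1.14+ TrtGraphConverter):

```python
# Optimize an exported SavedModel with TF-TRT for FP16 inference on Tensor Cores.
# Paths are placeholders; module path and class vary across TensorFlow versions.
from tensorflow.python.compiler.tensorrt import trt_convert as trt

saved_model_dir = '/mnt/workspace/model/saved_model'          # plain SavedModel
trt_saved_model_dir = '/mnt/workspace/model/saved_model_trt'  # TF-TRT output

converter = trt.TrtGraphConverter(
    input_saved_model_dir=saved_model_dir,
    precision_mode='FP16')  # FP16 kernels run on Tensor Cores
converter.convert()
converter.save(trt_saved_model_dir)
```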
Serving container:
- The container launches a Kubernetes service that deploys TensorRT Inference Server, running inside the NGC TensorRT Inference Server container (19.03-py3), to serve the trained model (an illustrative client request is sketched below).
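The sketch below shows what a client request against the deployed model might look like. Note the assumptions: it uses the tritonclient library (the client for Triton, TensorRT Inference Server's successor) purely as an illustration, and the service URL, model name, and tensor names are hypothetical and depend on the deployed configuration.

```python
# Illustrative client request to the inference service. Library choice (tritonclient),
# URL, model name, and tensor names are assumptions, not the tutorial's actual setup.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url='trtis-service:8000')  # hypothetical service

image = np.random.rand(1, 32, 32, 3).astype(np.float32)  # stand-in for a CIFAR-10 image

inputs = [httpclient.InferInput('input_1', list(image.shape), 'FP32')]
inputs[0].set_data_from_numpy(image)
outputs = [httpclient.InferRequestedOutput('predictions')]

response = client.infer(model_name='resnet50', inputs=inputs, outputs=outputs)
print(response.as_numpy('predictions'))
```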
Web UI container:
- This container also launches a Kubernetes service that deploys a Web UI for interactive evaluation of the model.
Where can you go from here?
This article examined the GPU-accelerated Kubeflow Pipeline and how its adoption can greatly improve the modularity and performance of your data science workflow. You can now quickly and easily prototype and deploy powerful models for computer vision, NLP, or recommendation systems, either on premises or in the cloud.
References
[1] NVIDIA ResNet Kubeflow Pipeline
[2] NVIDIA GPU Cloud
[3] TensorRT [Blog] [Documentation]
[4] TensorRT Inference Server [Blog] [Documentation]
Authors
Ananth Sankarasubramanian, NVIDIA Solutions Architect
Khoa Ho, NVIDIA Solutions Architect