A Step-by-Step Guide to Tuning a Model on Google Cloud’s Vertex AI
Tuning model hyperparameters and visualizing metrics on managed TensorBoard

Vertex AI Tutorial Series
- A Step-by-Step Guide to Training a Model on Google Cloud’s Vertex AI
- A Step-by-Step Guide to Tuning a Model on Google Cloud’s Vertex AI (this article)
- How To Operationalize a Model on Google Cloud’s Vertex AI
- How To Use AutoML on Google Cloud’s Vertex AI
- How To Use BigQuery ML on Google Cloud’s Vertex AI
- How to Use Pipeline on Google Cloud’s Vertex AI
Background
In the previous article (the first in this series), we walked through step-by-step instructions to train our first model on Vertex AI, Google Cloud's newest integrated machine learning platform. The problem we were solving was an image classification task on the CIFAR10 dataset, which contains 60,000 32x32 color images across ten classes.
In this article, we'll build on that foundation to improve the model's performance and explore two very cool tools on Vertex AI: Hypertune and Experiments.
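Before diving in, it helps to see the general shape of a Hypertune-ready training script: Vertex AI passes each trial's hyperparameter values to the trainer as command-line arguments, so the script needs to expose the knobs it wants tuned. Here is a minimal sketch using Python's standard argparse; the argument names and defaults are illustrative, not the exact ones from our trainer:

```python
import argparse


def get_args(argv=None):
    """Parse the hyperparameters that a Hypertune job would vary per trial."""
    parser = argparse.ArgumentParser(description="CIFAR10 trainer (sketch)")
    # Illustrative hyperparameters; a real trainer may expose different ones.
    parser.add_argument("--epochs", type=int, default=15,
                        help="Number of training epochs")
    parser.add_argument("--learning-rate", type=float, default=0.001,
                        help="Optimizer learning rate")
    parser.add_argument("--dropout-rate", type=float, default=0.0,
                        help="Dropout rate used for regularization")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = get_args()
    print(f"Training for {args.epochs} epochs "
          f"(lr={args.learning_rate}, dropout={args.dropout_rate})")
```

Each Hypertune trial then launches this script with a different combination of argument values and picks the best trial according to a reported metric.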
Optimization Idea
An observation from the previous article was that training for more epochs yielded better results. Training for five epochs locally gave us 60% evaluation accuracy; 15 epochs on Vertex AI gave us 66%. Note that precision and recall are usually the better performance metrics, but since CIFAR10 is perfectly balanced, we'll stick to accuracy for simplicity.
The simplest, most brute-force idea is to train for even more epochs. We set the epoch argument to 50 and launched a training job (refer to the previous article for how to do that). Unfortunately, the result was disappointing: evaluation accuracy came in at 64%, lower than the 15-epoch run, while training accuracy shot up to 93%. Clearly, the model was overfitting the training data, so we'll need some kind of regularization to help the model generalize.
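One common regularization technique for a convolutional image classifier is dropout, which randomly zeroes a fraction of activations during training so the network cannot rely too heavily on any single feature. As a sketch of the idea — this is a stand-in architecture, not necessarily the exact model from the previous article — a Keras CNN for CIFAR10 with Dropout layers might look like:

```python
import tensorflow as tf


def build_model(dropout_rate: float = 0.3) -> tf.keras.Model:
    """A small CNN for 32x32x3 CIFAR10 images with dropout regularization.

    Illustrative architecture only; the previous article's model may differ.
    """
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(32, 32, 3)),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        # Randomly drop activations during training to fight overfitting.
        tf.keras.layers.Dropout(dropout_rate),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dropout(dropout_rate),
        tf.keras.layers.Dense(10),  # one logit per CIFAR10 class
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )
    return model
```

Conveniently, the dropout rate itself is exactly the kind of hyperparameter we can hand over to Hypertune to search rather than guessing by hand.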