Development
- Use Jupyter notebooks for development and experimentation
- Store prepared data and model in the same project
- Optimize performance and costs
Vertex AI Workbench Notebooks
- Use What-If Tool (WIT)—Analyse models for bias
- Use Language Interpretability Tool (LIT)—Understand NLP model behaviour (visual interactive tool)
TensorBoard
- Find and compare experiments, e.g. based on hyperparameters
- Enterprise service—aimed at data scientists and ML researchers
- Collaboration
- Track, share and compare experiments
  - e.g. track loss and accuracy over time, visualise the model graph, project embeddings to a lower-dimensional space
Data Prep/Storage
- Prepare a good amount of training data
- Store tabular data in BigQuery
- Store unstructured data in Cloud Storage
- Combine images/video/audio clips into large files
  - Improves read/write throughput
  - Aim for files >= 100 MiB, 100–10,000 shards
- Avoid storing data in block storage
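The sharding guideline above (files of at least 100 MiB, between 100 and 10,000 shards) can be turned into a small planning calculation. This is a minimal sketch in plain Python; the function name `plan_shards` and the choice to favour file size when both limits cannot be met are assumptions made for illustration, not part of any Google Cloud API.

```python
MIB = 1024 * 1024
MIN_SHARD_BYTES = 100 * MIB   # guideline: each file should be >= 100 MiB
MAX_SHARDS = 10_000           # guideline: upper bound on shard count


def plan_shards(total_bytes: int) -> int:
    """Return the largest shard count that keeps every shard >= 100 MiB.

    Hypothetical helper. For datasets below roughly 9.8 GiB the two
    guidelines conflict (100 shards of 100 MiB do not fit), so this sketch
    favours the file-size guideline and returns fewer than 100 shards.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    # Largest shard count for which the average shard is still >= 100 MiB.
    max_by_size = total_bytes // MIN_SHARD_BYTES
    # Never exceed 10,000 shards, and always write at least one file.
    return max(1, min(MAX_SHARDS, max_by_size))
```

Under these assumptions a 50 GiB dataset maps to 512 shards and a 1 TiB dataset is capped at 10,000 shards.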
Vertex AI Feature Store
- When training a new model:
  - Search Vertex AI Feature Store for existing features meeting requirements
  - Fetch features for training labels using Feature Store’s batch serving capability
  - If no existing features, create new features
    - Cloud Storage, BigQuery or raw data from data lake
    - Create periodic job to merge into Feature Store
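Fetching features for training labels is commonly done as a point-in-time lookup: each label is paired with the latest feature value recorded at or before the label's timestamp, so that no future information leaks into training. The sketch below illustrates that idea in plain Python. It is a conceptual stand-in, not the Feature Store API, and the function name `point_in_time_join` and the data layout are assumptions.

```python
from bisect import bisect_right


def point_in_time_join(labels, feature_history):
    """Attach to each label the latest feature value known at label time.

    labels: list of (entity_id, timestamp, label)
    feature_history: dict mapping entity_id to a list of (timestamp, value)
        pairs sorted by timestamp (hypothetical layout for this sketch)
    Returns a list of (entity_id, timestamp, feature_value_or_None, label).
    """
    rows = []
    for entity_id, ts, label in labels:
        history = feature_history.get(entity_id, [])
        timestamps = [t for t, _ in history]
        # Count of feature values recorded at or before the label timestamp.
        idx = bisect_right(timestamps, ts)
        value = history[idx - 1][1] if idx else None
        rows.append((entity_id, ts, value, label))
    return rows
```

A label whose timestamp precedes every recorded feature value gets `None`, which makes missing features visible rather than silently filled with later data.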
Processing
- Use TensorFlow Extended for TensorFlow projects
- Process tabular data with BigQuery
- Process unstructured data with:
  - Dataflow—General use cases, Apache Beam
  - Dataproc—Apache Hadoop or Spark use cases
- Link data to model with Vertex AI Managed Datasets
  - Optional
  - Clear link between data and custom trained models
  - Automatic or manual splitting into train, test and validation data sets
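When splitting manually, one common approach is to derive the split from a hash of a stable example ID, so that the same example always lands in the same set across runs. The following is a minimal sketch of that technique; the function name `assign_split` and the 80/10/10 fractions are illustrative assumptions, and Managed Datasets can perform the split automatically instead.

```python
import hashlib


def assign_split(example_id: str, train: float = 0.8, validation: float = 0.1) -> str:
    """Deterministically map an example ID to 'train', 'validation' or 'test'.

    Hypothetical helper: the fractions are assumptions, and whatever is left
    after train + validation goes to the test set.
    """
    if train < 0 or validation < 0 or train + validation > 1:
        raise ValueError("fractions must be non-negative and sum to at most 1")
    digest = hashlib.sha256(example_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2 ** 64
    if bucket < train:
        return "train"
    if bucket < train + validation:
        return "validation"
    return "test"
```

Because the assignment depends only on the ID, adding new examples later does not move existing examples between sets.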
Training
- Small datasets—Train in Jupyter notebooks
- Large datasets—Use Vertex AI Training Service
  - Also use it for production training, or for small datasets that need scheduled training
Hyperparameter Tuning
- Use Automated Model Enhancer—Removes need to manually adjust hyperparameters
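To make concrete what automated tuning replaces, the sketch below runs a simple random search over a hyperparameter space and keeps the best configuration. Random search is used here only as an easy-to-read stand-in; it is not claimed to be the algorithm the managed service uses. The names `random_search`, `toy_loss` and the search space are hypothetical.

```python
import random


def random_search(objective, space, trials=50, seed=0):
    """Sample `trials` configurations uniformly and return the best one.

    objective: callable taking a dict of hyperparameters, returning a loss
    space: dict mapping a hyperparameter name to a (low, high) range
    Returns (best_params, best_loss); lower loss is better.
    """
    if trials < 1:
        raise ValueError("trials must be at least 1")
    rng = random.Random(seed)  # seeded so repeated runs are reproducible
    best_params, best_loss = None, float("inf")
    for _ in range(trials):
        params = {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}
        loss = objective(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


def toy_loss(params):
    """Stand-in for a real training run: minimal at lr=0.1, dropout=0.3."""
    return (params["learning_rate"] - 0.1) ** 2 + (params["dropout"] - 0.3) ** 2


SPACE = {"learning_rate": (0.001, 1.0), "dropout": (0.0, 0.5)}
```

Calling `random_search(toy_loss, SPACE, trials=100)` returns the best sampled configuration and its loss. In a real project each call to the objective would be a full training run, which is the manual effort a managed tuning service takes over.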
Production
- Specify hardware appropriate for model
- Plan for additional inputs to model
- Enable autoscaling
References