Production Machine Learning Systems
This course covers how to implement the various flavors of production ML systems: static, dynamic, and continuous training; static and dynamic inference; and batch and online processing.
What you'll learn
You delve into TensorFlow abstraction levels, the options for distributed training, and how to write distributed training models with custom estimators.
Table of contents
Introduction to Advanced Machine Learning on Google Cloud
4mins
Architecting Production ML Systems
35mins
- Architecting ML systems 2m
- Data extraction, analysis, and preparation 5m
- Model training, evaluation, and validation 2m
- Trained model, prediction service, and performance monitoring 3m
- Training design decisions 5m
- Serving design decisions 6m
- Designing from scratch 3m
- Using Vertex AI 9m
- Lab introduction: Structured data prediction using Cloud AI Platform 0m
- Lab: Structured data prediction using Vertex AI 0m
- Readings: Architecting production ML systems 0m
Designing Adaptable ML Systems
56mins
- Introduction 3m
- Adapting to data 3m
- Changing distributions 4m
- Lab: Adapting to data 2m
- Right and wrong decisions 4m
- System failure 2m
- Concept drift 9m
- Actions to mitigate concept drift 3m
- TensorFlow Data Validation 4m
- Components of TensorFlow Data Validation 5m
- Lab Introduction: Introduction to TensorFlow Data Validation 0m
- Lab: Introduction to TensorFlow Data Validation 0m
- Lab Introduction: Advanced Visualizations with TensorFlow Data Validation 1m
- Lab: Advanced Visualizations with TensorFlow Data Validation 0m
- Mitigating training-serving skew through design 2m
- Lab Introduction: Serving ML predictions in batch and real-time 1m
- Lab: Serving ML Predictions in Batch and Real Time 0m
- Lab Debrief: Serving ML predictions in batch and real-time 9m
- Diagnosing a production model 4m
- Readings: Designing adaptable ML systems 0m
Designing High-Performance ML Systems
43mins
- Introduction 1m
- Training 6m
- Predictions 3m
- Why distributed training is needed 3m
- Distributed training architectures 8m
- TensorFlow distributed training strategies 1m
- Mirrored strategy 3m
- Multi-worker mirrored strategy 4m
- TPU strategy 2m
- Parameter server strategy 2m
- Lab Introduction: Distributed training with Keras 0m
- Lab: Distributed Training with Keras 0m
- Lab Introduction: Distributed training using GPUs on Google Cloud’s AI Platform (Multi-Worker) 1m
- Lab: Distributed Training using GPUs on Cloud AI Platform 0m
- Training on large datasets with tf.data API 5m
- Lab Introduction: TPU-speed data pipelines 0m
- Lab: TPU-speed Data Pipelines 0m
- Inference 4m
- Readings: Designing high-performance ML systems 0m
Building Hybrid ML Systems
23mins