Building Batch Data Pipelines on Google Cloud
Data pipelines typically fall under one of the Extract-Load, Extract-Load-Transform, or Extract-Transform-Load paradigms. This course describes which paradigm to use, and when, for batch data.
What you'll learn
This course covers several Google Cloud technologies for data transformation, including BigQuery, executing Spark on Dataproc, pipeline graphs in Cloud Data Fusion, and serverless data processing with Dataflow. Learners get hands-on experience building data pipeline components on Google Cloud using Qwiklabs.
Table of contents
- Module introduction 1m
- The Hadoop ecosystem 5m
- Running Hadoop on Dataproc 10m
- Cloud Storage instead of HDFS 6m
- Optimizing Dataproc 3m
- Optimizing Dataproc storage 9m
- Optimizing Dataproc templates and autoscaling 5m
- Optimizing Dataproc monitoring 3m
- Lab Intro: Running Apache Spark jobs on Dataproc 0m
- Getting Started with GCP and Qwiklabs 4m
- Lab: Running Apache Spark jobs on Cloud Dataproc 0m
- Summary 1m
- Module introduction 1m
- Introduction to Dataflow 6m
- Why customers value Dataflow 3m
- Building Dataflow pipelines in code 4m
- Key considerations with designing pipelines 2m
- Transforming data with PTransforms 3m
- Lab Intro: Building a Simple Dataflow Pipeline 0m
- Lab: A simple Dataflow pipeline (Python) 2.5 0m
- Aggregate with GroupByKey and Combine 6m
- Lab Intro: MapReduce in Dataflow 0m
- Lab: MapReduce in Dataflow (Python) 2.5 0m
- Side inputs and windows of data 5m
- Lab Intro: Practicing Pipeline Side Inputs 0m
- Lab: Serverless Data Analysis with Dataflow: Side Inputs (Python) 0m
- Creating and re-using pipeline templates 4m
- Dataflow SQL pipelines 1m
- Summary 2m
- Module introduction 1m
- Introduction to Cloud Data Fusion 4m
- Components of Cloud Data Fusion 1m
- Cloud Data Fusion UI 2m
- Build a pipeline 5m
- Explore data using wrangler 2m
- Lab Intro: Building and executing a pipeline graph in Cloud Data Fusion 0m
- Lab: Building and Executing a Pipeline Graph with Data Fusion 2.5 0m
- Orchestrate work between Google Cloud services with Cloud Composer 1m
- Apache Airflow environment 1m
- DAGs and Operators 8m
- Workflow scheduling 6m
- Monitoring and Logging 3m
- Lab Intro: An Introduction to Cloud Composer 0m
- Lab: An Introduction to Cloud Composer 2.5 0m