Building Batch Data Pipelines on GCP
Data pipelines typically fall under one of the Extract-Load, Extract-Load-Transform, or Extract-Transform-Load paradigms. This course describes which paradigm should be used, and when, for batch data. It also covers several Google Cloud Platform technologies for data transformation, including BigQuery, executing Spark on Cloud Dataproc, pipeline graphs in Cloud Data Fusion, and serverless data processing with Cloud Dataflow. Learners will get hands-on experience building data pipeline components on Google Cloud Platform using Qwiklabs.
Table of contents
- The Hadoop ecosystem 9m
- Running Hadoop on Cloud Dataproc 11m
- GCS instead of HDFS 6m
- Optimizing Dataproc 5m
- Optimizing Dataproc Storage 9m
- Optimizing Dataproc Templates and Autoscaling 4m
- Optimizing Dataproc Monitoring 4m
- Lab Intro: Running Apache Spark jobs on Cloud Dataproc 0m
- Lab: Running Apache Spark jobs on Cloud Dataproc 0m
- Summary 1m
- Introduction 8m
- Components of Data Fusion 2m
- Building a Pipeline 6m
- Exploring Data using Wrangler 2m
- Lab: Building and executing a pipeline graph in Cloud Data Fusion 0m
- Lab: Building and Executing a Pipeline Graph with Data Fusion 0m
- Orchestrating work between GCP services with Cloud Composer 2m
- Apache Airflow Environment 2m
- DAGs and Operators 12m
- Workflow scheduling 7m
- Monitoring and Logging 5m
- Lab: An Introduction to Cloud Composer 0m
- Lab: An Introduction to Cloud Composer 0m
- Cloud Dataflow 8m
- Why customers value Dataflow 4m
- Building Cloud Dataflow Pipelines in code 4m
- Key considerations with designing pipelines 2m
- Transforming data with PTransforms 3m
- Lab: Building a Simple Dataflow Pipeline 0m
- Lab: Serverless Data Analysis with Dataflow: A Simple Dataflow Pipeline (Java) 0m
- Lab: Serverless Data Analysis with Dataflow: A Simple Dataflow Pipeline (Python) 0m
- Aggregating with GroupByKey and Combine 7m
- Lab: MapReduce in Cloud Dataflow 0m
- Lab: Serverless Data Analysis with Dataflow: MapReduce in Dataflow (Java) 0m
- Lab: Serverless Data Analysis with Dataflow: MapReduce in Dataflow (Python) 0m
- Side Inputs and Windows of data 4m
- Lab: Practicing Pipeline Side Inputs 0m
- Lab: Serverless Data Analysis with Dataflow: Side Inputs (Python) 0m
- Lab: Serverless Data Analysis with Dataflow: Side Inputs (Java) 0m
- Creating and re-using Pipeline Templates 4m
- Cloud Dataflow SQL pipelines 3m