Site Reliability Engineering: Measuring and Managing Reliability
This course teaches the theory of Service Level Objectives (SLOs), a principled way of describing and measuring the desired reliability of a service. Upon completion, learners should be able to apply these principles to develop the first SLOs for services they are familiar with in their own organizations. They will also learn how to use Service Level Indicators (SLIs) to quantify reliability and error budgets to drive business decisions about engineering for greater reliability. Along the way, learners will come to understand the components of a meaningful SLI and walk through the process of developing SLIs and SLOs for an example service.
Table of contents
- Course structure 3m
- What is SRE? How does it differ from DevOps? 0m
- What's the difference between DevOps and SRE? 5m
- Who are CREs? How can they help you be more reliable? 0m
- Now SRE Everyone Else with CRE! 6m
- CRE's Three Reliability Principles 3m
- Reliability in the Cloud 3m
- How SLOs help your business make decisions 2m
- How SLOs help you build features faster 2m
- How SLOs help you balance operational and project work 2m
- Making SLOs work for your organization 1m
- Introduction 2m
- User happiness in metric form 2m
- The properties of good SLI metrics 4m
- Ways of measuring SLIs 4m
- The SLI menu 3m
- The SLI equation 2m
- Request / Response SLIs 6m
- Data processing SLIs 6m
- But my system is really complex! 2m
- Managing complexity with aggregation 2m
- Managing complexity with bucketing 3m
- Achievable SLOs 2m
- Aspirational SLOs 1m
- Continuous improvement 2m