Course Overview
Today's organizations deal with a higher volume of change in a more complex tech environment, leading to a higher risk of outages and incidents. IT teams must improve service reliability and system resiliency. With automation and observability becoming key factors for more efficient and rapid deployments, the Site Reliability Engineering (SRE) profile has become one of the fastest-growing enterprise roles and sets of operational practices for managing services at scale.
The DevOps Institute SRE Practitioner course provides a practical view of how to successfully implement a flourishing SRE culture in your organization. This 3-day course is a practical progression for DOI SRE Foundation certificate holders.
Outline
Module 1: SRE Anti-Patterns
- Break the ice with a recap of DevOps Institute's SRE Blueprint
- Discuss how SRE works in a distributed ecosystem
- Discuss some of the SRE Barriers
- A few SRE Anti-Patterns (discuss the right patterns too)
- Discuss the case story of how Monzo Bank learned from the causes leading to a SEV1 issue
- Case Story: Monzo Bank
- Discussion / Exercise: Good versus Bad Postmortem, Describe a Major Incident, Anti-Patterns of SRE
Module 2: SLO is a Proxy for Customer Happiness
- What has changed with SLO?
- Identifying System boundaries for setting SLIs is critical
- How do you use Error Budgets beyond the velocity versus stability debate?
- Case Story: Kudos Engineering, Home Depot
- Discussion / Exercise: Establishing SLOs in Distributed Ecosystems
Module 3: Building Secure and Reliable Systems
- Building Secure and Reliable systems
- Non-Abstract Large Scale Design
- Designing for the changing Architecture and distributed ecosystem
- Fault tolerant Design
- Designing for Security
- Designing for Resiliency
- Case Story: Chrome Security Team
- Discussion / Exercise: Non-Abstract Large Scale Design Capacity
Module 4: Full Stack Observability
- Modern Apps are Complex & Unpredictable
- Slow is the New Down
- Pillars of Observability
- Using OpenTelemetry
- Case Story: Planet Labs
- Discussion / Exercise: How do you bake Observability into your Code?
Module 5: Platform Engineering and AIOps
- Taking a Platform Centric View
- How do you use AIOps to improve Resiliency
- How can DataOps help you in the journey
- A simple recipe to implement AIOps
- Indicative measurement of AIOps
- Case Story: FedEx, 3M
- Discussion / Exercise: Instrumenting AIOps using Prometheus
Module 6: SRE and Incident Response Management
- SRE Key Responsibilities towards incident response
- DevOps & SRE and ITSM (new vs. old ways)
- OODA and SRE Incident Response
- SRE and CLR (closed loop remediation)
- Swarming Food for Thought
- AI/ML for better Incident Management
- Case Story: HCL AIOps Journey
- Discussion / Exercise: Teams discuss Swarming and the Tier-Layered Incident Response framework
Module 7: Chaos Engineering
- Navigating Complexity
- Chaos Engineering Defined
- Quick Facts
- Chaos Monkey Origin Story
- Who is adopting Chaos Engineering
- Myths of Chaos
- Chaos Engineering Experiments
- GameDay Exercises
- Security Chaos Engineering
- Chaos Engineering Resources
- Discussion / Exercise: Instrumenting Gremlin, Discuss how to conduct a GameDay exercise
Module 8: SRE is the Purest Form of DevOps
- Key Principles of SRE
- SREs help increase Reliability across the spectrum
- Metrics for Success
- SRE Execution models
- Culture and Behavioral Skills are key
- Transformation after implementing SRE practices
- Case Story: Airbnb
- Discussion / Exercise: Discuss NALSD learnings from Module, Transformation after implementing SRE practices
Prerequisites
It is highly recommended that learners attend the SRE Foundation course and earn the SRE Foundation certification prior to attending the SRE Practitioner course and exam. An understanding and knowledge of common SRE terminology, concepts, principles and related work experience are recommended.
Who Should Attend
- IT leaders & managers
- Organizational change leaders and agents
- SRE engineers
- System Integrators
- Business Stakeholders
- DevOps Practitioners
- Scrum Masters/Product Owners
- Software Engineers