Top 50 Awesome List

dastergon/awesome-sre

Miscellaneous  1 month ago  8.5k
A curated list of Site Reliability and Production Engineering resources.

May 15th

Apr 24th

Feb 22nd

Service Level Agreement

  • Calculating composite SLA
  • Feb 3rd

    SRE Tools

  • SRE cheat sheet (90 stars) - A cheat sheet for Site Reliability Engineering principles and numbers
  • Jan 7th

    Dec 21st, 2021

    Blogs

  • Logit.io Blog - Resources on log management, SRE and DevOps.
  • Dec 14th, 2021

    Sep 22nd, 2021

    Jun 26th, 2021

    Blogs

  • incident.io Blog - Guides, advice and resources on incident management and response.
  • May 19th, 2021

    Blogs

  • Rootly Blog - Incident management best practices and guides.
  • May 17th, 2021

    Misc Articles

  • Site Reliability Engineering for Native Mobile Apps - Abhijith Krishnappa - Case study: Halodoc adaptation of SRE principles for Native Mobile Apps
    Newsletters

  • ChaosEngineering.news - Chaos Engineering newsletter. All things Chaos Engineering, directly to your inbox!
  • Apr 9th, 2021

    Feb 22nd, 2021

    Jan 22nd, 2021

    Jan 14th, 2021

    Reliability

  • Generic mitigations
  • Dec 22nd, 2020

    Blogs

  • FireHydrant Blog - Posts about complex systems, incident response, and SRE best practices.
  • Dec 10th, 2020

    Nov 14th, 2020

    Conferences & Meetups

  • Site Reliability Engineering India - SRE Meetup India
  • Oct 2nd, 2020

    Sep 30th, 2020

    Jul 4th, 2020

    Newsletters

  • KubeWeekly - The weekly newsletter for all things Kubernetes. KubeWeekly is curated by Bob Killen, Chris Short, Craig Box, Kim McMahon and Michael Hausenblas.
  • May 29th, 2020

    Newsletters

  • DevOpsLinks - A weekly newsletter about SRE, SysAdmin and DevOps news, tools, tutorials and opinions.
  • SRE Weekly - Weekly Site Reliability Newsletter.
  • O’Reilly Systems Engineering and Operations Newsletter - Weekly systems engineering and operations news and insights from industry insiders.
  • Apr 25th, 2020

    SRE Tools

  • Awesome SRE Tools (544 stars) - A curated list of Site Reliability and Production Engineering tools
  • Jan 23rd, 2020

    Blogs

  • Resilience Roundup - Weekly analysis of Resilience Engineering and Human Factors research designed for software systems
  • Dec 12th, 2019

    Dec 2nd, 2019

    Service Level Agreement

  • The Art of SLOs Workshop Materials
  • Dec 1st, 2019

    Misc Articles

  • SRECon EMEA 2019 Recap
  • Nov 26th, 2019

    Conferences & Meetups

  • Site Reliability Engineering Paris, France - SRE Meetup in the city of light.
  • Nov 6th, 2019

    Oct 31st, 2019

    Oct 18th, 2019

    Conferences & Meetups

  • ADDO - All Day DevOps - A 24 hour conference that is completely online and free.
  • Oct 10th, 2019

    Monitoring & Observability & Alerting

  • Observations on Observability
  • Oct 8th, 2019

    Jul 22nd, 2019

    Jul 20th, 2019

    Jul 5th, 2019

    Jul 3rd, 2019

    Twitter

  • Google SRE Twitter Account - Google's SRE Twitter Account.
  • SREWorkbook - The Official Twitter Account of Site Reliability Workbook.
    On-Call

  • Managing Incidents at Monzo
  • Jun 23rd, 2019

    Jun 19th, 2019

    Monitoring & Observability & Alerting

  • Alerting on SLOs like Pros
  • Jun 16th, 2019

    Blogs

  • Brendan Gregg's Blog - Highly Technical Blog Posts About Systems Internals, Performance and SRE.
  • May 18th, 2019

    May 8th, 2019

    Apr 17th, 2019

    Mar 13th, 2019

    Misc Articles

  • SRE Adoption Report
  • Jan 8th, 2019

    Blogs

  • Cindy Sridharan - Blog posts about distributed systems and their management.
  • Dec 1st, 2018

    Twitter

  • Twitter SRE Weekly - The Official Twitter Account of SRE Weekly Newsletter.
  • Nov 16th, 2018

    Aug 14th, 2018

    Jul 20th, 2018

    Service Level Agreement

  • SRE fundamentals: SLIs, SLAs and SLOs
  • Jul 2nd, 2018

    Culture

  • From Sys Admin to Netflix SRE - video and slides
  • Jun 9th, 2018

    Service Level Agreement

  • Error Budget Calculator
  • May 18th, 2018

    Monitoring & Observability & Alerting

  • Debugging Latency in Go 1.11
  • Want to Debug Latency?
  • May 6th, 2018

    Apr 29th, 2018

    Culture

  • Tech Leadership in SRE
    Conferences & Meetups

  • Site Reliability Engineering Munich, Germany - SRE Meetup in the greater area of Oktoberfest city.
  • Apr 18th, 2018

    On-Call

  • Moving Past Shallow Incident Data
    Reliability

  • Canary Analysis Service
    Culture

  • The Makeup of Successful Geographically-Distributed SRE Teams - Part 1 & Part 2
  • Apr 16th, 2018

    Monitoring & Observability & Alerting

  • GitOps Part 3 - Observability
  • Mar 10th, 2018

    Monitoring & Observability & Alerting

  • The Many Ways Your Monitoring Is Lying to You
  • Jan 8th, 2018

    Dec 29th, 2017

    Nov 4th, 2017

    Blogs

  • rachelbythebay - Technical Blog Posts.
  • Oct 24th, 2017

    Service Level Agreement

  • Building good SLOs - CRE life lessons
  • Oct 17th, 2017

    Service Level Agreement

  • A Practical Guide to SLAs
  • Aug 16th, 2017

    Aug 13th, 2017

    Jul 31st, 2017

    Culture

  • The SRE model
  • Jul 29th, 2017

    Post-Mortem

  • Embracing Feedback
  • Jul 26th, 2017

    Jul 20th, 2017

    Programming

  • Operability in Go
  • May 25th, 2017

    Service Level Agreement

  • The Calculus of Service Availability
  • May 23rd, 2017

    Service Level Agreement

  • (Un)Reliability Budgets - Finding Balance between Innovation and Reliability
    Reliability

  • The Production Environment at Google - Part 1 & Part 2
  • May 6th, 2017

    May 3rd, 2017

    Jan 20th, 2017