This post reports on a session from AWS Summit Online India 2022, held over two days on May 25 and 26, 2022. Use it to get a summary of the session and to decide which sessions interest you. The session archive will also be published, so check there for details.
This article covers the session "Better Reliability With SLOs".
Summary: Service Level Objectives (SLOs) are a measurement of the reliability and general experience your end-users and customers can expect. In this talk, you’ll learn how to define SLOs by choosing the correct service level indicators (SLIs) and defining appropriate agreements with stakeholders. We will explain the key concept of error budgets, which give you a solid, actionable metric for balancing innovation and velocity with reliability and safety. You will also learn how to have meaningful conversations around realistic availability, which will enable you to define high-quality SLOs for your own organization.
- SLIs, SLOs, and SLAs
- Defining quality targets
- Error budgets
- Practical example
- Security vs innovation
SLIs, SLOs, and SLAs
- A Service Level Indicator is a quantitative measurement that expresses an aspect of the service.
- A Service Level Objective is a target value for a service, as measured via an SLI, over a specified time window.
- A Service Level Agreement is a contract that defines the results (and consequences) of meeting (or missing) one or more SLOs.
Identifying good SLIs
- Availability – Could the server respond to the request?
- Latency – How long did it take for the server to respond to the request?
- Throughput – How many requests can be handled?
- Availability – Can the data be accessed on demand?
- Latency – How long does it take to read or write data?
- Durability – Is the data still there when it is needed?
- Correctness – Was the right data returned?
- Freshness – How long does it take for new data or processed results to appear?
SLIs are applied values
- Indicators must represent user experience
- The number of requests to an endpoint that complete successfully.
- The number of requests to an endpoint that complete within 500ms.
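To make the two example indicators concrete, here is a minimal sketch that turns a batch of request records into SLI values, expressed as a fraction of all requests. The `Request` record and the function names are hypothetical, and treating any status below 500 as "successful" is an assumption; the session does not prescribe a definition.

```python
from dataclasses import dataclass


@dataclass
class Request:
    status: int        # HTTP status code returned to the client
    latency_ms: float  # time taken to respond, in milliseconds


def availability_sli(requests):
    """Fraction of requests that completed successfully.

    Assumption: "successful" means the status code is below 500.
    """
    if not requests:
        return None  # no traffic, so the indicator is undefined
    good = sum(1 for r in requests if r.status < 500)
    return good / len(requests)


def latency_sli(requests, threshold_ms=500.0):
    """Fraction of requests that completed within threshold_ms."""
    if not requests:
        return None
    fast = sum(1 for r in requests if r.latency_ms <= threshold_ms)
    return fast / len(requests)


# Hypothetical sample: one server error and one slow request out of four.
sample = [
    Request(200, 120.0),
    Request(200, 480.0),
    Request(500, 30.0),
    Request(200, 900.0),
]
```

Counting good events against total events keeps every indicator on the same 0-to-1 scale, which makes it straightforward to compare against a percentage target later.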
SLOs are applied SLIs
- Objectives have both a target and a time window
- Requests are 99.95% successful in the last 24 hours.
- 90% of requests complete under 500 ms over the past 30 days.
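An objective pairs a target with a time window, so checking it means measuring the SLI only over events inside that window. The sketch below shows one way to do this; the `SLO` class, the `(timestamp, is_good)` event shape, and the choice to treat a window with no traffic as meeting the objective are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class SLO:
    target: float      # e.g. 0.9995 for "99.95% successful"
    window: timedelta  # e.g. timedelta(hours=24)


def sli_in_window(events, slo, now):
    """Good-event ratio over the SLO window ending at `now`.

    `events` is a list of (timestamp, is_good) tuples.
    """
    start = now - slo.window
    in_window = [good for ts, good in events if start <= ts <= now]
    if not in_window:
        return None  # nothing measured in this window
    return sum(in_window) / len(in_window)


def is_met(events, slo, now):
    """True when the measured SLI reaches the target.

    Assumption: a window with no events counts as meeting the objective.
    """
    sli = sli_in_window(events, slo, now)
    return sli is None or sli >= slo.target
```

For the first example above, the objective would be `SLO(target=0.9995, window=timedelta(hours=24))`; for the second, the "good" flag would mean "completed under 500 ms" and the objective would be `SLO(target=0.90, window=timedelta(days=30))`.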
SLAs are applied SLOs
- Agreements address expectations and impacts.
- The customer expects a given service to have a 0.05% maximum error rate daily, or they’ll receive a rebate.
- The customer expects only 10% of monthly requests to take longer than 500ms to complete, or they’ll be reimbursed for the compute overage.
What is an error budget
The error budget is the amount of unreliability the objective allows: 1.0 - SLO. If the SLO is 99%, the error budget is 1%. The difference between the actual measurement and the objective shows how much of that budget is left.
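As a rough illustration of the arithmetic, the sketch below converts an SLO target into an error budget and then into something tangible: a number of failed requests or an amount of downtime for a given window. The function names are hypothetical, and the downtime view assumes the budget is spread over wall-clock time rather than over requests.

```python
from datetime import timedelta


def error_budget(slo_target):
    """Error budget as a fraction: 1.0 - SLO."""
    return 1.0 - slo_target


def allowed_bad_requests(slo_target, total_requests):
    """How many requests may fail before the budget is used up."""
    return error_budget(slo_target) * total_requests


def allowed_downtime(slo_target, window):
    """Budget expressed as time, assuming it is measured in wall-clock time."""
    return window * error_budget(slo_target)


# A 99% SLO over 30 days leaves roughly 7.2 hours of budget.
monthly = allowed_downtime(0.99, timedelta(days=30))

# A 99.95% SLO over 24 hours leaves well under a minute.
daily = allowed_downtime(0.9995, timedelta(hours=24))
```

Seeing the budget as hours or request counts tends to make the conversation about "realistic availability" easier than arguing over the number of nines.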
Using the budget
- Spend the budget - If the SLO is currently being met, you have room to move. Add new features! Deploy a new version!
- Build the budget - If the budget is zero (or negative), you should concentrate on that. Freeze new features. Improve the observability story. Prioritise dealing with technical debt.
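The spend-or-build rule can be sketched as a small decision function. This is only an illustration of the idea, not a policy from the session: the function names are hypothetical, the budget is counted in requests, and `Decimal` is used so that a budget of exactly zero is detected without floating-point drift.

```python
from decimal import Decimal


def budget_left(slo_target, good, total):
    """Bad requests still allowed in the window.

    Zero or negative means the error budget is exhausted.
    """
    # Decimal(str(...)) keeps 1 - target exact for targets like 0.99.
    allowed_bad = (Decimal(1) - Decimal(str(slo_target))) * total
    actual_bad = total - good
    return allowed_bad - actual_bad


def recommend(slo_target, good, total):
    """Map the remaining budget to the two modes described above."""
    if budget_left(slo_target, good, total) > 0:
        return "spend: add features, deploy new versions"
    return "build: freeze features, improve observability, tackle technical debt"
```

With a 99% target and 10,000 requests, 50 failures leaves half the budget and suggests spending it, while 100 or more failures uses it up and suggests building it back.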
SLIs and SLOs
- Help us maintain a focus on the users
- Can change as systems change
- Help us balance between different groups and stakeholders
Error Budget is 1 minus SLO
- Help balance development versus resiliency
- Should be spent
In Practice
- Build out slow with room to grow
- Use good tooling to set up SLOs and use alerts to find out when error budget gets too low
In closing
This session gave me the whole picture of SLIs, SLOs, SLAs, and error budgets. It also included a practical walkthrough in Datadog and many informative tips.
Be sure to sign up for AWS Summit Online and check out the session archives! Slide materials were also distributed, so check them out.