Baking Clouds Ltd

Baking Clouds provide tailored IT consultancy services to small and medium-sized companies; we cover all aspects of IT without any hidden costs.

How Much Do You Really Know About Site Reliability Engineering (SRE)?

Maybe you are looking for a job, keep seeing “SRE” (Site Reliability Engineering) in the ads, and wonder what it is about. Maybe your company has decided to adopt the role, or you are simply curious. In this blog, we give an overview of what the SRE role involves and introduce some of the most used tools and basic concepts.

So … What’s Site Reliability Engineering?

Site reliability engineers build a bridge between development and operations by bringing a software engineering mindset to system administration concerns. They split their time between operations/on-call tasks and building methods and software to improve dependability and performance.

Site Reliability Engineering uses software engineering and automation to ensure that continuously delivered applications run efficiently and reliably.

Engineering is the crucial concept. It brings a data-driven approach to operations, a culture of automation that increases efficiency and decreases risk, and a hypothesis-driven methodology for incident, performance, quality, and capacity management.

A focus on continuous improvement is another important principle. Site reliability engineers (SREs) are responsible for automating and restoring failing systems and for ensuring that the same failures do not occur again. A blameless postmortem determines the incident’s root cause or causes and produces a balanced action plan to resolve them.

The story of SRE

The SRE definition originated with Benjamin Treynor Sloss (Vice President, Engineering at Google) as a way of thinking about and approaching software production, and as a set of rules and practices. The idea is to treat all processes as software issues that engineers need to fix.

In the video below, you can see the entire SRE story: 

Goals of SRE

The following practices are used by site reliability engineering to implement DevOps goals:

The first is to establish the role of site reliability engineer. This new role takes the place of operators and focuses on sharing production responsibilities with developers.

Number two is conducting blameless postmortems. These are discussions held after incidents to determine what went wrong and prevent it from happening again.

The third is to create and enforce an error budget. Budgeting your money keeps spending in check and ensures that all of your expenses are paid on time.

An error budget works similarly: it caps how much unreliability a service may accumulate, encouraging measured changes while maintaining the right balance between growth and stability.

Number four is to identify and reduce toil. In site reliability engineering, toil means menial operational duties, and SRE provides tools for measuring and minimizing it.

Number five is to track service-level measurements and goals, for example service level indicators (SLIs), service level objectives (SLOs), and service level agreements (SLAs).
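To make the error budget and service-level ideas above more concrete, here is a minimal Python sketch. It assumes a request-based availability SLI (successful requests divided by total requests), which is only one of several ways to define an SLI. The function names and the sample numbers are hypothetical and chosen for illustration; they do not come from any specific tool.

```python
def availability_sli(good_requests: int, total_requests: int) -> float:
    """SLI: fraction of requests that succeeded in the measurement window."""
    if total_requests == 0:
        return 1.0  # assumption: no traffic counts as fully available
    return good_requests / total_requests


def allowed_failures(slo_target: float, total_requests: int) -> float:
    """Error budget: how many requests may fail without breaching the SLO."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return (1.0 - slo_target) * total_requests


def budget_remaining(slo_target: float, good_requests: int, total_requests: int) -> float:
    """Fraction of the error budget still unspent (negative means overspent)."""
    if total_requests == 0:
        return 1.0  # nothing served, so nothing spent
    failed = total_requests - good_requests
    return 1.0 - failed / allowed_failures(slo_target, total_requests)


if __name__ == "__main__":
    # Hypothetical 30-day window: 1,000,000 requests, 250 of them failed,
    # measured against a 99.9% availability SLO.
    slo = 0.999
    total, good = 1_000_000, 999_750

    print(f"SLI: {availability_sli(good, total):.3%}")
    print(f"Allowed failures: {allowed_failures(slo, total):.0f}")
    print(f"Budget remaining: {budget_remaining(slo, good, total):.1%}")
```

With these sample numbers the service has spent about a quarter of its budget, so roughly 75% remains. A team could use a figure like this to decide whether to keep releasing changes or to slow down and focus on stability.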

Site Reliability Engineering Tools

Here is a list of some of the tools used in site reliability engineering.

Deployment | Performance and Monitoring | Communication | Tracking | Automated response system
Ansible | Kibana | Slack | Trello | PagerDuty
Terraform | Datadog | Telegram | Asana | VictorOps
SaltStack | New Relic | Microsoft Teams | Jira | Opsgenie

Should You Consider Becoming a Site Reliability Engineer?

Whether your experience is in software or systems engineering, you can become an SRE as long as you have good foundations and a solid drive to improve and automate. Increasing your expertise in both areas will give you a competitive advantage and allow you to be more flexible in the long term.

So if you recognize yourself in the way an SRE works, or you are part of a company that sees the benefits of this role, it is time to get going and delve into the concepts, principles and practices of SRE!

Additional Resources

Below you can find links to articles and videos related to Site Reliability Engineering (SRE), as well as some links to formal training. We welcome suggestions for additions. Enjoy!

Differences between SRE & DevOps

Courses

Pluralsight Site Reliability Engineering (SRE): The Big Picture

Coursera Site Reliability Engineering: Measuring and Managing Reliability

IBM SRE Certification IBM Cloud Professional Site Reliability Engineer (SRE)

What Is SRE? by Kurt Andersen, Craig Sebenik

Google Materials

Preparing for Google Cloud Certification: Cloud DevOps Engineer Professional Certificate

Documentation related to SRE daily tasks

Here’s a list of links to documentation we curated to showcase what is involved in an SRE’s daily work.

Service-level metrics
Available . . . or not? That is the question
SLOs, SLIs, SLAs, oh my
Building good SLOs
Consequences of SLO violations
An example escalation policy
Applying the escalation policy
Defining SLOs for services with dependencies
Tune up your SLI metrics
Learning—and teaching—the art of service-level objectives
Using deemed SLIs to measure customer reliability

Releases
Reliable releases and rollbacks
How release canaries can save your bacon

SRE support
Why should your app get SRE support?
How SREs find the landmines in a service
Making the most of an SRE service takeover

Dark launches
What is a dark launch, and what does it do for me?
The practicalities of dark launching

Postmortems
Fearless shared postmortems
Getting the most out of shared postmortems

Error Budgets
Good housekeeping for error budgets
Understanding error budget overspend

Production Incidents
Shrinking the impact of production incidents using SRE principles
Shrinking the time to mitigate production incidents

Job Market

Last but not least, if you are new to the IT industry, salary and career prospects are another reason to consider becoming an SRE.

These figures are taken from the Indeed website as an example. For details, explore the Indeed website here for salaries and here for open roles.

We don’t take any responsibility for that information.

Country | Salary as seen on Indeed, in local currency
USA | $131,300 per year
UK | £69,413 per year
Australia | $140,938 per year
New Zealand | $90,000 per year (Glassdoor)

Wrap-up

We shared information on what it means to be a site reliability engineer (SRE), some of the daily tasks and tools used in the job, and how becoming an SRE could affect your finances compared with where you are today.

Make this article better by sharing your comments and adding your input, whether you know about SRE or are an SRE yourself.

We want to hear what you think about this article and how we can improve it. Your feedback is important to us.

We want to hear more from you. Click here
