This week I talk about the impact of SaaS-first technology strategies on the work of an SRE. I pose questions about observability, ownership, on-call, and how much control we have over reliability. You can find the Bleeding Tech blog on Medium: https://medium.com/@stownshend You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter.com/the_kiwi_sre YouTube: https://www.youtube.com/c/SlightReliability Instagram: https://www.instagram.com/slight_reliability/ TikTok: https://www.tiktok.com/@the_kiwi_sre
Thu 02.05.2024
11:08 min
Slight Reliability Episode 84 - Clinical Troubleshooting with Dan Slimmon
This week I chat with Dan Slimmon about applying the approach doctors use to treat patient symptoms during incident response. You can find Dan's blog at https://blog.danslimmon.com/ or connect with him on LinkedIn here: https://www.linkedin.com/in/danslimmon/ You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter.com/the_kiwi_sre YouTube: https://www.youtube.com/c/SlightReliability Instagram: https://www.instagram.com/slight_reliability/ TikTok: https://www.tiktok.com/@the_kiwi_sre This episode was sponsored by SquaredUp. SquaredUp combines all your data with awesome dashboards, analytics, health rollup, and notifications, into a unified observability portal. Using a data mesh architecture, SquaredUp is a beautifully simple way to get instant access to the insights that matter, whenever you need them. If you want to know more head over to https://squaredup.com/ to sign up f
Sat 30.03.2024
27:40 min
Slight Reliability Episode 83 - An Unfulfilled Promise with Itiel Shwartz
This week I hear about all things Kubernetes from Komodor CTO and co-founder Itiel Shwartz. We chat about the promise that was made when Kubernetes first entered the industry, the challenge of getting developers engaged and capable of working in Kubernetes, my hate/hate relationship with Helm but its important contribution to the Kubernetes project, Kubernetes observability, and so much more. You can find the Kubernetes for Humans podcast here: https://komodor.com/blog/the-kubernetes-for-humans-podcast/ Or find out more about Komodor here: https://komodor.com/ Or find Itiel on LinkedIn: https://www.linkedin.com/in/itiel-shwartz-18542853/ You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter.com/the_kiwi_sre YouTube: https://www.youtube.com/c/SlightReliability Instagram: https://www.instagram.com/slight_reliability/ TikTok: https://www.tiktok.com/@the_kiwi_sre This episode was
Tue 05.03.2024
30:32 min
Slight Reliability Episode 82 - CI/CD with Amin Astaneh
This week I sit down and have a discussion with Amin Astaneh (from Certo Modo) about CI/CD. We cover the power of the standard change as a way to navigate ITIL while still implementing DevOps practices, what to monitor to make your CI/CD observable, single piece flow, testing in production, and so much more. You can find Amin on his company website https://certomodo.io, LinkedIn: https://www.linkedin.com/in/aminastaneh/ and Twitter: https://twitter.com/aastaneh You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter.com/the_kiwi_sre YouTube: https://www.youtube.com/c/SlightReliability Instagram: https://www.instagram.com/slight_reliability/ TikTok: https://www.tiktok.com/@the_kiwi_sre This episode was sponsored by SquaredUp. SquaredUp combines all your data with awesome dashboards, analytics, health rollup, and notifications, into a unified observability portal. Using a data
Tue 13.02.2024
25:47 min
Slight Reliability Episode 81 - Incident Management in Non-Prod Environments
"Environment issues are just incidents that happened to occur in a non-production environment"... so why do we treat them so differently?In this first episode of the 2024 season I reflect on how we handle incidents in non-prod environments.(Note: Had a few issues with noise suppression in OBS Studio cutting off the start of some words, will sort it for the next episode)You can find Stephen at:LinkedIn: https://www.linkedin.com/in/stephentownshend/Twitter: https://twitter.com/the_kiwi_sreYouTube: https://www.youtube.com/c/SlightReliabilityInstagram: https://www.instagram.com/slight_reliability/TikTok: https://www.tiktok.com/@the_kiwi_sre
Tue 06.02.2024
10:09 min
Slight Reliability Episode 80 - What's Been Bugging Niall Murphy
This week I speak with Niall Murphy, renowned speaker and co-author of the original SRE book and the SRE workbook. We chat about the state of SRE in the current macro-economic climate and how we're not yet doing a very good job at articulating the value of SRE to leaders, the relationship that velocity and reliability have, the value of new features versus reliability improvements, and *much* more. You can find Niall at: LinkedIn: https://www.linkedin.com/in/niallm/ X: https://twitter.com/niallm Website: https://relyabilit.ie/ (and his company Stanza: https://www.stanza.systems/) You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ X: https://twitter.com/the_kiwi_sre Instagram: https://www.instagram.com/slight_reliability/
Paige Cruz (from Chronosphere) is back. This week we discuss sampling. What is sampling? Why do it? What kinds of sampling are there? You can check out Chronosphere's cloud native observability platform here: https://chronosphere.io/ You can find Paige on: LinkedIn: https://www.linkedin.com/in/paigerduty/ X: https://twitter.com/paigerduty You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ X: https://twitter.com/the_kiwi_sre Instagram: https://www.instagram.com/slight_reliability/
Tue 21.11.2023
45:27 min
Slight Reliability Episode 79 - Incident Story Time with Valeska Victoria
This week Valeska Victoria returns to share some of her experiences working as an SRE at eBay. We look at the cascading effect of production issues in complex integrated environments (how there's often no single root cause), developer literacy of how infrastructure works, the importance of ownership and accountability of reliability, and much more. You can find Valeska on: LinkedIn: https://www.linkedin.com/in/valeska-victoria/ You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ X: https://twitter.com/the_kiwi_sre Instagram: https://www.instagram.com/slight_reliability/
Mon 20.11.2023
37:51 min
Slight Reliability Episode 78 - Developer Experience with Ankit Jain
This week I chat with Ankit Jain from aviator.co about developer experience. We define developer experience and developer productivity, and how this applies to SRE. We discuss the growing expectation on developers and how this leads to frustration and burnout. We also explore how to measure developer experience and how to start working to make improvements. You can check out Aviator's developer experience platform here: https://www.aviator.co/ You can find Ankit on: LinkedIn: https://www.linkedin.com/in/ankitjaindce/ You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ X: https://twitter.com/the_kiwi_sre Instagram: https://www.instagram.com/slight_reliability/
Fri 17.11.2023
32:21 min
December 2023 Update
A brief mid-week update on my changing circumstances and the future of the podcast.
Thu 16.11.2023
5:07 min
Slight Reliability Episode 77 - SRE to DevRel with Liz Fong-Jones
This week I had the privilege of interviewing Liz Fong-Jones from honeycomb.io about DevRel, Developer Advocacy, and how that applies to SRE. We discuss the difference between Developer Relations (DevRel) and Developer Advocacy, how Liz got into advocacy, how DevRel helps companies and the community, and some tips on how to get traction with SRE practices in your organisation. You can check out Honeycomb's observability platform here: https://www.honeycomb.io/ You can find Liz on: LinkedIn: https://www.linkedin.com/in/efong/ Website: https://www.lizthegrey.com/ (all her social/links are here) You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ X: https://twitter.com/the_kiwi_sre Instagram: https://www.instagram.com/slight_reliability/
Wed 15.11.2023
31:53 min
Slight Reliability Episode 75 - Enterprise SRE with Steve McGhee
This week I had the honour of chatting with Steve McGhee (former Google SRE, current Google Reliability Advocate, and co-author of Enterprise Roadmap to SRE). We discuss the evolution of SRE from where it began at Google and how it is being adopted by enterprises around the world now (and why this is happening). We talk about getting leadership support and how we get reliability taken seriously, the lies we tell ourselves to justify incidents and issues, leveraging transformation projects to bring SRE to life, how SLOs can act as the fulcrum between dev and ops, the fallacy of the pyramid model of reliability... and so much more. You can find Steve on: LinkedIn: https://www.linkedin.com/in/stevemcghee/ X: https://twitter.com/stevemcghee You can find Steve's book "Enterprise Roadmap to SRE" here: https://sre.google/resources/practices-and-processes/enterprise-roadmap-to-sre/ Steve also mentions the book "A Seat at the Table": https://itrevolution.com/product/a-seat-at-the-table/ You can find the official Slight Re
Tue 14.11.2023
39:00 min
Slight Reliability Episode 74 - The Hidden Side of Vendor Lock-In
This week on Slight Reliability Stephen discusses observability vendor lock-in. What is it? What does OpenTelemetry do to help? What areas are yet to be solved? You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter.com/the_kiwi_sre YouTube: https://www.youtube.com/c/SlightReliability Instagram: https://www.instagram.com/slight_reliability/ TikTok: https://www.tiktok.com/@the_kiwi_sre
Tue 31.10.2023
8:55 min
Slight Reliability Episode 73 - Enterprise SLOs with Brian Singer
This week we sit down and talk about SLOs with Brian Singer, CPO and co-founder of Nobl9. We talk about the importance of reviewing operational effectiveness, getting buy-in from leadership, using SLOs to reduce noise, how to implement SLOs within different cultures and structures, the parallels between security and reliability... and much more. You can check out Nobl9's reliability and SLO platform here: https://www.nobl9.com/ You can find Brian on LinkedIn: https://www.linkedin.com/in/briantsinger/ You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter.com/the_kiwi_sre Instagram: https://www.instagram.com/slight_reliability/
Tue 24.10.2023
32:18 min
Slight Reliability Episode 72 - Rapid Incident Response with Valeska Victoria
This week Stephen chats with Valeska Victoria about her time working as an SRE at eBay. Valeska shares her data-driven approach to SRE, having a voice as a less experienced engineer, handling incidents under high pressure, leveraging large language models to rapidly find the information you need during an incident, and much more. You can check out PromptOps here: https://www.promptops.com/ You can find Valeska on LinkedIn: https://www.linkedin.com/in/valeska-victoria/ You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter.com/the_kiwi_sre Instagram: https://www.instagram.com/slight_reliability/
Tue 17.10.2023
42:19 min
Slight Reliability Episode 71 - Implementing SRE with Dr. Vlad Ukis
This week Stephen chats with Dr. Vlad Ukis about his journey discovering and then implementing SRE practices at Siemens Healthineers (which led to him writing a book). They discuss how the evolution of infrastructure necessitates a shift in how we operate, the power of selling SRE practices, the SRE infrastructure used to build SLOs and reliability capabilities, how he implemented SLOs, and much more. You can find Vlad's book "Establishing SRE Foundations" here: https://www.amazon.com/Establishing-Foundations-Step-Step-Organizations/dp/0137424604 You can find Vlad on LinkedIn: https://www.linkedin.com/in/dr-vladyslav-ukis-5172ba32/ You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter.com/the_kiwi_sre Instagram: https://www.instagram.com/slight_reliability/
Tue 10.10.2023
29:25 min
Slight Reliability Episode 70 - Meta SRE with Amin Astaneh
Amin Astaneh (from Certo Modo) is back to discuss his experience working as a production engineer (SRE equivalent) at Meta. Stephen and Amin discuss what it's like interviewing for big tech, "you build it, you own it", different SRE engagement models, SRE at different sizes of organisation, socialising your SRE success as a way to get traction, and so much more. You can find Amin on his company website https://certomodo.io, LinkedIn: https://www.linkedin.com/in/aminastaneh/ and Twitter: https://twitter.com/aastaneh The books Amin mentions are... The Practice of Cloud System Administration: https://www.oreilly.com/library/view/practice-of-cloud/9780133478549/ Leading Change: https://www.kotterinc.com/bookshelf/leading-change/ You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter.com/the_kiwi_sre Instagram: https://www.instagram.com/slight_reliability/
Tue 03.10.2023
42:24 min
Slight Reliability Episode 69 - Developer to SRE with Praveen Kasam
This week Stephen talks to Praveen Kasam from Diconium Digital Solutions about how he led SRE transformations. Praveen shares his experience transitioning from development to SRE and how leveraging automation and bringing application knowledge to the ops team provided quick wins. He also covers how he later applied SRE concepts to uplift the wider organisation. If you are out there looking for advice on how to implement SRE in your organisation, this is the episode for you. You can find Praveen at: LinkedIn: https://www.linkedin.com/in/kasampraveen/ You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ X: https://twitter.com/the_kiwi_sre YouTube: https://www.youtube.com/c/SlightReliability Instagram: https://www.instagram.com/slight_reliability/ TikTok: https://www.tiktok.com/@the_kiwi_sre
Tue 26.09.2023
30:10 min
Slight Reliability Episode 68 - Dashboards and Modern Observability with Eric Schabell
This week Stephen asks Eric Schabell (Director of Technical Marketing & Evangelism @ Chronosphere) about how dashboards fit into modern observability. They discuss how untamed observability can lead to unexpectedly high cloud bills, the similarities between dashboards and documentation, the "know > triage > understand" workflow, and much more. You can find Eric at: LinkedIn: https://www.linkedin.com/in/ericschabell/ X: https://twitter.com/ericschabell And you can find Chronosphere at: https://www.linkedin.com/company/chronosphereio/ You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ X: https://twitter.com/the_kiwi_sre YouTube: https://www.youtube.com/c/SlightReliability Instagram: https://www.instagram.com/slight_reliability/ TikTok: https://www.tiktok.com/@the_kiwi_sre
Tue 19.09.2023
32:31 min
Slight Reliability Episode 67 - Single Pane of Glass with Jamie Allen and Adam Kinniburgh
This week Stephen chats with Jamie Allen (Chief Technologist AWS & SRE @ EPAM Systems) and Adam Kinniburgh (VP Innovation @ SquaredUp) about the concept of a single pane of glass (SPOG) for SRE. Is it performance art or something actionable? Can alerting replace the need for dashboards? And are metrics drowning in the wake of distributed tracing? You can find Jamie at: LinkedIn: https://www.linkedin.com/in/jlallen/ And the Single Pain of Glass article he wrote here: https://medium.com/site-reliability-engineering-leadership/the-single-pain-of-glass-6e42930e966 You can find EPAM at https://www.epam.com/ And you can find the Google Dapper paper here: https://static.googleusercontent.com/media/research.google.com/en//archive/papers/dapper-2010-1.pdf You can find Adam at: LinkedIn: https://www.linkedin.com/in/adamkinniburgh/ X: https://twitter.com/adamkinniburgh You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephe
Tue 12.09.2023
34:36 min
Slight Reliability Episode 66 - Building Digital Assistants for SRE with Kyle Forster
This week Stephen brings back Kyle Forster from RunWhen to talk about the purple elephant in the room… “AI”. What makes it GenAI, LLM, Advanced Statistics, or ML? Kyle shares his experience building AI-powered search engines for SRE troubleshooting commands and how to incorporate a (paid) open source community of experts rather than trust AI by itself. They discuss what search looks like under the hood, why GenAI-powered chatbots will or won't take over the SaaS industry, how Digital Assistants can be utilised by SREs to increase productivity (hint: giving them to app developers!), how to make informed decisions when purchasing AI products, and *much* more. You can find Kyle at: LinkedIn: https://www.linkedin.com/in/kyforster/recent-activity/all/ And you can find out more about RunWhen at: Website: https://www.runwhen.com/ Product videos: https://www.youtube.com/@whatdoirunwhen RunWhen Local: https://github.com/runwhen-contrib/runwhen-local (RunWhen Local is an open source troubleshooting che
Tue 05.09.2023
29:51 min
Slight Reliability Episode 65 - The Truth About Incidents with Courtney Nash
This week Stephen chats with the internet incident librarian herself, Courtney Nash. They explore what Courtney has learned through meta-analysis of the more than ten thousand incidents in the Verica Open Incident Database (VOID). They cover why MTTR needs to go in the garbage, joint cognitive systems, the value of looking at near misses and *much* more. You can check out the VOID here: https://www.thevoid.community/ The two papers mentioned are: Ironies of Automation by Lisanne Bainbridge: https://ckrybus.com/static/papers/Bainbridge_1983_Automatica.pdf Managing the Hidden Costs of Coordination by Laura Maguire: https://queue.acm.org/detail.cfm?id=3380779 You can find Courtney at: LinkedIn: https://www.linkedin.com/in/nashcourtney/ Twitter: https://twitter.com/courtneynash You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter.com/the_kiwi_sre YouTube: https://www.youtube.com/c/S
Tue 29.08.2023
41:04 min
Slight Reliability Episode 64 - Observability During Development with Martin Thwaites
This week Stephen chats with Martin Thwaites from Honeycomb about how developers can leverage observability to better understand what they're building, solve bugs quicker, and have more time for coding. They also discuss OpenTelemetry (the protocol and semantic conventions), manual versus automatic instrumentation, and how keeping every span of trace data is irresponsible. You can find Martin at: LinkedIn: https://www.linkedin.com/in/martin-thwaites-ab445120/ X: https://twitter.com/MartinDotNet And Honeycomb at https://www.honeycomb.io/ You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ X: https://twitter.com/the_kiwi_sre YouTube: https://www.youtube.com/c/SlightReliability Instagram: https://www.instagram.com/slight_reliability/ TikTok: https://www.tiktok.com/@the_kiwi_sre
Tue 22.08.2023
36:18 min
Slight Reliability Episode 63 - The Power of Summary
Observability is a necessary adaptation to make sense of software systems in the Digital Age, but how can we unlock its power for non-engineer stakeholders (such as executives, product owners, etc)? Perhaps we need a layer of abstraction sitting on top of our detailed observability to get the most out of it. You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter.com/the_kiwi_sre YouTube: https://www.youtube.com/c/SlightReliability Instagram: https://www.instagram.com/slight_reliability/ TikTok: https://www.tiktok.com/@the_kiwi_sre
Tue 15.08.2023
9:20 min
Slight Reliability Episode 62 - On-Call with Matt Brown
This week Stephen chats with former Google SRE Matt Brown about being on-call. They cover how to uplift junior engineers so they can be on-call, what a fair on-call schedule looks like, run-books, and much more. As you heard, Matt believes flexibility is key to a healthy on-call rotation. Matt is exploring ideas for improvements to existing tooling and products in this space and would love to hear from as many listeners as possible about what they find useful or frustrating in the tools they use to support on-call in their teams. You can reach him at [email protected] or schedule a chat via https://zcal.co/mattb/oncall. Please don't be shy! You can also find Matt at: Website: https://www.mattb.nz/ LinkedIn: https://www.linkedin.com/in/mattbrown/ Mastodon: https://mastodon.nz/@mattb Twitter: https://twitter.com/xleem You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend
Tue 01.08.2023
36:57 min
Slight Reliability Episode 61 - SRE VS DevOps VS Platform Eng... (Yawn)
The internet is full of people who want to tell you about SRE, DevOps, and Platform Engineering and how different and similar they are... and will give you the impression that these things compete with each other. But do they? And is it a helpful question to ask in the first place? You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter.com/the_kiwi_sre YouTube: https://www.youtube.com/c/SlightReliability Instagram: https://www.instagram.com/slight_reliability/ TikTok: https://www.tiktok.com/@the_kiwi_sre
Tue 25.07.2023
6:07 min
Slight Reliability Episode 60 - From Zero to SRE with Amin Astaneh
In this episode Amin Astaneh from Certo Modo discusses his experience undertaking an SRE transformation over several years. Stephen and Amin cover a lot of ground including making ops work visible, measuring toil, the power of calculating the $ value of work, getting developers on-call, the embedded model for SRE, SLOs, culture change, and a whole lot more. You can find Amin on his company website https://certomodo.io, LinkedIn: https://www.linkedin.com/in/aminastaneh/ and Twitter: https://twitter.com/aastaneh The books Amin mentions are... The Practice of Cloud System Administration: https://www.oreilly.com/library/view/practice-of-cloud/9780133478549/ The Phoenix Project: https://www.oreilly.com/library/view/the-phoenix-project/9781457191350/ You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter.com/the_kiwi_sre Instagram: https://www.instagram.com/slight_reliability/
Tue 11.07.2023
42:46 min
Slight Reliability Episode 59 - Bad API Observability with Sonja Chevre
In this episode Stephen Townshend and Sonja Chevre from Tyk discuss making APIs observable, and some anti-patterns to avoid. They cover GraphQL, OpenTelemetry and semantic conventions, correlation IDs, observability pipelines, and much more. You can find Sonja on LinkedIn: https://www.linkedin.com/in/sonjachevre/ and Twitter: https://twitter.com/SonjaChevre You can listen to Sonja's KubeCon talk here: https://youtu.be/IkEUJjRBCbo You can find Tyk's open source gateway here: https://github.com/TykTechnologies/tyk You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter.com/the_kiwi_sre Instagram: https://www.instagram.com/slight_reliability/
In this episode Stephen Townshend and Harinder Seera explore how to monitor and manage the cost of cloud. They discuss FinOps as a cultural practice, anti-patterns for implementing in the cloud, keeping cost down through resources, pricing, and architecture... and much more. You can find Harinder on LinkedIn: https://www.linkedin.com/in/harinderseera/ You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter.com/the_kiwi_sre Instagram: https://www.instagram.com/slight_reliability/
Tue 27.06.2023
36:54 min
Slight Reliability Episode 57 - A Tale of Three Conferences
In this episode Stephen shares his experiences traveling overseas to the UK and Singapore for AWS Summit, SREcon APAC, and the internal SquaredUp conference "SqUpCon". You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Slight Reliability artwork on Instagram: https://www.instagram.com/slight_reliability/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter.com/the_kiwi_sre Instagram: https://www.instagram.com/slight_reliability/
Tue 20.06.2023
16:10 min
June 9th 2023 Update
A quick update on Stephen's whereabouts and when the next episode will be released.
Fri 09.06.2023
1:41 min
Slight Reliability Episode 56 - Dashbored
In this episode Stephen discusses the role of dashboards within the context of the Digital Era. What are they *not* appropriate for? What can they help with? What kinds of things are suitable to present? If you want to get involved in the SquaredUp dashboard competition head along to: https://squaredup.com/blog/dashboard-competition/ (everyone who submits an entry gets a t-shirt, you can also win Star Wars Lego, get video interviewed by me, and have the story of your dashboard presented both as a blog and on our Dashboard Gallery). You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Slight Reliability artwork on Instagram: https://www.instagram.com/slight_reliability/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter.com/the_kiwi_sre Instagram: https://www.instagram.com/slight_reliability/
Tue 23.05.2023
14:06 min
Slight Reliability Episode 55 - Reflections on KubeCon with Bruce Cullen
This week Bruce Cullen is back to share his experiences from KubeCon + CloudNativeCon 2023 Europe. We chat about OpenTelemetry, green engineering, securing your CI/CD pipeline and much more. Bruce is the Director of Engineering at SquaredUp. You can find him on LinkedIn: https://www.linkedin.com/in/bruce-cullen/ You can find the official Slight Reliability podcast website at: https://slightreliability.com/ If you like Slight Reliability's mspaint style artwork you can find more of it on Instagram: https://www.instagram.com/slight_reliability/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter.com/the_kiwi_sre
Tue 16.05.2023
40:21 min
Slight Reliability Episode 54 - Trends in Incident Management with Andy Thurai
In this episode Stephen Townshend chats to Andy Thurai (VP and Principal Analyst at Constellation Research) about Andy's latest report titled "Trends in Incident Management 2023". They chat about "mean time to innocence" and status pages, debate whether AI or ML has real value for incident management, and ponder why anyone would willingly decide to become an incident commander. You can find Andy's report here: https://www.constellationr.com/research/2023-trends-incident-management You can find Andy on LinkedIn here: https://www.linkedin.com/in/andythurai/ You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter.com/the_kiwi_sre
Tue 09.05.2023
32:26 min
Slight Reliability Episode 53 - DORA Metrics with Tim Wheeler
In this episode Stephen Townshend chats to Tim Wheeler (Director of Engineering Services at SquaredUp) about his work implementing and continually monitoring DORA metrics. They chat about customising each metric to your own unique context, avoiding the weaponisation of metrics, the "tools will solve this for me" trap, and much more. The books mentioned during this episode were: Accelerate, The DevOps Handbook, The Phoenix Project, The Unicorn Project, Lean Enterprise, and Sooner, Safer, Happier. Tim also mentioned the work of Bryan Finster (https://twitter.com/BryanFinster). You can find Tim on LinkedIn: https://www.linkedin.com/in/timjameswheeler/ You can find out more about SquaredUp at https://squaredup.com/ You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter.com/the_kiwi_sre
Tue 02.05.2023
28:03 min
Slight Reliability Episode 52 - Double, Double, Toil and Trouble!
In this episode Stephen explores the SRE concept of "toil". What is it? How can we measure it? How do we reduce it? Also in this episode: can we make non-technology systems observable (like we do technology ones), and the ineffectiveness of change advisory boards (CAB). Also, Stephen's upcoming attendance at SREcon, AWS Summit, and SLOconf. Shout outs to Steve McGhee, Dom Finn, and Shea Stewart. You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter.com/the_kiwi_sre
Tue 25.04.2023
9:12 min
Slight Reliability Episode 51 - The reliability.org Community with Anurag Gupta
In this episode Stephen Townshend and Anurag Gupta discuss the new reliability.org community for SREs or reliability engineers to share experiences, ask questions, and find community. They discuss the value of community and sharing your thoughts, collaboration between organisations, vicious versus virtuous cycles for reliability, and much more. You can join us in the community by visiting https://www.reliability.org/ You can find Anurag on LinkedIn: https://www.linkedin.com/in/awgupta/ You can find out more about Shoreline by visiting https://www.shoreline.io/ You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter.com/the_kiwi_sre
Tue 18.04.2023
30 : 02 min
Slight Reliability Episode 50 - The 50th Episode Special with Bruce Cullen
In this episode Bruce Cullen interviews Stephen Townshend about the past, present, and future of the Slight Reliability podcast. They discuss their shared backgrounds in software testing, the different career paths that testing has opened up, and much more!Bruce is the Director of Engineering at SquaredUp. You can find him on LinkedIn: https://www.linkedin.com/in/bruce-cullen/You can find the official Slight Reliability podcast website at: https://slightreliability.com/You can find Stephen at:LinkedIn: https://www.linkedin.com/in/stephentownshend/Twitter: https://twitter.com/the_kiwi_sre
Tue 11.04.2023
39 : 10 min
Slight Reliability Episode 49 - Implementing Observability in the Real World with Ivan Merrill
In this episode Ivan Merrill from Fiberplane shares his experiences implementing observability within some of the large complex organisations he's worked for in the past.You can find Ivan on LinkedIn: https://www.linkedin.com/in/ivan-merrill-1a05223/You can find out more about Fiberplane here: https://fiberplane.com/You can find the official Slight Reliability podcast website at: https://slightreliability.com/You can find Stephen at:LinkedIn: https://www.linkedin.com/in/stephentownshend/Twitter: https://twitter.com/the_kiwi_sre
Tue 04.04.2023
38 : 34 min
Slight Reliability Episode 48 - Blind Insight
In this episode I discuss the word "insight" within the context of observability. Is insight something tools can provide? Is it something you can reproduce? You can find the official Slight Reliability podcast website at: https://slightreliability.com/You can find Stephen at:LinkedIn: https://www.linkedin.com/in/stephentownshend/Twitter: https://twitter.com/the_kiwi_sre
Tue 21.03.2023
8 : 01 min
Slight Reliability Episode 47 - Cloud Dependency Reliability with Jeff Martens and Ryan Duffield
In this episode Stephen Townshend discusses our increased dependency on third party cloud services and what this means for reliability with Jeff Martens and Ryan Duffield from https://metrist.io/. You can find Jeff... On LinkedIn: https://www.linkedin.com/in/jmartens/ On Twitter: https://twitter.com/Jmartens You can find Ryan... On StackOverflow: https://stackoverflow.com/users/2696/ryan-duffield On GitHub: https://github.com/rduffield You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter.com/the_kiwi_sre
Tue 14.03.2023
32 : 45 min
Slight Reliability Episode 46 - Raw Telemetry
In this episode I propose the use of scatterplots of raw data to better understand how our systems are behaving and what our customers are experiencing. The ideas from this episode come from my time as a performance engineer, working with legends in that space Richard Leeke (https://www.linkedin.com/in/richard-leeke-450448/) and Neil Davies (https://www.linkedin.com/in/neildaviesnz/). For some basic examples of scatterplots and what they show you versus line charts, check out an article I wrote back in 2017 called "Let's Talk About Averages": https://www.linkedin.com/pulse/lets-talk-averages-stephen-townshend/ Another proponent of scatterplots is Stijn Schepers (https://www.linkedin.com/in/stijnschepers/). Here's an article he wrote about it in 2019: https://www.linkedin.com/pulse/performance-testing-act-like-detective-use-raw-data-stijn-schepers/ Neil Davies' article on tornado scatters "Chasing Tornadoes" can be found here: http://www.performance-workshop.org/wp/wp-content/uploads/2013/12/Chasing_Tornadoe
Tue 07.03.2023
10 : 03 min
Slight Reliability Episode 45 - Telemetry Fluency with Paige Cruz
In this episode we discuss uplifting telemetry knowledge within engineering teams to enrich their work (and their lives) with Paige Cruz from Chronosphere. We cover why not to take a chainsaw to your observability in order to cut costs, the dark side of auto-instrumentation, storytelling with live data, and much more. The book that Paige recommends at the end is "Effective Monitoring and Alerting for Web Operations": https://www.oreilly.com/library/view/effective-monitoring-and/9781449333515/ You can check out Chronosphere here: https://chronosphere.io/ You can find Paige on LinkedIn: https://www.linkedin.com/in/paigerduty/ You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter.com/the_kiwi_sre
Tue 28.02.2023
48 : 37 min
Slight Reliability Episode 44 - Cognitive Overload with Paige Cruz
In this episode we discuss cognitive overload in SRE with Paige Cruz from Chronosphere. We cover both what cognitive load is, what causes it, as well as some potential antidotes and preventative measures.You can check out Chronosphere here: https://chronosphere.io/You can find Paige on LinkedIn: https://www.linkedin.com/in/paigerduty/You can find the official Slight Reliability podcast website at: https://slightreliability.com/You can find Stephen at:LinkedIn: https://www.linkedin.com/in/stephentownshend/Twitter: https://twitter.com/the_kiwi_sre
In this episode I discuss my "bigger picture" perspective of what observability needs to be, and why it's important we include business and customer into what we monitor in the Digital Era.The books I highlight in this episode are...Observability Engineering https://www.oreilly.com/library/view/observability-engineering/9781492076438/Sooner, Safer, Happier: https://soonersaferhappier.com/book/The Phoenix Project https://www.oreilly.com/library/view/the-phoenix-project/9781457191350/The Unicorn Project https://www.oreilly.com/library/view/the-unicorn-project/9781098124175/Accelerate: https://www.oreilly.com/library/view/accelerate/9781457191435/You can grab a copy of the 2022 State of DevOps report at: https://cloud.google.com/devops/state-of-devopsThe blog I mentioned was The Insight Industrial Complex: https://benn.substack.com/p/insight-industrial-complexYou can find the official Slight Reliability podcast website at: https://slightreliability.com/You can find Stephen at:LinkedIn: https://www.linkedin.com/i
Tue 14.02.2023
10 : 14 min
Slight Reliability Episode 42 - Reliability Insights with José Velez
In this episode we speak to José Velez from Rely about reliability at scale, a top down approach to SLOs, the potential and limitations of AI and ML in operations, the question of service ownership, utilising the business criticality of services in how we monitor the underlying infrastructure, and much more.You can check out Rely at https://www.rely.io/You can find José on LinkedIn: https://www.linkedin.com/in/josevelez-relyio/You can find the official Slight Reliability podcast website at: https://slightreliability.com/You can find Stephen at:LinkedIn: https://www.linkedin.com/in/stephentownshend/Twitter: https://twitter.com/the_kiwi_sre
Tue 07.02.2023
36 : 34 min
Slight Reliability Episode 41 - Testing with Traces (with Ken Hamric)
In this episode we speak to Ken Hamric about distributed tracing, leveraging tracing for better testing, and observability driven development.The tool that Henrik Rexed integrated with Tracetest was Kuberhealthy (https://www.cncf.io/projects/kuberhealthy/) and you can watch a video of him discussing it in combination with Tracetest here: https://youtu.be/PKQQEeeMYxg?t=2492Ken also mentioned Charity Majors' writing about observability driven development: https://thenewstack.io/a-next-step-beyond-test-driven-developmentYou can check out Tracetest: - The official website: https://tracetest.io/- GitHub repo: https://github.com/kubeshop/tracetest- Discord channel: https://discord.com/channels/884464549347074049/963470167327772703You can find Ken on LinkedIn: https://www.linkedin.com/in/ken-hamric-016b1420/You can find the official Slight Reliability podcast website at: https://slightreliability.com/You can find Stephen at:LinkedIn: https://www.linkedin.com/in/stephentownshend/Twitter: https://twitter.com/the_k
Tue 31.01.2023
31 : 38 min
Slight Reliability Episode 40 - Drowning in an Observability Data Lake
In this episode Stephen explores the pros and cons of centralising observability data. Is it practical to stand up a complex and costly data storage and retrieval solution? Is there another way? You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter.com/the_kiwi_sre
Tue 24.01.2023
11 : 24 min
Slight Reliability Episode 39 - The Future of SRE with Adriana Villela and Ana Margarita Medina
This week I am joined by Ana Margarita Medina and Adriana Villela, the hosts of the On-Call Me Maybe podcast, to discuss what we'd like to see for SRE in 2023. We talk about observability, SRE recruitment, what organisations need in place to set SRE up for success, and much more.You can find the On-Call Me Maybe podcast on most podcast platforms or go directly to the website here: https://oncallmemaybe.com/Twitter: https://twitter.com/oncallmemaybeMastodon: https://mastodon.social/@oncallmemaybeYou can find Adriana on:LinkedIn: https://www.linkedin.com/in/adrianavillela/Twitter: https://twitter.com/adrianamvillelaMastodon: @[email protected]: https://adri-v.medium.com/ You can find Ana on:LinkedIn: https://www.linkedin.com/in/anammedina/Twitter: https://twitter.com/Ana_M_MedinaMastodon: @[email protected] can find me on:LinkedIn: https://www.linkedin.com/in/stephentownshend/Twitter: https://twitter.com/the_kiwi_sre
Tue 17.01.2023
42 : 26 min
Slight Reliability Episode 38 - SRE Reading
To begin 2023 I share the books I read last year in my quest to be a better SRE. Here is a list of all the books mentioned during the episode: The Phoenix Project by Gene Kim, Kevin Behr, and George Spafford https://www.amazon.com/Phoenix-Project-DevOps-Helping-Business/dp/0988262592 Site Reliability Engineering (by Google) https://sre.google/sre-book/table-of-contents/ Sooner, Safer, Happier by Jonathan Smart https://soonersaferhappier.com/book/ The Toyota Way by Jeffrey Liker https://www.amazon.com/Toyota-Way-Second-Management-Manufacturer/dp/1260468518 Remote: Office Not Required by Jason Fried https://www.amazon.com/Remote-Office-Required-Jason-Fried/dp/0091954673 Driving Digital Strategy by Sunil Gupta https://www.amazon.com/Driving-Digital-Strategy-Reimagining-Business/dp/163369268X Team Topologies by Matthew Skelton and Manuel Pais https://teamtopologies.com/book Accelerate by Nicole Forsgren, Jez Humble, and Gene Kim https://www.amazon.com/Accelerate-Software-Performing-Technology-Organizations/dp/1942788339 Th
Mon 09.01.2023
10 : 29 min
Slight Reliability Episode 37 - Observability New Year's Resolutions with Henrik Rexed
This week Henrik Rexed and Stephen Townshend discuss their New Year's resolutions for observability. They cover OpenTelemetry and a unified query language, continuous profiling, raw data analysis, instrumenting code, using distributed tracing as part of testing, and much more.Some of the tools or resources mentioned during the episode include:https://tracetest.io/ (distributed tracing for testing)https://github.com/open-telemetry/opamp-go (OTEL orchestration)https://ebpf.io/ (for continuous profiling)You can find Henrik on LinkedIn: https://www.linkedin.com/in/hrexed/ and Twitter: https://twitter.com/HrexedYou can find the Is It Observable? series on YouTube: https://www.youtube.com/@IsitObservableAnd the Perfbytes Podcast on most podcast platforms: https://www.perfbytes.com/p/perfbytes.htmlYou can find me on:LinkedIn: https://www.linkedin.com/in/stephentownshend/Twitter: https://twitter.com/the_kiwi_sre
Mon 19.12.2022
45 : 46 min
Slight Reliability Episode 36 - Starting an SRE Team from Scratch with Gwen Berry and Steve Gill
This week we talk to Steve Gill and Gwen Berry from IAG to discuss their experiences forming an SRE incubator team (starting SRE from scratch in a large enterprise). We discuss on-call, SLOs, single pane of glass, pivoting, chaos engineering, and much more.You can find Steve on LinkedIn: https://www.linkedin.com/in/stevegill239/You can find Gwen on LinkedIn: https://www.linkedin.com/in/gwen-berry-56324418b/You can find me on:LinkedIn: https://www.linkedin.com/in/stephentownshend/Twitter: https://twitter.com/the_kiwi_sre
Mon 12.12.2022
28 : 21 min
Slight Reliability Episode 35 - SRE Trends from re:Invent 2022
This week I share the observations I made at AWS re:Invent relating to SRE work including the lack of SREs at the event, data warehouses for observability data, the use of topologies to understand complexity, FinOps, serverless, making sense of enormous amounts of data... and more.You can find me on:LinkedIn: https://www.linkedin.com/in/stephentownshend/Twitter: https://twitter.com/the_kiwi_sre
Mon 05.12.2022
15 : 39 min
Slight Reliability Episode 34 - What is Observability? (Live at re:Invent)
This week I was at the AWS re:Invent conference in Las Vegas, so I took the opportunity to walk around the expo asking observability vendors what their perspective or definition of "observability" was (and reflected on that).You can find me on:LinkedIn: https://www.linkedin.com/in/stephentownshend/Twitter: https://twitter.com/the_kiwi_sre
Wed 30.11.2022
8 : 24 min
Slight Reliability Episode 33 - The Many Faces of SRE
In this episode I explore the different kinds of SRE out there and the different needs they fill in the industry, and discuss some ethically dubious practices around hiring SREs.You can find me on:LinkedIn: https://www.linkedin.com/in/stephentownshend/Twitter: https://twitter.com/the_kiwi_sreMusic from Uppbeat (free for Creators!).Intro:https://uppbeat.io/t/sensho/good-timesLicense code: QBXDSEGNJZY9DDICOutro:https://uppbeat.io/t/mountaineer/voyagerLicense code: 5C0VMTUOULFSRSTM
Mon 21.11.2022
13 : 33 min
Slight Reliability Episode 32 - Social Reliability Engineering with Kyle Forster and Shea Stewart
In this episode I chat to Kyle Forster and Shea Stewart from RunWhen about the concept of "social reliability engineering" and how it could help SREs from organisations all over the world create an ecosystem of sharing and collaboration.You can find Kyle on LinkedIn: https://www.linkedin.com/in/kyforster/You can find Shea on LinkedIn: https://www.linkedin.com/in/sheastewart/To find out more about RunWhen: https://www.runwhen.com/And an example of the "street map view" of a tech stack: https://www.youtube.com/watch?v=SOvH9lcgCXg You can find me on:LinkedIn: https://www.linkedin.com/in/stephentownshend/Twitter: https://twitter.com/the_kiwi_sreMusic from Uppbeat (free for Creators!).Intro:https://uppbeat.io/t/sensho/good-timesLicense code: QBXDSEGNJZY9DDICOutro:https://uppbeat.io/t/mountaineer/voyagerLicense code: 5C0VMTUOULFSRSTM
Mon 14.11.2022
45 : 28 min
Slight Reliability Episode 31 - I Still Wanna Know What SRE Is!
In this episode I reflect back on the very first episode of Slight Reliability "What the heck is SRE anyway?" and see if my perspective has changed since then. I also tackle the confusion about what SRE is and is not.Shout out to Sebastian Vietz (https://www.linkedin.com/in/sebastianvietz/) for his "Service Reliability Engineering" terminology and Richard Benwell (https://www.linkedin.com/in/richard-benwell-ab887b11/) for highlighting the way SRE offers a different value proposition depending on the scale of the services in question. You can find me on:LinkedIn: https://www.linkedin.com/in/stephentownshend/Twitter: https://twitter.com/the_kiwi_sreMusic from Uppbeat (free for Creators!).Intro:https://uppbeat.io/t/sensho/good-timesLicense code: QBXDSEGNJZY9DDICOutro:https://uppbeat.io/t/mountaineer/voyagerLicense code: 5C0VMTUOULFSRSTM
Mon 07.11.2022
9 : 45 min
Slight Reliability Episode 30 - A Change of Pace
In this episode I announce my new role as Developer Advocate (SRE) at SquaredUp, and what this means for the Slight Reliability podcast.You can find me on:LinkedIn: https://www.linkedin.com/in/stephentownshend/Twitter: https://twitter.com/the_kiwi_sreMusic from Uppbeat (free for Creators!).Intro:https://uppbeat.io/t/sensho/good-timesLicense code: QBXDSEGNJZY9DDICOutro:https://uppbeat.io/t/mountaineer/voyagerLicense code: 5C0VMTUOULFSRSTM
Mon 31.10.2022
7 : 28 min
Slight Reliability Episode 29 - Team Topologies
In this episode I give a summary of the book Team Topologies by Matthew Skelton and Manuel Pais (https://teamtopologies.com/book) and how this relates to implementing SRE practices. (POINT OF CORRECTION: One of the authors is "Matthew" Skelton, not "Michael") You can find me on: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter.com/the_kiwi_sre Music from Uppbeat (free for Creators!). Intro: https://uppbeat.io/t/sensho/good-times License code: QBXDSEGNJZY9DDIC Outro: https://uppbeat.io/t/mountaineer/voyager License code: 5C0VMTUOULFSRSTM
Mon 24.10.2022
17 : 47 min
Slight Reliability Episode 28 - State of DevOps 2022
In this episode I give my take on the Accelerate State of DevOps 2022 from the SRE perspective.You can find the Accelerate State of DevOps Report 2022 here: https://cloud.google.com/devops/state-of-devops/You can find me on:LinkedIn: https://www.linkedin.com/in/stephentownshend/Twitter: https://twitter.com/the_kiwi_sreMusic from Uppbeat (free for Creators!).Intro:https://uppbeat.io/t/sensho/good-timesLicense code: QBXDSEGNJZY9DDICOutro:https://uppbeat.io/t/mountaineer/voyagerLicense code: 5C0VMTUOULFSRSTM
In this episode I share my experience relapsing into anxiety and insomnia, ruminate on an SRE's sphere of influence, and tease an upcoming change of role.You can find me on:LinkedIn: https://www.linkedin.com/in/stephentownshend/Twitter: https://twitter.com/the_kiwi_sreMusic from Uppbeat (free for Creators!).Intro:https://uppbeat.io/t/sensho/good-timesLicense code: QBXDSEGNJZY9DDICOutro:https://uppbeat.io/t/mountaineer/voyagerLicense code: 5C0VMTUOULFSRSTM
Mon 10.10.2022
13 : 55 min
Slight Reliability Episode 26 - The Toyota Way
In this episode I reflect on the book "The Toyota Way" by Jeffrey Liker, and explore four principles which resonate with my work. The book in question is The Toyota Way: https://www.amazon.com/Toyota-Way-Second-Management-Manufacturer/dp/1260468518 You can find me on:LinkedIn: https://www.linkedin.com/in/stephentownshend/Twitter: https://twitter.com/the_kiwi_sreMusic from Uppbeat (free for Creators!).Intro:https://uppbeat.io/t/sensho/good-timesLicense code: QBXDSEGNJZY9DDICOutro:https://uppbeat.io/t/mountaineer/voyagerLicense code: 5C0VMTUOULFSRSTM
In this episode I discuss the concept behind continuous delivery and share the ideas we've been exploring at IAG.The book I mentioned is The Toyota Way: https://www.amazon.com/Toyota-Way-Second-Management-Manufacturer/dp/1260468518 You can find me on:LinkedIn: https://www.linkedin.com/in/stephentownshend/Twitter: https://twitter.com/the_kiwi_sreMusic from Uppbeat (free for Creators!).Intro:https://uppbeat.io/t/sensho/good-timesLicense code: QBXDSEGNJZY9DDICOutro:https://uppbeat.io/t/mountaineer/voyagerLicense code: 5C0VMTUOULFSRSTM
Mon 19.09.2022
9 : 25 min
Slight Reliability Episode 24 - Interview with Abby Bangser
In this episode I have a chat with Abby Bangser about the transition from testing to SRE, the barriers thrown in front of testers (which SREs don't tend to face), being humble to be let in the door, and *much* more. You can find Abby on LinkedIn: https://www.linkedin.com/in/abbybangser/ The book she mentioned was Infrastructure as Code by Kief Morris https://www.thoughtworks.com/insights/books/infrastructure-as-code-2nd-edition You can find Charity Majors (cofounder of Honeycomb) on Twitter: https://twitter.com/mipsytipsy You can find me on: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter.com/the_kiwi_sre Music from Uppbeat (free for Creators!). Intro: https://uppbeat.io/t/sensho/good-times License code: QBXDSEGNJZY9DDIC Outro: https://uppbeat.io/t/mountaineer/voyager License code: 5C0VMTUOULFSRSTM
Mon 12.09.2022
28 : 35 min
Slight Reliability Episode 23 - Grafana Central
In this episode I share the story of Grafana Central, an observability platform that we've been standing up at IAG.You can find me on:LinkedIn: https://www.linkedin.com/in/stephentownshend/Twitter: https://twitter.com/the_kiwi_sreMusic from Uppbeat (free for Creators!).Intro:https://uppbeat.io/t/sensho/good-timesLicense code: QBXDSEGNJZY9DDICOutro:https://uppbeat.io/t/mountaineer/voyagerLicense code: 5C0VMTUOULFSRSTM
Mon 05.09.2022
18 : 47 min
Slight Reliability Episode 22 - It's SLO Going
In this episode I share a talk I did earlier in the year as part of the Grafana User Group APAC. I share our experiences attempting to implement SLOs at IAG, and our reliability benchmarking work which is a great way to get started if SRE is brand new to your organisation.You can find me on:LinkedIn: https://www.linkedin.com/in/stephentownshend/Twitter: https://twitter.com/the_kiwi_sreMusic from Uppbeat (free for Creators!).Intro:https://uppbeat.io/t/sensho/good-timesLicense code: QBXDSEGNJZY9DDICOutro:https://uppbeat.io/t/mountaineer/voyagerLicense code: 5C0VMTUOULFSRSTM
Mon 29.08.2022
19 : 17 min
Slight Reliability Episode 21 - Rubik's Kube
In this episode I share experiences and ideas about Kubernetes, and what I learned from speaking to Ruben Hakopiean from Kubevious. I'd like to give a huge shout out to Ruben. Many of the topics and ideas discussed come straight from what was discussed in the interview we recorded (but were unable to publish due to audio issues).You can find Ruben on LinkedIn: https://www.linkedin.com/in/rubenhak/And find out more about Kubevious here: https://kubevious.io/You can find me on:LinkedIn: https://www.linkedin.com/in/stephentownshend/Twitter: https://twitter.com/the_kiwi_sreMusic from Uppbeat (free for Creators!).Intro:https://uppbeat.io/t/sensho/good-timesLicense code: QBXDSEGNJZY9DDICOutro:https://uppbeat.io/t/mountaineer/voyagerLicense code: 5C0VMTUOULFSRSTM
Mon 22.08.2022
12 : 08 min
Slight Reliability Episode 20 - Interview with Joey Hendricks
In this episode I have a chat with Joey Hendricks about running performance tests in production.You can find Joey on LinkedIn: https://www.linkedin.com/in/joey-hendricks/And GitHub: https://github.com/JoeyHendricksYou can find me on:LinkedIn: https://www.linkedin.com/in/stephentownshend/Twitter: https://twitter.com/the_kiwi_sreMusic from Uppbeat (free for Creators!).Intro:https://uppbeat.io/t/sensho/good-timesLicense code: QBXDSEGNJZY9DDICOutro:https://uppbeat.io/t/mountaineer/voyagerLicense code: 5C0VMTUOULFSRSTM
In this episode I share my takeaways from the NZ DevOps Summit held in Auckland. This was the first in-person event I had attended in three years.You can find me on:LinkedIn: https://www.linkedin.com/in/stephentownshend/Twitter: https://twitter.com/the_kiwi_sreMusic from Uppbeat (free for Creators!).Intro:https://uppbeat.io/t/sensho/good-timesLicense code: QBXDSEGNJZY9DDICOutro:https://uppbeat.io/t/mountaineer/voyagerLicense code: 5C0VMTUOULFSRSTM
Mon 08.08.2022
12 : 04 min
Slight Reliability Episode 18 - Interview with Chris Evans
In this episode I have a chat with Chris Evans from incident.io about using incidents to lift the lid on an organisation, how aiming for zero incidents can stall an organisation, how tracking MTTR is unhelpful, and much more.You can find Chris on LinkedIn: https://www.linkedin.com/in/evnsio/Here are the resources Chris mentioned...The practical guide to incident management:http://incident.io/guideThe Field Guide to Understanding Human Error (by Sidney Dekker) https://www.oreilly.com/library/view/the-field-guide/9781317031833/Moving Past Shallow Incident Data by John Allspaw https://www.adaptivecapacitylabs.com/blog/2018/03/23/moving-past-shallow-incident-data/And the SREcon talk he mentioned by Courtney Nash: https://www.usenix.org/conference/srecon22americas/presentation/nashYou can find me on:LinkedIn: https://www.linkedin.com/in/stephentownshend/Twitter: https://twitter.com/the_kiwi_sreMusic from Uppbeat (free for Creators!).Intro:https://uppbeat.io/t/sensho/good-timesLicense code: QBXDSEGNJZY9DDICOutro:ht
Mon 01.08.2022
31 : 49 min
Slight Reliability Episode 17 - Interview with Ganesh Datta
In this episode I have a chat with Ganesh Datta, CTO and co-founder of Cortex.io. In this episode we discuss the human challenges of microservices, gamifying reliability, connecting business outcomes with SRE work, and much more.You can find Ganesh on LinkedIn: https://www.linkedin.com/in/gsdatta/You can find me on:LinkedIn: https://www.linkedin.com/in/stephentownshend/Twitter: https://twitter.com/the_kiwi_sreMusic from Uppbeat (free for Creators!).Intro:https://uppbeat.io/t/sensho/good-timesLicense code: QBXDSEGNJZY9DDICOutro:https://uppbeat.io/t/mountaineer/voyagerLicense code: 5C0VMTUOULFSRSTM
Mon 18.07.2022
27 : 25 min
Slight Reliability Episode 16 - Interview with Sebastian Vietz
In this episode I have a chat with Sebastian Vietz, an SRE lead based in Canada who has been leading the implementation of SRE across different teams and organisations for eight years. In this episode we discuss SLO adoption, SRE going mainstream, virtual teams, and many other topics.You can find Sebastian on LinkedIn: https://www.linkedin.com/in/sebastianvietz/You can find me on:LinkedIn: https://www.linkedin.com/in/stephentownshend/Twitter: https://twitter.com/the_kiwi_sreMusic from Uppbeat (free for Creators!).Intro:https://uppbeat.io/t/sensho/good-timesLicense code: QBXDSEGNJZY9DDICOutro:https://uppbeat.io/t/mountaineer/voyagerLicense code: 5C0VMTUOULFSRSTM
Mon 11.07.2022
41 : 14 min
Slight Reliability Episode 15 - SLObro
In this episode I discuss potential pre-requisites that are ideally in place before attempting to adopt SLOs.You can find me on:LinkedIn: https://www.linkedin.com/in/stephentownshend/Twitter: https://twitter.com/the_kiwi_sreMusic from Uppbeat (free for Creators!).Intro:https://uppbeat.io/t/sensho/good-timesLicense code: QBXDSEGNJZY9DDICOutro:https://uppbeat.io/t/mountaineer/voyagerLicense code: 5C0VMTUOULFSRSTM
Mon 04.07.2022
11 : 41 min
Slight Reliability Episode 14 - SLOpoke
In this episode I share my updated thinking on SLOs and an "ah-ha!" moment I had.You can find me on:LinkedIn: https://www.linkedin.com/in/stephentownshend/Twitter: https://twitter.com/the_kiwi_sreMusic from Uppbeat (free for Creators!).Intro:https://uppbeat.io/t/sensho/good-timesLicense code: QBXDSEGNJZY9DDICOutro:https://uppbeat.io/t/mountaineer/voyagerLicense code: 5C0VMTUOULFSRSTM
Mon 27.06.2022
11 : 01 min
Slight Reliability Episode 13 - I Guess You'll Have to Latency
What is latency and how does it relate to customer experience? Where do you measure it? Why do the metrics we choose to capture matter?You can find me on:LinkedIn: https://www.linkedin.com/in/stephentownshend/Twitter: https://twitter.com/the_kiwi_sreMusic from Uppbeat (free for Creators!).Intro:https://uppbeat.io/t/sensho/good-timesLicense code: QBXDSEGNJZY9DDICOutro:https://uppbeat.io/t/mountaineer/voyagerLicense code: 5C0VMTUOULFSRSTM
Mon 13.06.2022
16 : 48 min
Slight Reliability Episode 12 - SLO vs NFR Grudge Match!
When it comes to reliability, SLOs and NFRs are both *somewhat* related in that they allow us to describe the level of service we want to provide our customers. So how do they match up head to head? You can find me on: LinkedIn: https://www.linkedin.com/in/stephento... Twitter: https://twitter.com/the_kiwi_sre Music from Uppbeat (free for Creators!). Intro: https://uppbeat.io/t/sensho/good-times License code: QBXDSEGNJZY9DDIC Outro: https://uppbeat.io/t/mountaineer/voyager License code: 5C0VMTUOULFSRSTM
Mon 06.06.2022
10 : 18 min
Slight Reliability Episode 11 - The Era of Errors
What is an error? Where do you measure errors? How do they relate to SLO's and error budgets? You can find me on:LinkedIn: https://www.linkedin.com/in/stephento...Twitter: https://twitter.com/the_kiwi_sreMusic from Uppbeat (free for Creators!).Intro:https://uppbeat.io/t/sensho/good-timesLicense code: QBXDSEGNJZY9DDICOutro:https://uppbeat.io/t/mountaineer/voyagerLicense code: 5C0VMTUOULFSRSTM
Mon 30.05.2022
12 : 31 min
Slight Reliability Episode 10 - Single Pain of Glass
In this episode we discuss the observability concept of a 'single pane of glass' view, and I share my experience implementing one.You can find me on:LinkedIn: https://www.linkedin.com/in/stephento...Twitter: https://twitter.com/the_kiwi_sreMusic from Uppbeat (free for Creators!).Intro:https://uppbeat.io/t/sensho/good-timesLicense code: QBXDSEGNJZY9DDICOutro:https://uppbeat.io/t/mountaineer/voyagerLicense code: 5C0VMTUOULFSRSTM
Mon 23.05.2022
16 : 35 min
Slight Reliability Episode 9 - Thoughts from SLOconf 2022
In this episode I share my thoughts from SLOconf, a conference all about Service Level Objectives (SLO's).You can find all the talks (actually 60 of them!) from SLOconf here: https://www.youtube.com/watch?v=pgZm2Bp2-AQ&list=PLLNq9CBV7AFwkXvYmjPPIQlRDVwTmacEKYou can find me on:LinkedIn: https://www.linkedin.com/in/stephento...Twitter: https://twitter.com/the_kiwi_sreMusic from Uppbeat (free for Creators!).Intro:https://uppbeat.io/t/sensho/good-timesLicense code: QBXDSEGNJZY9DDICOutro:https://uppbeat.io/t/mountaineer/voyagerLicense code: 5C0VMTUOULFSRSTM
In this episode I provide three more (p)re-reactions to upcoming sessions at o11yfest 2022 (a conference all about observability). The talks I cover are: "Observability driven development" by Jessica Kerr, "Return on investment driven observability" by Michael Hausenblas, and "How the OpenTelemetry Collector puts you in the driver seat" by Alex Boten. I am also speaking at o11yfest. You can watch my talk on Bad Observability at o11yfest from May 9th to the 12th: https://o11yfest.org/ You can find me on: LinkedIn: https://www.linkedin.com/in/stephento... Twitter: https://twitter.com/the_kiwi_sre Music from Uppbeat (free for Creators!). Intro: https://uppbeat.io/t/sensho/good-times License code: QBXDSEGNJZY9DDIC Outro: https://uppbeat.io/t/mountaineer/voyager License code: 5C0VMTUOULFSRSTM
In this episode I provide a (p)re-reaction to three of the talks that will be included in the upcoming 2022 o11yfest conference (a conference all about observability). The talks I cover are:"Where the heck are my spans?" by Reese Lee"Confidence in chaos" by Narmatha Bala"Is MTTR still relevant in a modern, cloud native world?" by Martin MaoI am also speaking at o11yfest. You can watch my talk on Bad Observability at o11yfest from May 9th to the 12th: https://o11yfest.org/You can find me on:LinkedIn: https://www.linkedin.com/in/stephento...Twitter: https://twitter.com/the_kiwi_sreMusic from Uppbeat (free for Creators!).Intro:https://uppbeat.io/t/sensho/good-timesLicense code: QBXDSEGNJZY9DDICOutro:https://uppbeat.io/t/mountaineer/voyagerLicense code: 5C0VMTUOULFSRSTM
Mon 02.05.2022
17 : 31 min
Slight Reliability Episode 6 - Afailability
How do you measure the availability of your services? What metric do you pick? What layer of the solution do you track it from?My colleague Gwen and I are sharing our SLO definition workshop experience at SLOconf from May 9th to 12th: https://www.sloconf.com/You can also watch my talk on Bad Observability at o11yfest from May 9th to the 12th: https://o11yfest.org/You can find me on:LinkedIn: https://www.linkedin.com/in/stephento...Twitter: https://twitter.com/the_kiwi_sreMusic from Uppbeat (free for Creators!).Intro:https://uppbeat.io/t/sensho/good-timesLicense code: QBXDSEGNJZY9DDICOutro:https://uppbeat.io/t/mountaineer/voyagerLicense code: 5C0VMTUOULFSRSTM
Mon 11.04.2022
10 : 54 min
Slight Reliability Episode 5 - SLO Motion
In the episode I share our team's experience defining SLO's, and how we experimented and pivoted to achieve better outcomes.As discussed in the episode, if you would like to hear (and see) more about our SLO workshop, my colleague Gwen and I are speaking at SLOconf from May 9th to 12th: https://www.sloconf.com/You can find me on:LinkedIn: https://www.linkedin.com/in/stephento...Twitter: https://twitter.com/the_kiwi_sreMusic from Uppbeat (free for Creators!). Intro:https://uppbeat.io/t/sensho/good-timesLicense code: QBXDSEGNJZY9DDICOutro:https://uppbeat.io/t/mountaineer/voyagerLicense code: 5C0VMTUOULFSRSTM
Mon 04.04.2022
13:39 min
Slight Reliability Episode 4 - Bad Observability Part 3
What are even more antipatterns to avoid in monitoring, alerting, tracing, and logging? Shout out to James Pulley for his contribution to this episode. James is one of the world's leading experts on performance engineering and can be found on LinkedIn here: https://www.linkedin.com/in/jameslpulley3/ I will be presenting about Bad Observability at o11yfest from May 9th to the 12th 2022: https://o11yfest.org/ You can find me on: LinkedIn: https://www.linkedin.com/in/stephento... Twitter: https://twitter.com/the_kiwi_sre Music from Uppbeat (free for Creators!). Intro: https://uppbeat.io/t/sensho/good-times License code: QBXDSEGNJZY9DDIC Outro: https://uppbeat.io/t/mountaineer/voyager License code: 5C0VMTUOULFSRSTM
Mon 28.03.2022
10:48 min
Slight Reliability Episode 3 - Bad Observability Part 2
What are some more antipatterns to avoid in monitoring, alerting, tracing, and logging? Shout out to Raguraman Balasubramanian (https://www.linkedin.com/in/raguraman-balasubramanian-070150108/) for his contribution to this episode. You can find me on: LinkedIn: https://www.linkedin.com/in/stephento... Twitter: https://twitter.com/the_kiwi_sre Music from Uppbeat (free for Creators!). Intro: https://uppbeat.io/t/sensho/good-times License code: QBXDSEGNJZY9DDIC Outro: https://uppbeat.io/t/mountaineer/voyager License code: 5C0VMTUOULFSRSTM
Mon 21.03.2022
16:28 min
Slight Reliability Episode 2 - Bad Observability Part 1
What are some antipatterns to avoid in monitoring, alerting, tracing, and logging? Music from Uppbeat (free for Creators!). Intro: https://uppbeat.io/t/sensho/good-times License code: QBXDSEGNJZY9DDIC Outro: https://uppbeat.io/t/mountaineer/voyager License code: 5C0VMTUOULFSRSTM
Mon 14.03.2022
17:06 min
Slight Reliability Episode 1 - What the heck is SRE anyway?
What is SRE *really* about? How did it start? What do I *want* it to be? What is it being implemented as in the industry? Music from Uppbeat (free for Creators!). Intro: https://uppbeat.io/t/sensho/good-times License code: QBXDSEGNJZY9DDIC Outro: https://uppbeat.io/t/mountaineer/voyager License code: 5C0VMTUOULFSRSTM
Mon 07.03.2022
9:34 min