The hot topic that isn't AI: Operational Resilience

The topic of operational resilience has been around for decades, with extensive research covering not just the technical aspects of IT failures and cyber-security incidents but also the contributing human factors and organisational behaviours. With all these years of research and review, you might think it surprising that it’s such a hot topic again. Like many things in IT, there is a cycle of major incident, learning, new technology trend, and major incident again. From the early days of Y2K and ILOVEYOU, through Heartbleed and WannaCry, to the more public AWS and Azure global outages, the pattern is consistent. Ensuring that we have systems in place to guard against mistakes and malicious intent continues to be a priority, but the repetition of large-scale events, not to mention the millions of smaller ones, makes clear that prevention is not sufficient. ...

19 April 2026 · 12 min · liamjbennett

Chaos Engineering - fact or fiction

Intro (what even is it?!) I have been a developer and a DevOps engineer (or whatever the latest title is now) for most of my career and I like my systems to be reliable and well architected. I like being confident that when the unexpected happens, and it always does, the systems that I am responsible for can handle it and not wake me up at 2am. When I talk to developers, I’m often talking about testing and the testing pyramid - we all know its value and the positive investment that it is. The more testing of our systems we have, the earlier we see bugs and the more reliable the application. It’s a fairly simple statement but it’s still worth reiterating because of how easy it is to neglect in fast-moving customer/feature driven teams. ...

16 November 2022 · 6 min · liamjbennett