how.complexsystems.fail

Software

This is such a great webpage, especially in light of the adventures Optus and their customers are having right now. This is my favourite point, because even the tech press routinely get this wrong:

Because overt failure requires multiple faults, there is no isolated ‘cause’ of an accident. There are multiple contributors to accidents. Each of these is necessarily insufficient in itself to create an accident. Only jointly are these causes sufficient to create an accident. Indeed, it is the linking of these causes together that creates the circumstances required for the accident. Thus, no isolation of the ‘root cause’ of an accident is possible. The evaluations based on such reasoning as ‘root cause’ do not reflect a technical understanding of the nature of failure but rather the social, cultural need to blame specific, localized forces or events for outcomes.

See also: James T. Reason’s Swiss cheese model of accident causation. You can blame the human involved in the last slice, but you’re neglecting the rest of the stack that should never have let such a failure result in a catastrophe.

Author bio and support

Me!

Ruben Schade is a technical writer and IaaS engineer in Sydney, Australia who refers to himself in the third person in bios. Wait, not BIOS… my brain should be EFI by now.

The site is powered by Hugo, FreeBSD, and OpenZFS on OrionVM, everyone’s favourite cloud infrastructure provider.

If you found this post helpful or entertaining, you can shout me a coffee or buy some silly merch. Thanks!