Reliability at Scale – How to build graceful degradation into applications at massive scale
Everyone thinks their system is reliable until suddenly it isn’t. In a service-oriented or microservice-based architecture, it is easy to overlook a single point of failure that can trigger a massive cascading failure, quickly harming users and eroding their trust. Ensuring that a single failure is handled gracefully is of paramount importance in a large-scale distributed system, but that’s much easier said than done. I’ve worked for multiple companies building large […]