Dated Mar 22, 2020; last modified on Sun, 28 May 2023
When building an application, we need to figure out which tools and approaches are best suited to the task at hand. Combining tools so that together they do what no single tool can do is hard.
The purpose of this book is to build the understanding needed to pick the right tools.
The access patterns for a data system (e.g. a database or a cache) motivate different implementations, and hence different performance characteristics.
Because many applications have a wide range of requirements, a single tool can no longer meet all of their data processing and storage needs. It’s up to the application code to stitch different tools together.
Even though the stitching happens behind the scenes, the composite data system may provide certain guarantees to its clients, e.g. that the cache will be correctly invalidated.
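To make the stitching concrete, here is a minimal sketch of application code that puts a cache in front of a database and tries to keep the invalidation guarantee. Everything in it is hypothetical: `Database` is an in-memory stand-in for a real store, and `CachedStore` is an illustrative name, not a library API. It assumes a single thread; with concurrent readers and writers, a slow reader can put a stale value back into the cache after the invalidation, so real systems need more care.

```python
class Database:
    """Hypothetical in-memory stand-in for the system of record."""

    def __init__(self):
        self._rows = {}

    def get(self, key):
        return self._rows.get(key)

    def put(self, key, value):
        self._rows[key] = value


class CachedStore:
    """Application code that stitches a cache in front of a database.

    Guarantee it aims for (single-threaded only): once put() returns,
    get() never serves the previous value from the cache.
    """

    def __init__(self, db):
        self.db = db
        self.cache = {}
        self.cache_hits = 0

    def get(self, key):
        if key in self.cache:
            self.cache_hits += 1
            return self.cache[key]
        value = self.db.get(key)  # cache miss: read from the database
        self.cache[key] = value   # populate the cache for later reads
        return value

    def put(self, key, value):
        self.db.put(key, value)    # write to the system of record first
        self.cache.pop(key, None)  # then invalidate, so the next read refetches
```

Invalidating rather than updating the cache on writes keeps the write path simple: the next read for that key goes to the database and repopulates the cache.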
A fault is when a component deviates from its spec, whereas a failure is when the system as a whole stops providing the required service to the user.
Since we can’t reduce the probability of a fault to zero, it’s better to design fault-tolerant systems that prevent faults from causing failures.
In fault-tolerant systems, it can make sense to trigger faults deliberately in order to exercise the fault-tolerance machinery, e.g. by randomly killing individual processes.
However, not all faults can be tolerated. For instance, if an attacker gains access to sensitive data, the damage can’t be undone....
Ongoing maintenance includes fixing bugs, keeping systems operational, adapting to new platforms, modifying the system for new use cases, etc.
Operability: Making Life Easy for Operations
Operations teams keep a software system running smoothly. A data system can make life easier for Ops people by:
Providing visibility into runtime behavior and system internals.
Supporting automation and integration with standard tools.
Tolerating individual machines being taken down for maintenance.
Good documentation, e....
Describing Load
Load parameters are numbers that describe the load on a system, e.g. number of requests per second to a web server, ratio of reads to writes in a database, number of simultaneously active users in a chat room.
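A sketch of how a few load parameters might be computed from a window of request logs. The `Request` record, the `load_parameters` function, and the choice to approximate "simultaneously active users" as distinct users seen in the window are all assumptions for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Request:
    timestamp: float  # seconds since an arbitrary epoch
    kind: str         # "read" or "write"
    user_id: str


def load_parameters(requests, window_seconds):
    """Summarize one time window of requests into load parameters."""
    kinds = Counter(r.kind for r in requests)
    reads, writes = kinds["read"], kinds["write"]  # Counter yields 0 if absent
    return {
        "requests_per_sec": len(requests) / window_seconds,
        "read_write_ratio": reads / writes if writes else float("inf"),
        # Approximation: distinct users in the window stand in for
        # simultaneously active users.
        "active_users": len({r.user_id for r in requests}),
    }
```

Which of these numbers matters depends on the system; the point is to pick the few that drive its cost.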
Example: Twitter
Major operations:
Users posting tweets: \(4.6k\) requests/sec on average, \(12k+\) requests/sec at peak.
Users viewing timelines: \(300k\) requests/sec.
Twitter’s scaling challenge is the fan-out: each user follows many people and each user is followed by many people....
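One way to handle the fan-out, sketched here under stated assumptions rather than as Twitter’s actual code, is fan-out on write: when a user posts, push the tweet into a cached home timeline for each follower, so that viewing a timeline is a cheap lookup. The alternative is to assemble each timeline at read time by querying the recent tweets of everyone the user follows. `TimelineService`, `fanout_write_rate`, and the timeline length of 100 are hypothetical.

```python
from collections import defaultdict, deque


class TimelineService:
    """Fan-out on write: each post is copied into every follower's timeline."""

    def __init__(self, timeline_length=100):  # arbitrary cap per timeline
        self.followers = defaultdict(set)  # author -> users following them
        self.timelines = defaultdict(lambda: deque(maxlen=timeline_length))
        self.timeline_writes = 0

    def follow(self, follower, followee):
        self.followers[followee].add(follower)

    def post(self, author, text):
        # One tweet becomes len(followers) timeline writes: this is the fan-out.
        for follower in self.followers[author]:
            self.timelines[follower].appendleft((author, text))
            self.timeline_writes += 1

    def home_timeline(self, user):
        # Reads are a single lookup; newest tweets come first.
        return list(self.timelines[user])


def fanout_write_rate(tweets_per_sec, avg_followers):
    """Timeline writes per second implied by fan-out on write."""
    return tweets_per_sec * avg_followers
```

The cost moves to write time: with an illustrative (assumed) average of 75 followers per user, `fanout_write_rate(4600, 75)` gives 345,000 timeline writes per second from \(4.6k\) tweets/sec. The trade can still pay off because timeline views (\(300k\) requests/sec) far outnumber posts.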