Describing Load
Load parameters are numbers that describe the load on a system, e.g. number of requests per second to a web server, ratio of reads to writes in a database, number of simultaneously active users in a chat room.
Example: Twitter
Major operations:
- Users posting tweets: \(4.6k\) requests/sec on average, \(12k+\) requests/sec at peak.
- Users viewing timelines: \(300k\) requests/sec.
Twitter’s scaling challenge is the fan-out: each user follows many people and each user is followed by many people.
Initially, when a user tweeted, the tweet was inserted into a global collection. When a user requested their home timeline, a query looked up everyone they followed and merged all of those users’ tweets, sorted by time. Under load, the system struggled to keep up with these expensive read-time queries.
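A minimal sketch of this read-time approach, with hypothetical in-memory `tweets` and `follows` structures standing in for the original relational tables (the schema here is illustrative):

```python
tweets = [
    {"sender_id": "alice", "text": "hello", "ts": 100},
    {"sender_id": "bob", "text": "hi", "ts": 101},
]
follows = {("carol", "alice"), ("carol", "bob")}  # (follower, followee) pairs

def read_timeline(user_id: str, limit: int = 50) -> list[dict]:
    """The work the SQL-style join did: writes are cheap, every read scans."""
    followed = {fe for (fr, fe) in follows if fr == user_id}
    relevant = [t for t in tweets if t["sender_id"] in followed]
    return sorted(relevant, key=lambda t: t["ts"], reverse=True)[:limit]

print(read_timeline("carol"))  # both tweets, newest first
```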
Now, each user’s home timeline is cached like a mailbox. When a user tweets, the tweet is ‘mailed’ to the cached timeline of each of the tweeter’s followers. Viewing the timeline is then a cheap cache read.
However, accounts with very large followings are excluded from this fan-out: their tweets are fetched separately when a user loads their timeline and merged in at read time. Otherwise, a single tweet from a highly followed account would trigger millions of timeline writes.
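A minimal sketch of the hybrid fan-out, again with hypothetical in-memory structures; the follower threshold is illustrative, not Twitter’s actual cut-off:

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 1_000_000  # illustrative cut-off, not Twitter's real number

followers_of: dict[str, list[str]] = defaultdict(list)     # followee -> followers
following: dict[str, list[str]] = defaultdict(list)        # follower -> followees
timeline_cache: dict[str, list[dict]] = defaultdict(list)  # per-user 'mailbox'
tweets_by_user: dict[str, list[dict]] = defaultdict(list)  # newest first

def is_celebrity(user_id: str) -> bool:
    return len(followers_of[user_id]) >= CELEBRITY_THRESHOLD

def post_tweet(sender_id: str, text: str, ts: int) -> None:
    tweet = {"sender_id": sender_id, "text": text, "ts": ts}
    tweets_by_user[sender_id].insert(0, tweet)
    if not is_celebrity(sender_id):
        # Write-time fan-out: one cache insert per follower.
        for follower in followers_of[sender_id]:
            timeline_cache[follower].insert(0, tweet)

def read_timeline(user_id: str, limit: int = 50) -> list[dict]:
    # Cheap cache read, plus a read-time merge of any celebrity tweets.
    merged = list(timeline_cache[user_id])
    for followee in following[user_id]:
        if is_celebrity(followee):
            merged.extend(tweets_by_user[followee][:limit])
    return sorted(merged, key=lambda t: t["ts"], reverse=True)[:limit]

# Example: alice (a normal user) tweets; carol follows alice.
followers_of["alice"].append("carol")
following["carol"].append("alice")
post_tweet("alice", "hello", ts=1)
print(read_timeline("carol"))
```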
Describing Performance
Response time: the time between a client sending a request and receiving a response. Think of it as a distribution rather than a single number, because random factors, e.g. context switches, packet loss and retransmission, garbage collection pauses, page faults, etc., make even identical requests vary.
The mean is not a good metric because it doesn’t tell you how many users actually experienced a given response time. Better to use percentiles (median/p50, p95, p99, p999).
The high percentiles (tail latencies) are usually important because the slowest requests often belong to the most valuable customers, e.g. the ones storing lots of data on your cloud and therefore making the heaviest requests. However, the higher the percentile, the more it is dominated by random events outside your control, and the more expensive it becomes to optimize.
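A quick sketch of a nearest-rank percentile computation over measured response times; the sample values are made up, and real monitoring systems typically use streaming approximations rather than sorting raw samples:

```python
import math

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank p-th percentile of `samples` (0 < p <= 100)."""
    ranked = sorted(samples)
    rank = math.ceil(p / 100 * len(ranked))
    return ranked[rank - 1]

times = [12, 15, 14, 13, 250, 16, 14, 13, 15, 980]  # made-up measurements, ms
for p in (50, 90, 99):
    print(f"p{p} = {percentile(times, p)} ms")  # p50=14, p90=250, p99=980
```

For these samples the mean is ~134 ms, which describes neither the typical 14 ms request nor the 980 ms outlier.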
It’s important to measure response times on the client side. Say clients A and B both send requests at the same time, and A’s request, which takes 100x longer to process, reaches the server first. If the server handles requests one at a time, B’s fast request sits in the queue behind A’s slow one, so both A and B perceive similar (slow) response times, even though the server processed B quickly once it got to it (head-of-line blocking).
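A toy illustration of this queueing effect, assuming a single-threaded server and both requests arriving at the same instant:

```python
def perceived_latencies(service_times: list[float]) -> list[float]:
    """Client-side response times for a server that works one request at a
    time; all requests are assumed to arrive at t=0, in queue order."""
    clock, observed = 0.0, []
    for service in service_times:
        clock += service        # server busy with this request
        observed.append(clock)  # this client has been waiting since t=0
    return observed

# A's request takes 1.0 s, B's only 0.01 s, but A is first in the queue:
print(perceived_latencies([1.0, 0.01]))  # [1.0, 1.01] -> B also waits ~1 s
```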
Even when a server makes its backend calls in parallel, the end-user request still has to wait for the slowest of them, so a single slow call slows the entire request (tail latency amplification).
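A back-of-the-envelope sketch of this amplification: if each backend call is independently ‘slow’ (worse than its own p99) with probability 1%, the chance that a request touching many backends hits at least one slow call grows quickly:

```python
def prob_request_is_slow(n_calls: int, p_slow: float = 0.01) -> float:
    """Chance that at least one of n independent parallel backend calls
    is 'slow', where each call is slow with probability p_slow."""
    return 1 - (1 - p_slow) ** n_calls

print(prob_request_is_slow(1))    # 0.01  -> 1% of single-call requests are slow
print(prob_request_is_slow(100))  # ~0.63 -> most 100-call requests hit a slow one
```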
Approaches for Coping with Load
Scaling up (vertical scaling) means moving to a more powerful machine, while scaling out (horizontal scaling) means distributing the load across multiple smaller machines.
Some systems are elastic, automatically adding computing resources when load increases, but manually scaled systems are simpler and have fewer operational gotchas.
Distributing a stateful system can introduce a lot of complexity. Although the tools for distributed stateful data systems are getting better, the status quo is to keep the database on a single node until scaling cost or availability requirements force a distributed setup.
Note that an architecture that scales well for one application is built around assumptions about which load parameters are common and which are rare; a different mix of operations calls for a different design.
Note how, in the Twitter example, the heavy fan-out work had to happen somewhere. The real design question was when to do it: at write time or at read time.