| Random Link ¯\_(ツ)_/¯ | ||
| Nov 27, 2024 | » | Consistent Hashing
6 min; updated Jun 6, 2026
The term “consistent hashing” makes me think of hashing without randomization. Why isn’t every hash consistent by definition? For example, a map implementation would need consistent hashing lest it’s inaccurate when searching for stored values. Or is consistent hashing a tradeoff between collision-resistance and speed? Web CachingWeb caching was the original motivation for consistent hashing. With a web cache, if a browser requests a URL that is not in the cache, the page is downloaded from the server, and the result is sent to both the browser and the cache. On a second request, the page is served from the cache without contacting the server. ... |
| Jun 6, 2026 | » | Sharding
1 min; updated Jun 6, 2026
NotesSharding comes up when a single database won’t work (e.g., hit storage limits, write throughput limits, or read throughput limits) and you need to split your data across multiple independent servers. For a user-centric social media app, sharding by |
| Jun 6, 2026 | » | Networking Essentials
2 min; updated Jun 6, 2026
NotesAt a basic level, understand how services talk to each other and what happens when those connections fail or get slow. For 90% of the use cases, default to HTTP over TCP. WebSockets and Server-Sent Events (SSE) come up when you need real-time updates. SSE is unidirectional - the client makes an initial HTTP request to open the connection, and the server pushes data down that connection. WebSockets handle bidirectional communication where both sides send messages freely. ... |
| Jun 6, 2026 | » | Database Indexing
1 min; updated Jun 6, 2026
NotesIndexes make database queries fast. Most relational databases use B-trees, which support exact lookups (find user with email X) and range queries (find all orders between date A and date B). Hash indexes are faster for exact matches but can’t do range queries. There are external systems that go beyond what the primary DB provider has. Elasticsearch is the go-to for full-text search. PostGIS in Postgres is popular for geospatial queries. Such extensions typically sync from your primary DB and so the search index will lag slightly behind. However, the increased functionality is usually worth it. ... |
| Jun 6, 2026 | » | Data Modeling
1 min; updated Jun 6, 2026
NotesChoosing what data to store and how to structure it directly affects performance, scalability, and maintenance. Relational databases are useful when you have structured data with clear relationships and need strong consistency (transaction-based actions, enforcing foreign key constraints). NoSQL databases shine for flexible schemas or when you need to scale horizontally across many servers without complex joins. Normalization refers to splitting data across tables to avoid duplication, e.g.,
you have a |
| Jun 6, 2026 | » | CAP Theorem
1 min; updated Jun 6, 2026
NotesThe CAP theorem states that you can only have 2 of 3 properties at once:
In practice, network partitions are unavoidable in distributed system. Choosing consistency means some nodes will refuse to serve requests rather than return potentially stale data. Choosing availability means different nodes might temporarily have different data. ... |
| Jun 6, 2026 | » | Caching
2 min; updated Jun 6, 2026
NotesFor read-heavy applications, storing frequently accessed data in fast memory (e.g., Redis) allows you to skip the DB entirely for some reads. A cache hit on Redis takes ~1ms compared to 20-50ms for a typical DB query, and this speedup is impactful in the order of millions of requests. Caching requires a solution for invalidating stale data, e.g., user updates their profile in the DB. Strategies include invalidating the cache immediately after writes, using short TTLs, etc. ... |
| Jun 6, 2026 | » | API Design
1 min; updated Jun 6, 2026
NotesFor 90% of interviews, default to REST, which maps resources to URLs and uses
HTTP methods to manipulate them, e.g., When returning large result sets, pagination comes into play. Cursor-based pagination works better for real-time data where new items get added frequently. Offset-based pagination is fine for most cases. |
| May 31, 2026 | » | Delivery Framework for System Design Interviews
3 min; updated Jun 6, 2026
Requirements (~5min)Functional RequirementsCompletions for “Users/Clients should be able to…”, e.g., for Twitter: post tweets, follow other users, and see tweets from users they follow. Keep the list targeted and prioritized (e.g., top 3) as your job is to develop a system that meets those requirements. Non-Functional RequirementsCompletions for “The system should be…”, e.g., for Twitter:
|
| Jul 19, 2025 | » | Resilient App Development in .NET
6 min; updated Jul 19, 2025
Build a Resilience PipelineGiven a Other Using |
| May 17, 2020 | » | Mergeable Replicated Data Types
2 min; updated Jul 19, 2025
On a distributed system, each replica should [eventually] converge to the same state. Commutative Replicated Data Types (CRDTs) can accept updates and achieve consistent without remote synchronization. The Need for CommutativitySay we have a queue \( 1 \to 2 \). Suppose two replicas, \(r_1\) and \(r_2\), independently call However, on receiving an update that the other replica popped, each replica will call |
| Mar 16, 2017 | » | Designing Data-Intensive Applications [Book]
(7 items)
Relational Model Versus Document Model; Thinking About Data Systems; Designing Data-Intensive Applications [Kleppmann, Martin]; Query Languages for Data; Reliability; Maintainability; Scalability; |
How does cursor-based pagination beat offset-based pagination for new additions? Aren’t new additions invisible to inflight paginations?
...