A list of the most recently updated pages.
| Oct 4, 2021 | » | Journal Reviews on Fairness
8 min; updated Jun 7, 2026
MetaInstead of changing the data or learners in multiple ways and then see if fairness improves, postulate that the root causes of bias are the prior decisions that generated the training data. These affect (a) what data was selected, and (b) the labels assigned to the examples. They propose the \(\text{Fair-SMOTE}\) (Fair Synthetic Minority Over Sampling Technique) algorithm which (1) removes biased labels (via situation testing: if the model’s prediction for a data point changes once all of the data points' protected attributes are flipped, then that label is biased and the data point is discarded), and (2) rebalances internal distributions such that based on a protected attribute, examples are equal in both positive and negative classes. The method is just as effective in reducing bias as prior approaches, and its models achieve higher recall and F1 performance. Furthermore, \(\text{Fair-SMOTE}\) can simultaneously reduce bias for more than one protected attribute. ... |
| Aug 18, 2022 | » | Binary Search
2 min; updated Jun 6, 2026
Binary Search on a Non-Decreasing \(f: \mathbb{R} \to \mathbb{R}\)Given a number \(L\) and a non-decreasing function \(f: \mathbb{R} \to \mathbb{R}\), find the greatest \(x\) such that \(f(x) \le L\). To start, there are two numbers \(lo\) and \(hi\), such that \(f(lo) \le L < f(hi)\). Algorithm |
| Jun 6, 2026 | » | Greedy Algorithms
2 min; updated Jun 6, 2026
Sample ProblemYou are given two integer arrays:
Assign cookies such that as many children as possible are satisfied. Each child can receive at most one cookie, and each cookie can be given to only one child. A Greedy Algorithm |
| Jun 7, 2026 | » | AsyncIO in Python
3 min; updated Jun 7, 2026
The Event LoopThe event loop contains a collection of jobs to be run. Some jobs are added
by application code, and others indirectly by |
| Nov 27, 2024 | » | Consistent Hashing
6 min; updated Jun 6, 2026
The term “consistent hashing” makes me think of hashing without randomization. Why isn’t every hash consistent by definition? For example, a map implementation would need consistent hashing lest it’s inaccurate when searching for stored values. Or is consistent hashing a tradeoff between collision-resistance and speed? Web CachingWeb caching was the original motivation for consistent hashing. With a web cache, if a browser requests a URL that is not in the cache, the page is downloaded from the server, and the result is sent to both the browser and the cache. On a second request, the page is served from the cache without contacting the server. ... |
| Jun 6, 2026 | » | Sharding
1 min; updated Jun 6, 2026
NotesSharding comes up when a single database won’t work (e.g., hit storage limits, write throughput limits, or read throughput limits) and you need to split your data across multiple independent servers. For a user-centric social media app, sharding by |
| Jun 6, 2026 | » | Networking Essentials
2 min; updated Jun 6, 2026
NotesAt a basic level, understand how services talk to each other and what happens when those connections fail or get slow. For 90% of the use cases, default to HTTP over TCP. WebSockets and Server-Sent Events (SSE) come up when you need real-time updates. SSE is unidirectional - the client makes an initial HTTP request to open the connection, and the server pushes data down that connection. WebSockets handle bidirectional communication where both sides send messages freely. ... |
| Jun 6, 2026 | » | Database Indexing
1 min; updated Jun 6, 2026
NotesIndexes make database queries fast. Most relational databases use B-trees, which support exact lookups (find user with email X) and range queries (find all orders between date A and date B). Hash indexes are faster for exact matches but can’t do range queries. There are external systems that go beyond what the primary DB provider has. Elasticsearch is the go-to for full-text search. PostGIS in Postgres is popular for geospatial queries. Such extensions typically sync from your primary DB and so the search index will lag slightly behind. However, the increased functionality is usually worth it. ... |
| Jun 6, 2026 | » | Data Modeling
1 min; updated Jun 6, 2026
NotesChoosing what data to store and how to structure it directly affects performance, scalability, and maintenance. Relational databases are useful when you have structured data with clear relationships and need strong consistency (transaction-based actions, enforcing foreign key constraints). NoSQL databases shine for flexible schemas or when you need to scale horizontally across many servers without complex joins. Normalization refers to splitting data across tables to avoid duplication, e.g.,
you have a |
| Jun 6, 2026 | » | CAP Theorem
1 min; updated Jun 6, 2026
NotesThe CAP theorem states that you can only have 2 of 3 properties at once:
In practice, network partitions are unavoidable in distributed system. Choosing consistency means some nodes will refuse to serve requests rather than return potentially stale data. Choosing availability means different nodes might temporarily have different data. ... |
| Jun 6, 2026 | » | Caching
2 min; updated Jun 6, 2026
NotesFor read-heavy applications, storing frequently accessed data in fast memory (e.g., Redis) allows you to skip the DB entirely for some reads. A cache hit on Redis takes ~1ms compared to 20-50ms for a typical DB query, and this speedup is impactful in the order of millions of requests. Caching requires a solution for invalidating stale data, e.g., user updates their profile in the DB. Strategies include invalidating the cache immediately after writes, using short TTLs, etc. ... |
| Jun 6, 2026 | » | API Design
1 min; updated Jun 6, 2026
NotesFor 90% of interviews, default to REST, which maps resources to URLs and uses
HTTP methods to manipulate them, e.g., When returning large result sets, pagination comes into play. Cursor-based pagination works better for real-time data where new items get added frequently. Offset-based pagination is fine for most cases. How does cursor-based pagination beat offset-based pagination for new additions? Aren’t new additions invisible to inflight paginations? ... |
| May 31, 2026 | » | Delivery Framework for System Design Interviews
3 min; updated Jun 6, 2026
Requirements (~5min)Functional RequirementsCompletions for “Users/Clients should be able to…”, e.g., for Twitter: post tweets, follow other users, and see tweets from users they follow. Keep the list targeted and prioritized (e.g., top 3) as your job is to develop a system that meets those requirements. Non-Functional RequirementsCompletions for “The system should be…”, e.g., for Twitter:
|
| Jun 6, 2026 | » | Of [Non]Overlapping Intervals
3 min; updated Jun 6, 2026
Merge Overlapping IntervalsProblem Statement: Merge Overlapping IntervalsGiven an integer array Solution: Sort by Start Time and then Linear ScanLike
Minimum Size Subarray Sum
, contiguity hints at a linear scan being useful. If we sort |
| Jun 5, 2026 | » | Print Zero Even Odd
2 min; updated Jun 5, 2026
Problem
SolutionUse a |
| Jun 4, 2026 | » | Print FooBar Alternately
4 min; updated Jun 4, 2026
Problem StatementYou’re given the … and the same instance of SolutionTwo |
| Jun 3, 2026 | » | Earliest Finish Time for Land and Water Rides
2 min; updated Jun 3, 2026
Problem StatementThere are \(L\) possible land rides and \(W\) possible water rides, with \(1 \le L, W \le 5 \times 10^4\). A tourist must experience exactly one ride from each category, in either order. A ride may be started at its opening time or any later moment. Immediately after finishing one ride, the tourist may board the other if it’s already open or wait until it opens. Return the earliest possible time at which the tourist can finish both rides. ... |
| Jun 2, 2026 | » | Thread-Based Parallelism in Python
7 min; updated Jun 2, 2026
In Chromium, why wasn’t explicit threading a thing? Data structures are usually
not thread-safe. There were guardrails for scenarios like checking IntroductionThreads are smaller units of a process that allow executing tasks in parallel, sharing memory space. Threads are particularly useful for I/O bound tasks, where much of the time is spent waiting for external resources. ... |
| May 31, 2026 | » | Daredevil: Cold Day in Hell
3 min; updated May 31, 2026
MATT. “All part of God’s plan.” What a beautiful phrase. Kind of covers it all.
Dark. Light. Don’t get yourself bothered. It’s all God’s plan. Nothing
you could have done about it in any case. I understand the attraction. It’s a
nice bit of hand-waving. A catchall that pushes aside the necessity for deeper
examination of responsibility. MATT. Rage. It’s everywhere in this city. People are terrified. Of what’s
happened, of what might happen. Everyone is everyone’s enemy. Anyone who
needs help is weak. They can’t let themselves be weak. Not in this
world. They tell themselves they’re strong. They’re just afraid. |
| Jun 1, 2026 | » | Print in Order
1 min; updated Jun 1, 2026
Suppose we have a class: … and the same instance of See Thread-Based Parallelism - Python for a quick primer on Python’s synchronization primitives. ... |
In my attempt at writing
...binary_search, I had: