Exposing pagination through URLs like example.com/items?page=25&limit=100
leads to SQL like
SELECT * FROM `items` LIMIT 100 OFFSET 2400;, which
inefficiently scans 2,500 records and discards the first 2,400.
With relative cursors, we have URLs like example.com/items?limit=50&lastID=XXX,
which lead to SQL like
SELECT * FROM `items` WHERE `id` > XXX ORDER BY `id` ASC LIMIT 50;, and this is
more efficient, assuming there is an index on `id`.
The important part is including enough information in the URL to form an
efficient SQL query that doesn’t skip/replay records. For example, if the
cursor column is not a primary key, then a URL like example.com/items?limit=50&lastTitle=XXX
will skip/replay records whose title is XXX but whose relative order is higher
than 50. Combining the title with a primary key is enough to disambiguate such ties.
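The composite cursor can be sketched in Python with SQLite (the table, column names, and data here are hypothetical; the article’s own queries are MySQL-style): comparing the row value (title, id) against the cursor keeps pagination stable even when many rows share a title.

```python
import sqlite3

# Hypothetical schema: a primary key plus a non-unique title column,
# with an index to back the cursor query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("CREATE INDEX idx_items_title_id ON items (title, id)")
conn.executemany(
    "INSERT INTO items (id, title) VALUES (?, ?)",
    [(i, f"title-{i % 3}") for i in range(1, 201)],  # many duplicate titles
)

def page_after(last_title, last_id, limit):
    # The row-value comparison (title, id) > (last_title, last_id)
    # disambiguates rows sharing a title, so nothing is skipped or replayed.
    return conn.execute(
        "SELECT id, title FROM items "
        "WHERE (title, id) > (?, ?) "
        "ORDER BY title, id LIMIT ?",
        (last_title, last_id, limit),
    ).fetchall()

first = page_after("", 0, 50)
second = page_after(first[-1][1], first[-1][0], 50)  # cursor = last row seen
```

Each response only needs to hand the client the (title, id) of its last row; the client echoes it back as the cursor for the next page.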
In CS, substring search is a foundational problem for which there are
\(O(m + n)\) algorithms. But VB’s implementation was \(O(mn)\). Suppose we
call InStr("apple banana orange", "banana"):
- Skip to the first b.
- Check if banana is a substring starting from that position. If banana was not found, skip to the next b.
- Try again, and so forth.
In the worst case, we have no match, and a
query that makes us do a lot of
work before returning false, e.g. searching for aaab in a long string of a’s.
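The steps above can be sketched in Python (a hypothetical stand-in for VB’s InStr, using str.find as the “skip to the next b” step; like InStr, it returns 1-based positions and 0 when absent):

```python
def instr(haystack, needle):
    """Brute-force substring search in the spirit of VB's InStr:
    1-based index of the first match, 0 if absent."""
    if not needle:
        return 1
    first = needle[0]
    i = haystack.find(first)                  # skip to the first b
    while i != -1:
        if haystack.startswith(needle, i):    # check the candidate position
            return i + 1                      # VB positions are 1-based
        i = haystack.find(first, i + 1)       # skip to the next b
    return 0

print(instr("apple banana orange", "banana"))  # → 7
```

The outer skip and inner check each cost up to \(O(n)\) and \(O(m)\) respectively, giving the \(O(mn)\) worst case, but on typical text the skip rarely stops and the check rarely runs long.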
Why would VB go with an \(O(mn)\) algorithm?
- The skip-to function is a single x86 machine instruction.
- Most VB devs are querying normal strings, not DNA sequences.
In practice, the brute-force algorithm is mostly \(O(n + m)\). By the time preprocessing is done for asymptotically \(O(m + n)\) algorithms, the naive brute-force algorithm will have given you the answer.
Commentary from Hacker News
New grads tend to have exposure to new tools. However, change should be motivated by a clear benefit, e.g. Python 3 has file system improvements and f-strings are more readable. While the users may not benefit, devs will.
Start with simple working implementations. Profile before you write fancier ones. But even with the simple implementations, do the basics: cover DB queries with indexes, use maps/sets for frequent lookups, avoid re-renders of UI that doesn’t change, etc.
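The maps/sets point, as a hypothetical Python sketch: build an index once instead of scanning a list on every lookup.

```python
# Hypothetical data: looking records up by id, many times.
users = [{"id": i, "name": f"user{i}"} for i in range(10_000)]

def find_slow(uid):
    # Linear scan: O(n) work repeated on every single lookup.
    return next(u for u in users if u["id"] == uid)

# One-time index, then average O(1) per lookup.
by_id = {u["id"]: u for u in users}

assert find_slow(1234) is by_id[1234]
```

Neither version is “fancy”; the dict is just the basic structure that matches the access pattern.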
A galactic algorithm is one that runs faster than any other algorithm for problems that are sufficiently large, but where “sufficiently large” is so big that the algorithm is never used in practice. (en.wikipedia.org)
Develop an understanding of real-world data: caches, latencies, optimized instructions, etc. For small \(n\), the constants matter, e.g. how does your hash function fare against linearly searching a flat but contiguous array? What is the costly operation? Moving stuff around might be your undoing…
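One way to check such constants is to measure rather than assume, e.g. this hypothetical timeit sketch (results are machine-dependent; for tiny \(n\) a contiguous scan can be competitive with hashing):

```python
import timeit

small = tuple(range(8))        # tiny, contiguous collection
small_set = set(small)

# For small n, constant factors (hashing vs. a cache-friendly scan)
# dominate the asymptotic difference, so time both on your hardware.
t_scan = timeit.timeit(lambda: 7 in small, number=100_000)
t_hash = timeit.timeit(lambda: 7 in small_set, number=100_000)
print(f"linear scan: {t_scan:.4f}s  hash lookup: {t_hash:.4f}s")
```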
Know which battles to fight. 50x improvement? If your algorithm now takes 2K ns, but hitting the file system takes billions of ns, then there’s no perceivable benefit to the user. But if you’re going from 50 min to 1 min, awesome!
Pagination with Relative Cursors. Drew Martin. Shopify Engineering (shopify.engineering). Aug 12, 2019.