On Distributed Systems

Mergeable Replicated Data Types

On a distributed system, each replica should [eventually] converge to the same state. Commutative Replicated Data Types (CRDTs) can accept updates and achieve consistent without remote synchronization.

The Need for Commutativity

Say we have a queue $ 1 \to 2 $. Suppose two replicas, $r_1$ and $r_2$, independently call pop(). Each replica will have $2$ on their queue.

However, on receiving an update that the other replica popped, each replica will call pop() to be consistent, thereby deleting $2$.

MRDTs extend the CRDT idea to support non-commutative operations.

3-Way Merge for the Set Data Structure

Suppose we start with $s_{LCA} = \{1, 2, 3\} $
At replica 1, we pop $3$ to have $ s_1 = \{1, 2\} $
At replica 2, we pop $1$ and add $4$, to have $ s_2 = \{2, 3, 4\} $
Notice that the intent is to remove $3$, remove $1$ and add $4$ to the set. The merge logic is:

$$ = (s_{LCA} \cap s_1 \cap s_2 ) \cup (s_1 - s_{LCA}) \cup (s_2 - s_{LCA}) $$ $$ = (\{1, 2, 3\} \cap \{1, 2\} \cap \{2, 3, 4\}) \cup ( \{1, 2\} - \{1, 2, 3\} ) \cup ( \{2, 3, 4\} - \{1, 2, 3\}) ) $$ $$ = \{2\} \cup \{ \varnothing \} \cup \{ 4 \} $$ $$ = \{2, 4\} $$

The catch is that we need to send more information over the network. We need the least common ancestor, and diffs in order to recreate the inputs to the merge formula.

Extending the Set Merge Formula

The 3-way merge for set data structures extends to any data structure that can be expressed in the set domain.

For example, a list $v = [1, 2, 3]$ can be represented in set domain as two relations:

$$ R_{membership}(v) = \{ 1, 2, 3 \} $$ $$ R_{occurs-before}(v) = \{ (1, 2), (1, 3), (2, 3) \} $$

Remarks on MRDTs

Because the merge formula ignores the operations themselves, it won’t work if the operations need pre-conditions to be met before executing.

References

Mergeable Replicated Data Types. Gowtham Kaki; Swarn Priya; KC Sivaramakrishnan; Suresh Jagannathan. kcsrk.info . muratbuffalo.blogspot.com . Oct 10, 2019.