Mergeable Replicated Data Types
On a distributed system, each replica should [eventually] converge to the same state. Commutative Replicated Data Types (CRDTs) can accept updates and achieve consistent without remote synchronization.
The Need for Commutativity
Say we have a queue \( 1 \to 2 \). Suppose two replicas, \(r_1\) and \(r_2\), independently call pop()
. Each replica will have \(2\) on their queue.
However, on receiving an update that the other replica popped, each replica will call pop()
to be consistent, thereby deleting \(2\).
MRDTs extend the CRDT idea to support non-commutative operations.
3-Way Merge for the Set Data Structure
- Suppose we start with \(s_{LCA} = \{1, 2, 3\} \)
- At replica 1, we pop \(3\) to have \( s_1 = \{1, 2\} \)
- At replica 2, we pop \(1\) and add \(4\), to have \( s_2 = \{2, 3, 4\} \)
- Notice that the intent is to remove \(3\), remove \(1\) and add \(4\) to the set. The merge logic is:
$$ = (s_{LCA} \cap s_1 \cap s_2 ) \cup (s_1 - s_{LCA}) \cup (s_2 - s_{LCA}) $$ $$ = (\{1, 2, 3\} \cap \{1, 2\} \cap \{2, 3, 4\}) \cup ( \{1, 2\} - \{1, 2, 3\} ) \cup ( \{2, 3, 4\} - \{1, 2, 3\}) ) $$ $$ = \{2\} \cup \{ \varnothing \} \cup \{ 4 \} $$ $$ = \{2, 4\} $$
Extending the Set Merge Formula
The 3-way merge for set data structures extends to any data structure that can be expressed in the set domain.
For example, a list \(v = [1, 2, 3]\) can be represented in set domain as two relations:
$$ R_{membership}(v) = \{ 1, 2, 3 \} $$ $$ R_{occurs-before}(v) = \{ (1, 2), (1, 3), (2, 3) \} $$
Remarks on MRDTs
Because the merge formula ignores the operations themselves, it won’t work if the operations need pre-conditions to be met before executing.
The catch is that we need to send more information over the network. We need the least common ancestor, and diffs in order to recreate the inputs to the merge formula.