This is what I call the "Elizabeth Taylor" problem.
She was an actress who famously married many times and would be hard to find in a distributed database if you were looking for people with the most marriages.
Marriages, like your records, preferably only occur once and in most cases this is true. If your system holding marriage certificates spreads them across many shards randomly then each shard could end up thinking Elizabeth Taylor was like everyone else in their subset of data - only married once. This makes it hard for the system as a whole to determine which of the millions of people held on file were married more than once.
If you use custom routing of documents you can ensure each person's documents end up on the same shard and get the accurate answers.
1 Like