We have a basic ES instance, and I'm considering an index that has only one primary shard and one replica shard.
With a basic bool query, hits.total turns out to depend on which shard copy serves the request: I get consistently different numbers when specifying ?preference=_primary versus ?preference=_replica.
The shards are different in some sense, because with _cat I see:
index       shard prirep state   docs store   ip      node
admin_ch-v1 0     r      STARTED 3220 295.2mb x.x.x.x JO8tqXw
admin_ch-v1 0     p      STARTED 3220 294mb   x.x.x.x aCqEzYQ
However, the document count is the same. I also wrote a script to fetch all the documents from each shard copy specifically (using preference=f"_only_nodes:xxx") and compare them, and, modulo a bug in my script, everything is identical.
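For anyone wanting to repeat that comparison, here is a minimal sketch of the diffing step only. It assumes you have already pulled the full hit lists from each copy (for example with the preference shown above) and that each hit is a dict with "_id" and "_source" keys, as in a search response. The function name diff_shard_copies is hypothetical, not part of any Elasticsearch client.

```python
import json


def diff_shard_copies(primary_hits, replica_hits):
    """Compare two lists of search hits, keyed by document _id.

    Hypothetical helper: each hit is assumed to be a dict with "_id" and
    "_source" keys, as found under hits.hits in a search response.
    """
    def index_by_id(hits):
        # Serialize _source with sorted keys so key order does not matter.
        return {h["_id"]: json.dumps(h.get("_source"), sort_keys=True) for h in hits}

    primary = index_by_id(primary_hits)
    replica = index_by_id(replica_hits)
    shared = primary.keys() & replica.keys()
    return {
        "only_primary": sorted(primary.keys() - replica.keys()),
        "only_replica": sorted(replica.keys() - primary.keys()),
        "changed": sorted(i for i in shared if primary[i] != replica[i]),
    }
```

If all three lists come back empty, the two copies returned the same documents with the same _source, regardless of the order in which the hits arrived.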
Merging of segments is not coordinated across shard copies, so even if the primary and replica shards hold exactly the same contents, their sizes may differ because they may have merged differently.
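To make that concrete against the listing in the question, here is a small sketch that parses whitespace-separated _cat output with a header row and reports whether the copies agree on docs and on store. The helper names parse_cat_shards and summarize_copies are made up for illustration, and the parser assumes every row has a value in every column (true for STARTED shards like the ones shown).

```python
def parse_cat_shards(text):
    """Parse whitespace-separated _cat output that includes a header row.

    Hypothetical helper; assumes no column is empty in any row.
    """
    lines = [line.split() for line in text.strip().splitlines() if line.strip()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, row)) for row in rows]


def summarize_copies(rows):
    """Report whether all copies agree on document count and on store size."""
    return {
        "same_docs": len({r["docs"] for r in rows}) == 1,
        "same_store": len({r["store"] for r in rows}) == 1,
    }


CAT_OUTPUT = """
index shard prirep state docs store ip node
admin_ch-v1 0 r STARTED 3220 295.2mb x.x.x.x JO8tqXw
admin_ch-v1 0 p STARTED 3220 294mb x.x.x.x aCqEzYQ
"""

summary = summarize_copies(parse_cat_shards(CAT_OUTPUT))
print(summary)
```

For the listing above, the document counts match while the store sizes do not, which is the situation that uncoordinated merging would produce.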