Number of results per shard

We have a basic ES instance, and I'm considering an index that has only one primary shard and one replica shard.

Running a basic bool query, hits.total turns out to depend on which shard copy we hit. I get consistently different numbers when specifying ?preference=_primary versus ?preference=_replica.
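For anyone who wants to reproduce the comparison, here is a minimal sketch of it. The base URL, the index name in the usage note and the helper names are placeholders, not anything taken from the cluster above. It also assumes an Elasticsearch version that still accepts the _primary and _replica preference values (as far as I know they were removed in 7.0), and it handles hits.total being either a plain integer or an object with a "value" field.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen


def search_url(base_url, index, preference):
    """Build a _search URL pinned to one routing preference (size=0: counts only)."""
    params = urlencode({"preference": preference, "size": 0})
    return f"{base_url.rstrip('/')}/{quote(index)}/_search?{params}"


def total_hits(response):
    """Return hits.total as an int, whether it is an integer or a {"value": ...} object."""
    total = response["hits"]["total"]
    return total["value"] if isinstance(total, dict) else total


def http_search(url, query):
    """POST the query body as JSON and return the decoded response."""
    req = Request(
        url,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlopen(req) as resp:
        return json.load(resp)


def counts_by_preference(base_url, index, query,
                         preferences=("_primary", "_replica"),
                         search=http_search):
    """Run the same query once per preference and collect hits.total for each."""
    return {
        pref: total_hits(search(search_url(base_url, index, pref), query))
        for pref in preferences
    }
```

Calling counts_by_preference("http://localhost:9200", "my-index", {"query": {"match_all": {}}}) returns a dict keyed by preference, so a mismatch between the two copies shows up directly. The search argument is only there so the HTTP call can be swapped out.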

The shards are different in some sense, because with _cat I see:

index       shard prirep state   docs store   ip      node
admin_ch-v1 0     r      STARTED 3220 295.2mb x.x.x.x JO8tqXw
admin_ch-v1 0     p      STARTED 3220 294mb   x.x.x.x aCqEzYQ

However, the document count is the same. I also wrote a script that fetches all the documents from each shard copy (using preference=f"_only_nodes:xxx") and compares them, and, modulo a bug in my script, everything is identical.
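The comparison step of such a script could look roughly like the sketch below. It is not the actual script, just an illustration: it assumes the hits from each copy have already been collected into lists of dicts with "_id" and "_source" keys, and the function name is made up.

```python
def diff_copies(hits_a, hits_b):
    """Compare two lists of search hits (dicts with "_id" and "_source") by document id."""
    a = {hit["_id"]: hit["_source"] for hit in hits_a}
    b = {hit["_id"]: hit["_source"] for hit in hits_b}
    shared = a.keys() & b.keys()
    return {
        # ids present on one copy but missing from the other
        "only_in_a": sorted(a.keys() - b.keys()),
        "only_in_b": sorted(b.keys() - a.keys()),
        # ids present on both copies whose _source differs
        "different": sorted(doc_id for doc_id in shared if a[doc_id] != b[doc_id]),
    }
```

If all three lists come back empty, the two copies returned the same ids with the same _source, which is what "identical" means above.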

So... what is going on?

Merging of segments is not coordinated across shard copies, so even if the primary and replica hold exactly the same contents, their sizes may differ because they may have merged differently.

That's perfectly fine. My question is why, if the contents are exactly the same, the counts for the same query are different.

Are you making changes to these indices? What is the refresh_interval set to?

Sorry I didn't notice your reply earlier.

I don't see refresh_interval in GET <index>/_settings, so I assume it's the default 1s.
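To confirm this rather than assume it, the settings API can be asked for defaults as well with GET <index>/_settings?include_defaults=true. The sketch below reads refresh_interval out of such a response. It assumes the usual response layout (a "settings" block for explicitly set values and a "defaults" block next to it, both keyed under the concrete index name), and the function name is made up.

```python
def refresh_interval(settings_response, index):
    """Return (value, source) for index.refresh_interval.

    settings_response is the decoded JSON from
    GET <index>/_settings?include_defaults=true (layout assumed, see above).
    """
    entry = settings_response[index]
    explicit = entry.get("settings", {}).get("index", {}).get("refresh_interval")
    if explicit is not None:
        return explicit, "explicit"
    default = entry.get("defaults", {}).get("index", {}).get("refresh_interval")
    if default is not None:
        return default, "default"
    # Neither block mentions it, e.g. the request was made without include_defaults.
    return None, "unknown"
```

A result of ("1s", "default") would mean that nothing overrides the setting on the index.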

And I don't think it's a matter of heavy write usage: the index receives an average of 2 new documents per day.

Also, we got the same two counts, say N and M, for the same query on the two shard copies when trying it hours apart.
