I've got an index with a static set of documents (no upsert or delete
operations run against it) on a 9-node cluster with 4 master-only nodes
and 5 data-only nodes. When I query any of the master nodes for status
(/_status), I see a num_docs of 8424426 for my index. However, when I run
a count search on the index (/my_index/type/_search?search_type=count), I
get a different result. In fact, the total hits flip-flop between two
document counts, neither of which is 8424426. In other words, if I run the
count search over and over, the result is not stable, even though no
documents in the index are changing. Each of my four masters returns a
different hit total for the index, and each master flips between two
numbers. The counts alternate every second or two.
To be clearer:
/_status on any master always returns 8424426 for num_docs
/my_index/type/_search?search_type=count from master 1 returns either 8427976 or 8428114
/my_index/type/_search?search_type=count from master 2 returns either 8429402 or 8426588
/my_index/type/_search?search_type=count from master 3 returns either 8428076 or 8428014
/my_index/type/_search?search_type=count from master 4 returns either 8427529 or 8428831
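The "run the count search over and over" check can be scripted. This is a minimal sketch, assuming the node listens on localhost:9200 and that the response carries hits.total as a plain integer (as it did in the 0.x releases discussed here). The function names are hypothetical. The fetcher is passed in as a callable so the tallying logic can be exercised without a live cluster.

```python
import json
import urllib.request
from collections import Counter
from typing import Callable

# Path taken from the post; host and port are assumptions.
COUNT_URL = "http://localhost:9200/my_index/type/_search?search_type=count"


def fetch_search_total(url: str = COUNT_URL) -> int:
    """Issue one count search and return hits.total from the JSON body."""
    with urllib.request.urlopen(url) as resp:
        body = json.load(resp)
    # Assumes 0.x-style responses where hits.total is an integer.
    return body["hits"]["total"]


def sample_totals(fetch: Callable[[], int], runs: int) -> Counter:
    """Call `fetch` `runs` times and tally how often each total comes back."""
    return Counter(fetch() for _ in range(runs))


def is_stable(totals: Counter, expected: int) -> bool:
    """True only if every sample returned one value and it equals num_docs."""
    return set(totals) == {expected}


# Usage against a live node (not executed here):
#   totals = sample_totals(fetch_search_total, runs=20)
#   print(dict(totals), is_stable(totals, expected=8424426))
```

Pointing COUNT_URL at each master in turn would reproduce the per-master table above; more than one key in the returned Counter is the flip-flop.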
I've got other indices in this cluster that don't exhibit this behavior;
for them, the search count always equals num_docs.
Any ideas what might be causing this? I've tried restarting the master
nodes and have confirmed that all shards for the affected index are
accounted for in the cluster. I'm a bit nervous because ES hasn't reported
any errors and says its status is green.
We started experiencing a similar issue today. The replica shards did not have the same document count, and each query returned a different result unless we used preference=. However, this only hides the problem; we haven't figured out why the replicas are no longer in sync. We deleted and rebuilt our index for now, but are watching for it to happen again. Very interested to know what is happening here as well.
It is likely that our flip-flopping count issue is caused by replicas not
being in sync. We have 20 shards with num_replicas set to 1. For most of
the shards, the document count in the primary and in the replica is not
the same; the difference is between 600 and 2K documents.
We are using elasticsearch-0.18.7.
Any ideas on what might be causing this discrepancy? How can we prevent it
in the future? What is the best way to monitor that replicas are in sync?
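Why out-of-sync copies make the total wander can be shown with a toy model. This is a sketch under stated assumptions, not Elasticsearch's routing code: a count search reads one copy of every shard and sums them. With no preference, the copy is picked at random here; with a preference string, the pick is derived from a hash of that string, so the same copies are always read. The per-shard numbers and the preference value "user-42" are hypothetical, chosen to mirror the 600 to 2K gaps described above.

```python
import random
import zlib
from typing import List, Optional, Tuple

# Hypothetical per-shard doc counts: (primary copy, replica copy).
ShardCopies = Tuple[int, int]
SHARDS: List[ShardCopies] = [
    (421_000, 421_000),
    (420_500, 419_900),  # replica missing 600 docs
    (422_100, 420_100),  # replica missing 2,000 docs
    (421_300, 421_300),
]


def search_total(shards: List[ShardCopies], rng: random.Random,
                 preference: Optional[str] = None) -> int:
    """Sum one copy per shard, as a distributed count does (toy model)."""
    total = 0
    for shard_id, copies in enumerate(shards):
        if preference is None:
            # No preference: any copy may answer, so the sum can change per call.
            idx = rng.randrange(len(copies))
        else:
            # Preference: a stable hash pins the same copy on every call.
            key = f"{preference}:{shard_id}".encode()
            idx = zlib.crc32(key) % len(copies)
        total += copies[idx]
    return total


def possible_range(shards: List[ShardCopies]) -> Tuple[int, int]:
    """Smallest and largest total a single search could report."""
    return sum(min(c) for c in shards), sum(max(c) for c in shards)
```

In this model, possible_range(SHARDS) is (1682300, 1684900): the spread of 2,600 is exactly the sum of the two per-shard gaps. Passing a preference makes every call return the same number, which matches the observation that preference= hides the inconsistency without fixing it.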
(Replying to Kurt Harriger's message of Monday, September 10, 2012 9:11:30 PM UTC-7.)
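On the monitoring question, one option is to compare num_docs across the copies of each shard in the /_status output. This is a sketch only: the response shape in the comment below is an assumption about the 0.18-era API and should be checked against your version, and the function name is hypothetical. A comparison like this is only meaningful on a static index such as the ones in this thread, because copies refresh independently and can differ briefly while indexing is ongoing.

```python
from typing import Dict, List, Tuple

# Assumed shape of the /_status response (verify against your version):
# {"indices": {"<index>": {"shards": {"<shard id>": [
#     {"routing": {"primary": true, ...}, "docs": {"num_docs": N, ...}}, ...]}}}}


def out_of_sync_shards(status: dict, index: str) -> Dict[str, List[Tuple[bool, int]]]:
    """Map shard id -> [(is_primary, num_docs), ...] where the copies disagree.

    An empty result means every copy of every shard reports the same count.
    """
    mismatched: Dict[str, List[Tuple[bool, int]]] = {}
    shards = status["indices"][index]["shards"]
    for shard_id, copies in shards.items():
        counts = [(c["routing"]["primary"], c["docs"]["num_docs"]) for c in copies]
        # More than one distinct num_docs among the copies means drift.
        if len({n for _, n in counts}) > 1:
            mismatched[shard_id] = counts
    return mismatched
```

Run periodically (for example, after json.loads on the body of GET /_status), a non-empty result would flag drift before it shows up as flip-flopping search totals.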