Hi community
I have a Flink job that consumes Kafka messages and loads them into Elasticsearch. To monitor the number of documents loaded into Elasticsearch, I wrote a query to get the 'hits total':
GET some-indice/_search
{
  "track_total_hits": true
}
Another way is:
GET some-indice/_count
{}
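If you want to run these checks from a script rather than the console, here is a minimal Python sketch that uses only the standard library. The host `http://localhost:9200` and the absence of authentication are assumptions. The sketch sends POST rather than GET because both `_count` and `_search` accept POST, and urllib handles a request body more naturally that way. The response shapes (`count` for `_count`, `hits.total.value`/`relation` for `_search`) assume Elasticsearch 7.x or later, since no version is given above.

```python
import json
import urllib.request

# Assumed endpoint and index; adjust host, port and auth for your cluster.
ES_URL = "http://localhost:9200"
INDEX = "some-indice"


def _post_json(path, body):
    """POST a JSON body to Elasticsearch and return the decoded JSON response."""
    req = urllib.request.Request(
        f"{ES_URL}/{path}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def count_from_count_api(response):
    """Extract the document count from a _count response."""
    return response["count"]


def count_from_search(response):
    """Extract hits.total.value from a _search response sent with track_total_hits: true."""
    total = response["hits"]["total"]
    # "eq" means an exact count; "gte" would mean a lower bound only.
    if total["relation"] != "eq":
        raise ValueError("hits.total is a lower bound, not an exact count")
    return total["value"]


def fetch_counts():
    """Return (count via _count, total via _search) for INDEX."""
    via_count = count_from_count_api(_post_json(f"{INDEX}/_count", {}))
    # size: 0 skips returning documents; only the total is needed.
    via_search = count_from_search(
        _post_json(f"{INDEX}/_search", {"size": 0, "track_total_hits": True})
    )
    return via_count, via_search
```

Calling `fetch_counts()` in a loop and logging the pair gives you the same numbers you see in the console, in a form that is easier to compare over time.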
Normally this gives me an increasing 'hits total' value, since data is streaming in. However, when I run the query frequently, I occasionally get a lower value than the previous query returned. Is it normal for Elasticsearch to occasionally return a decreasing 'hits total' value? No documents are being deleted from the index.
Wild guess without any further info: is this index constantly being indexed into? If so, your searches may hit different copies of each shard (sometimes the primary, sometimes a replica). Each copy refreshes its data at different times, so you might end up with different counts.
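The effect described above can be reproduced with a toy model. This is not Elasticsearch code: it assumes one primary and one replica that receive the same writes but make them searchable on their own refresh schedules, and it alternates searches between the two copies round-robin. The refresh periods, offsets and round-robin routing are simplifying assumptions; Elasticsearch's real choice of shard copy works differently.

```python
from dataclasses import dataclass


@dataclass
class ShardCopy:
    """Toy shard copy: written docs only become searchable at the next refresh."""
    name: str
    refresh_every: int  # refresh period in ticks (assumed value)
    offset: int         # phase of this copy's refreshes (assumed value)
    indexed: int = 0    # documents written to this copy so far
    visible: int = 0    # documents searchable as of the last refresh

    def tick(self, t, new_docs):
        self.indexed += new_docs
        if (t - self.offset) % self.refresh_every == 0:
            self.visible = self.indexed


def simulate(ticks=12, docs_per_tick=10):
    """Index steadily into both copies; each tick, count via one copy in turn."""
    copies = [ShardCopy("primary", 3, 0), ShardCopy("replica", 3, 1)]
    observed = []
    for t in range(ticks):
        for c in copies:
            c.tick(t, docs_per_tick)
        # Alternate between copies, standing in for real shard-copy selection.
        observed.append(copies[t % 2].visible)
    return observed


counts = simulate()
# (position, previous count, current count) wherever the count went down.
drops = [(i, a, b) for i, (a, b) in enumerate(zip(counts, counts[1:]), 1) if b < a]
print(counts)
print("decreases:", drops)
```

No document is ever deleted in this model, yet the observed count dips whenever a search lands on the copy that refreshed longer ago.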
Hope this helps as a start.
You can also run POST my-index/_refresh and then use the _cat/shards API to check whether all shard copies hold the same number of documents. This only works if no concurrent indexing is going on; otherwise the explanation above applies again.
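The _cat/shards check can also be scripted. Here is a minimal sketch that takes the output of `GET _cat/shards/my-index?format=json`, already parsed into a list of dicts, and reports shards whose copies disagree on `docs`. It assumes the rows carry the `index`, `shard`, `prirep`, `docs` and `node` columns. Copies whose `docs` is null (for example unassigned replicas) are skipped. The sample rows in the usage below are made up for illustration.

```python
from collections import defaultdict


def mismatched_shards(rows):
    """Group _cat/shards rows by (index, shard number) and return only the groups
    whose copies report different doc counts."""
    groups = defaultdict(dict)
    for row in rows:
        if row.get("docs") is None:
            # Unassigned or initializing copies report no doc count; skip them.
            continue
        key = (row["index"], int(row["shard"]))
        # Label each copy as p/r plus its node, so several replicas stay distinct.
        label = f'{row["prirep"]}@{row.get("node") or "?"}'
        groups[key][label] = int(row["docs"])
    return {key: copies for key, copies in groups.items()
            if len(set(copies.values())) > 1}


# Hypothetical sample rows (cat APIs return numbers as strings in JSON).
rows = [
    {"index": "my-index", "shard": "0", "prirep": "p", "docs": "1200", "node": "node-1"},
    {"index": "my-index", "shard": "0", "prirep": "r", "docs": "1200", "node": "node-2"},
    {"index": "my-index", "shard": "1", "prirep": "p", "docs": "1185", "node": "node-2"},
    {"index": "my-index", "shard": "1", "prirep": "r", "docs": "1179", "node": "node-1"},
    {"index": "my-index", "shard": "1", "prirep": "r", "docs": None, "node": None},
]
print(mismatched_shards(rows))
```

An empty result after a refresh with indexing paused suggests the copies are consistent. A non-empty result while indexing is running is expected for the reason given above.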
Also, please always specify your Elasticsearch version. Thanks!