Hi there,
I am using ES 5.3.0. I have indexed about 2.2M documents, and the _count API returns numbers that are very different from the docs.count column of the _cat/indices API. Is something wrong?
$ curl localhost:9200/_cat/indices?v
health status index  pri rep docs.count docs.deleted store.size pri.store.size
green  open   index1   6   2     172544            0    384.7mb        128.2mb
green  open   index2   6   2    2708259        74040      6.8gb          2.1gb
$ curl localhost:9200/index1/_count?pretty
{
  "count" : 84916,
  "_shards" : {
    "total" : 6,
    "successful" : 6,
    "failed" : 0
  }
}
$ curl localhost:9200/index2/_count?pretty
{
  "count" : 1027782,
  "_shards" : {
    "total" : 6,
    "successful" : 6,
    "failed" : 0
  }
}
It looks like each of your indices has 2 replicas (rep = 2). My initial guess would be that the _count API returns the number of primary documents, while _cat/indices returns the total number of documents including replicas.
That would only make sense if you were still indexing documents between your _cat/indices query and your _count queries, since 3 × 85K ≈ 255K ≠ 172.5K. Perhaps at the time of the _cat/indices query you actually had only about 57.5K primary documents (172,544 / 3). Is that the case?
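The arithmetic in that guess can be checked with a minimal POSIX shell sketch. It hard-codes the numbers pasted above rather than querying the cluster, the helper names `implied_primaries` and `ratio` are hypothetical, and it assumes, as the guess does, that docs.count would equal (1 + replicas) times the primary document count.

```shell
#!/bin/sh
# Hypothesis being tested (not confirmed): _cat/indices docs.count counts every
# copy, so docs.count = (1 + replicas) * primary docs.

# implied_primaries DOCS_COUNT REPLICAS -> docs.count / (1 + replicas), rounded
implied_primaries() {
  awk -v dc="$1" -v rep="$2" 'BEGIN { printf "%.0f\n", dc / (1 + rep) }'
}

# ratio DOCS_COUNT COUNT -> how many times larger docs.count is than _count
ratio() {
  awk -v dc="$1" -v c="$2" 'BEGIN { printf "%.3f\n", dc / c }'
}

# Numbers copied from the outputs above; rep = 2 for both indices.
echo "index1: implied primaries $(implied_primaries 172544 2), ratio $(ratio 172544 84916)"
echo "index2: implied primaries $(implied_primaries 2708259 2), ratio $(ratio 2708259 1027782)"
```

This prints implied primaries of 57515 and 902753 and ratios of 2.032 and 2.635. Both _count results are higher than the implied primaries and neither ratio is 3, so the replica guess only fits if documents were indexed between the calls.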
From the _cat/indices documentation: "We can tell quickly how many shards make up an index, the number of docs at the Lucene level, including hidden docs (e.g., from nested types), deleted docs, primary store size, and total store size (all shards including replicas). All these exposed metrics come directly from Lucene APIs."
It does sound like the _cat query returns primary + replica documents, since it reports the number of docs at the Lucene level.
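The quoted documentation also says docs.count includes hidden docs, e.g. from nested types. Each object in a `nested` field is stored as its own Lucene document alongside its parent, while _count only counts top-level documents. A minimal sketch of that bookkeeping, with hypothetical helper names and two assumptions that this thread does not confirm: docs.count counts each document once (no replica copies), and nothing was indexed between the two calls.

```shell
#!/bin/sh
# Lucene-level view: a top-level doc with N nested objects = 1 + N Lucene docs.
# ASSUMPTIONS: docs.count has no replica copies; no indexing between the calls.

# hidden_docs DOCS_COUNT COUNT -> Lucene docs that _count does not see
hidden_docs() {
  awk -v dc="$1" -v c="$2" 'BEGIN { printf "%d\n", dc - c }'
}

# lucene_docs TOP_LEVEL NESTED_OBJECTS -> expected Lucene-level doc count
lucene_docs() {
  awk -v top="$1" -v nested="$2" 'BEGIN { printf "%d\n", top + nested }'
}

echo "index1 hidden docs: $(hidden_docs 172544 84916)"
echo "index2 hidden docs: $(hidden_docs 2708259 1027782)"
# Example: one document with 3 nested objects -> 4 Lucene docs, _count says 1.
echo "1 doc with 3 nested objects: $(lucene_docs 1 3) Lucene docs"
```

Under those assumptions, index1 would hold 87628 hidden docs and index2 1680477, roughly 1.03 and 1.6 nested objects per top-level document. Checking the mappings for `nested` fields would confirm or rule this out.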