These two commands should return exactly the same number. You do not need to multiply the count by the number of replicas: the _count API returns the number of documents regardless of replicas.
I picked a random index; note that the document count is 10034437 in both.
GET /_cat/indices/filebeat-7.15.2-2022.05.16-000168?v
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open filebeat-7.15.2-2022.05.16-000168 qIO3tHowQtyIytZmqC62Lw 1 1 10034437 0 8.5gb 4.2gb
GET /filebeat-7.15.2-2022.05.16-000168/_count
{
  "count" : 10034437,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  }
}
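If you want to run the same comparison without eyeballing the output, here is a minimal Python sketch that parses the two responses shown above and checks that the numbers match. The function names `cat_docs_count` and `api_count` are made up for illustration, and the responses are pasted in as strings rather than fetched from a cluster.

```python
import json

# Responses copied from the two requests above (pasted, not fetched live).
CAT_OUTPUT = """health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open filebeat-7.15.2-2022.05.16-000168 qIO3tHowQtyIytZmqC62Lw 1 1 10034437 0 8.5gb 4.2gb"""

COUNT_OUTPUT = """
{
  "count" : 10034437,
  "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }
}
"""


def cat_docs_count(cat_text):
    """Return docs.count from a `_cat/indices/<index>?v` response with one index row."""
    header, row = cat_text.strip().splitlines()[:2]
    # Pair each column name with its value; columns are whitespace separated.
    fields = dict(zip(header.split(), row.split()))
    return int(fields["docs.count"])


def api_count(count_text):
    """Return the `count` field from a `_count` response body."""
    return json.loads(count_text)["count"]


if __name__ == "__main__":
    print("cat docs.count:", cat_docs_count(CAT_OUTPUT))
    print("_count        :", api_count(COUNT_OUTPUT))
    print("match         :", cat_docs_count(CAT_OUTPUT) == api_count(COUNT_OUTPUT))
```

For your own index, paste in your two responses; if the numbers differ, that is the discrepancy worth investigating.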
Can you share the output of the cat shards API for the index as well as the mappings for it? Are you by any chance using nested mappings?
If I remember correctly, the count API returns the number of top-level documents and ignores nested documents, while the cat API includes nested documents because these count against the Lucene per-shard document limit. If this is the case and your documents mostly have one nested document each, but some have more or fewer, that may explain the difference.
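To illustrate the arithmetic, here is a small Python sketch. It assumes the behaviour described above (which I have not verified for your version) and relies on the fact that each nested object is indexed as its own hidden Lucene document alongside its parent. The field name `comments` and the sample documents are made up for illustration.

```python
def top_level_count(docs):
    """Number of top-level documents (what _count is assumed to report)."""
    return len(docs)


def lucene_doc_count(docs, nested_field):
    """Parent documents plus one hidden Lucene document per nested object
    (what the cat API is assumed to report)."""
    return sum(1 + len(doc.get(nested_field, [])) for doc in docs)


# Hypothetical documents with a nested field called "comments".
docs = [
    {"id": 1, "comments": [{"text": "a"}]},                 # 1 parent + 1 nested
    {"id": 2, "comments": [{"text": "b"}, {"text": "c"}]},  # 1 parent + 2 nested
    {"id": 3, "comments": []},                              # 1 parent + 0 nested
]

if __name__ == "__main__":
    print("top-level documents:", top_level_count(docs))
    print("Lucene documents   :", lucene_doc_count(docs, "comments"))
```

With exactly one nested object per document the second number would be twice the first, which is easy to mistake for a replica effect; any variation in the number of nested objects moves the ratio away from exactly 2.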