Why is there a 20-document difference (cat vs count)?

I am trying to compare the document counts reported by count and cat, and I'm getting different results:

GET /my_index/_count gives:

   "count": 4020199,
   "_shards": {
      "total": 1,
      "successful": 1,
      "skipped": 0,
      "failed": 0

And GET /_cat/indices gives:

green open my_index 53IuNvZ1T4W_8u1U4bf7kb 1 1 8040418 0 27gb 13.5gb

When we compare the quantities we get that:

8040418 - 4020199 * 2 (one replica) = 20

Note: I ran a refresh and the numbers did not change.

So where do those 20 extra documents come from?

Hmm you have something odd going on...

These 2 commands should return the exact same number. You do not need to multiply the count by the number of replicas etc.; _count returns the number of documents regardless of the replicas.

I picked a random index; note the document count is 10034437 in both.

GET /_cat/indices/filebeat-7.15.2-2022.05.16-000168?v

health status index                             uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   filebeat-7.15.2-2022.05.16-000168 qIO3tHowQtyIytZmqC62Lw   1   1   10034437            0      8.5gb          4.2gb

GET /filebeat-7.15.2-2022.05.16-000168/_count

  "count" : 10034437,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
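
If you want to check this without eyeballing the output, here is a minimal sketch in Python that parses one `_cat/indices` row and one `_count` response body and prints the difference. It assumes the default column order shown in the header row above; the helper names are made up for illustration, and the `_count` body is a completed version of the truncated output pasted above.

```python
import json

# Default _cat/indices column order, as in the header row pasted above (assumption).
CAT_COLUMNS = [
    "health", "status", "index", "uuid", "pri", "rep",
    "docs.count", "docs.deleted", "store.size", "pri.store.size",
]


def parse_cat_indices_line(line):
    """Split one whitespace-separated _cat/indices row into a dict (hypothetical helper)."""
    values = line.split()
    if len(values) != len(CAT_COLUMNS):
        raise ValueError(f"expected {len(CAT_COLUMNS)} columns, got {len(values)}")
    return dict(zip(CAT_COLUMNS, values))


def doc_count_difference(cat_line, count_body):
    """Return docs.count from _cat/indices minus count from _count."""
    cat_docs = int(parse_cat_indices_line(cat_line)["docs.count"])
    count_docs = json.loads(count_body)["count"]
    return cat_docs - count_docs


cat_line = (
    "green  open   filebeat-7.15.2-2022.05.16-000168 qIO3tHowQtyIytZmqC62Lw"
    "   1   1   10034437            0      8.5gb          4.2gb"
)
count_body = (
    '{"count": 10034437, "_shards": '
    '{"total": 1, "successful": 1, "skipped": 0, "failed": 0}}'
)

print(doc_count_difference(cat_line, count_body))  # 0
```

A non-zero result means the two APIs disagree for that index.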

Can you share the output of the cat shards API for the index as well as the mappings for it? Are you by any chance using nested mappings?

If I remember correctly, the count API returns the number of top-level documents and ignores nested documents, while the cat API includes nested documents because each one counts against the Lucene per-shard document limit. If this is the case and your documents mostly have 1 nested document but some have more or fewer, that may explain the difference.
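
To make the arithmetic concrete, here is a small sketch in Python. It assumes the behaviour described above (each nested object is stored as its own Lucene document next to its parent). The distribution of nested objects is purely hypothetical, chosen only because it reproduces the two numbers in this thread; it is not known to be what `my_index` actually contains.

```python
# Hypothetical distribution: {nested objects per document: number of documents}.
# Assumption for illustration only: most documents have 1 nested object, 20 have 2.
nested_distribution = {1: 4020179, 2: 20}


def top_level_count(distribution):
    """What _count would report: top-level documents only."""
    return sum(distribution.values())


def lucene_doc_count(distribution):
    """What a Lucene-level count would report under the stated assumption:
    one Lucene doc per top-level document plus one per nested object."""
    return sum(docs * (1 + nested) for nested, docs in distribution.items())


print(top_level_count(nested_distribution))   # 4020199
print(lucene_doc_count(nested_distribution))  # 8040418
```

Under this assumption the factor of roughly 2 comes from the nested objects rather than from the replica, and the extra 20 comes from the few documents that hold a second nested object.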
