Different Doc count after reindexing

I have an index user1. To add some fields, I created a new index, user2, and reindexed into it with this request:

POST /_reindex?scroll=30m
{
  "source": {
    "index": "user1"
  },
  "dest": {
    "index": "user2"
  }
}

After reindexing is complete, GET _cat/indices gives the following result for the 2 indices:

yellow open user1       _tg9nbGWSreKBBsa7E0XwQ      5 1 298934 47108   36.7gb   36.7gb 
yellow open user2       ne1m5aWDR5K064qCo0awKw      5 1 314441     0   25.5gb   25.5gb

But a _count query on both indices returns the same value.

Why does GET _cat/indices show a higher docs count and a smaller storage size for the new index?

No idea. Maybe the new index already had data in it before the reindex? Or you changed the mapping and are now using nested documents?

As for the size: because you probably have fewer segments, no updates, better compression...

Assuming the mappings are the same, it seems like you have updated and/or deleted documents in the user1 index. These take up space because old or deleted documents are not removed immediately. I suspect this explains the difference in size and count.

There was no existing data. The index was newly created for reindexing.

Yes. I changed the mapping and added a few more fields, including nested fields.

Could you please explain how adding nested fields increases the docs count (while the _count query shows the same count for both indices)?

I haven't updated/deleted any documents.

The mappings are not the same. I updated the mapping and added a few more fields, including nested fields, in the new index. How does that cause the difference in count?

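Each nested object is indexed as a separate, hidden document in Lucene. The docs count in _cat/indices comes straight from Lucene, so it includes those hidden nested documents, while _count only counts top-level documents. That's why you see more documents than in the source index.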
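
To make the arithmetic concrete, here is a minimal Python sketch. It is not Elasticsearch code, only a model of the counting rule described above: one Lucene document per top-level document, plus one per object stored under a nested field. The function names, the sample documents and the `addresses` field are all hypothetical.

```python
from typing import Any, Dict, List


def objects_at(doc: Dict[str, Any], path: str) -> List[dict]:
    """Collect the objects found under a dotted field path, flattening arrays."""
    current: List[Any] = [doc]
    for key in path.split("."):
        found: List[Any] = []
        for item in current:
            if isinstance(item, dict) and item.get(key) is not None:
                value = item[key]
                found.extend(value if isinstance(value, list) else [value])
        current = found
    return [item for item in current if isinstance(item, dict)]


def lucene_doc_count(docs: List[Dict[str, Any]], nested_paths: List[str]) -> int:
    """Model of the Lucene-level count: root documents plus nested objects."""
    total = 0
    for doc in docs:
        total += 1  # the top-level document itself
        for path in nested_paths:
            # every object under a nested field becomes its own hidden document
            total += len(objects_at(doc, path))
    return total


# Hypothetical sample data: "addresses" is assumed to be mapped as nested.
users = [
    {"name": "a", "addresses": [{"city": "x"}, {"city": "y"}]},
    {"name": "b", "addresses": [{"city": "z"}]},
    {"name": "c"},
]

top_level = len(users)                                # what _count would report
with_nested = lucene_doc_count(users, ["addresses"])  # what docs count would report
print(top_level, with_nested)
```

On that reading, the 15,507 extra documents in user2 (314441 minus 298934) would be nested objects, assuming user1 has no nested fields of its own.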
