Primary and replica shards have different sizes

I'm running ES 2.4.0 and using the mapper-attachments plugin to index PDFs.

After indexing, I suddenly see that the primary and replica shards have different sizes (see attached). The cluster status is green, and I didn't notice any exceptions that might explain this.

I used the same procedure as before: set the number of replicas to 0, index, then set the number of replicas to 1.

The document count is the same, but the store size differs a lot. For example, for shard 3 the primary is 11.6gb but the replica is only 6.6gb. Is this expected?

Any idea?

Hi @AdaYang,

that's possible. For example, the primary could hold a large number of deleted documents that you wouldn't see on the replica when you change the replica count from 0 to 1 after indexing. Because Lucene segments are immutable, a delete only marks the document as deleted; the document is physically removed only when its segment gets merged.
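To see whether deleted documents account for the gap, you could total them per shard copy from an indices segments API response (`GET /<index>/_segments`). Here is a minimal Python sketch. The sample response is invented and heavily trimmed, the index name `docs` is a placeholder, and the field names (`routing.primary`, `num_docs`, `deleted_docs`, `size_in_bytes`) are what I'd expect in the response, so please verify them against the output of your own cluster.

```python
def summarize_copies(segments_response, index_name):
    """Return one summary row per shard copy, primaries listed first."""
    rows = []
    shards = segments_response["indices"][index_name]["shards"]
    for shard_id, copies in shards.items():
        for copy in copies:
            segs = copy["segments"].values()
            rows.append({
                "shard": int(shard_id),
                "primary": copy["routing"]["primary"],
                "segments": len(segs),
                "docs": sum(s["num_docs"] for s in segs),
                "deleted_docs": sum(s["deleted_docs"] for s in segs),
                "size_in_bytes": sum(s["size_in_bytes"] for s in segs),
            })
    # False sorts before True, so "not primary" puts the primary first.
    return sorted(rows, key=lambda r: (r["shard"], not r["primary"]))


# Invented, trimmed sample: a primary still carrying deleted documents
# and a replica holding the same live documents in one compact segment.
SAMPLE = {
    "indices": {
        "docs": {
            "shards": {
                "3": [
                    {"routing": {"state": "STARTED", "primary": True},
                     "segments": {
                         "_a": {"num_docs": 600, "deleted_docs": 250, "size_in_bytes": 7000},
                         "_b": {"num_docs": 400, "deleted_docs": 150, "size_in_bytes": 4600},
                     }},
                    {"routing": {"state": "STARTED", "primary": False},
                     "segments": {
                         "_c": {"num_docs": 1000, "deleted_docs": 0, "size_in_bytes": 6600},
                     }},
                ]
            }
        }
    }
}

if __name__ == "__main__":
    for row in summarize_copies(SAMPLE, "docs"):
        print(row)
```

If the two copies report the same `docs` but the primary shows a high `deleted_docs` total, that would point to space that a merge can reclaim.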

A few more thoughts: Did you check the number of segments with the indices segments API? You could also run a force merge, but note that this causes a lot of I/O, so I wouldn't do it during peak hours.
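Both calls can be sketched roughly like this with only Python's standard library. The host `http://localhost:9200` and the index name `docs` are placeholders, and the helper names are made up for this example. `max_num_segments` and `only_expunge_deletes` are the force merge parameters as I remember them for 2.x, so please check the documentation for your version before relying on them.

```python
import json
import urllib.parse
import urllib.request

ES_HOST = "http://localhost:9200"  # assumption: adjust to your cluster


def segments_url(index, host=ES_HOST):
    """URL for the indices segments API of one index."""
    return "{}/{}/_segments".format(host, urllib.parse.quote(index, safe=""))


def forcemerge_url(index, host=ES_HOST, max_num_segments=None,
                   only_expunge_deletes=False):
    """URL for a force merge, with optional query parameters."""
    params = {}
    if max_num_segments is not None:
        params["max_num_segments"] = max_num_segments
    if only_expunge_deletes:
        params["only_expunge_deletes"] = "true"
    url = "{}/{}/_forcemerge".format(host, urllib.parse.quote(index, safe=""))
    if params:
        url += "?" + urllib.parse.urlencode(sorted(params.items()))
    return url


def call(method, url, timeout=3600):
    """Send the request and decode the JSON body (a force merge can take long)."""
    request = urllib.request.Request(url, method=method)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only prints the URLs; nothing is sent to the cluster here.
    print("GET ", segments_url("docs"))
    print("POST", forcemerge_url("docs", only_expunge_deletes=True))
```

To actually run it, you would call e.g. `call("GET", segments_url("docs"))` first, and `call("POST", forcemerge_url("docs", only_expunge_deletes=True))` outside of peak hours.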

Daniel

P.S.: Please don't post screenshots but rather copy the text here.


Thanks @danielmitterdorfer for the reply, I see your point.
After force merging the segments, the primary and replica now have the same size. Thank you very much.

It's interesting, though: I am only indexing documents and not deleting any, and still there was this big difference.

Hi @AdaYang,

OK, so there are no deletes, but did you maybe update documents in the index? An update marks the old version of the document as deleted and indexes a new one.

Daniel

Thanks for following up. No, it was purely bulk inserts, indexing PDFs with mapper-attachments.
