ES 2.4.0; using the mapper-attachments plugin to index PDFs.
After indexing, I suddenly see that the shard sizes of the primary and the replica differ. See attached. The cluster is green, and I didn't notice any exceptions that might explain this.
I used the same procedure as before: set the replica count to 0, index, then set the replica count to 1.
The document count is the same, but the store size differs a lot. For example, for shard 3 the primary is 11.6gb while the replica is only 6.6gb. Is this expected?
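A minimal sketch for comparing the copies as text instead of a screenshot, assuming you save the output of `GET /_cat/shards?v&bytes=b` (the `bytes=b` parameter makes the `store` column a raw byte count). The function name `compare_copies` is hypothetical. It looks columns up by the header row rather than assuming their order, and it assumes one replica per shard.

```python
from collections import defaultdict


def compare_copies(cat_output: str) -> dict:
    """Hypothetical helper: map (index, shard) -> {"p": bytes, "r": bytes}.

    Expects the text of `GET /_cat/shards?v&bytes=b`, i.e. a header row
    followed by one row per shard copy. Assumes at most one replica per
    shard; a second replica would overwrite the "r" entry.
    """
    lines = cat_output.strip().splitlines()
    # Find columns by name from the header row (present because of ?v).
    col = {name: i for i, name in enumerate(lines[0].split())}
    sizes = defaultdict(dict)
    for line in lines[1:]:
        fields = line.split()
        # UNASSIGNED/INITIALIZING rows have no final store size; skip them.
        if fields[col["state"]] != "STARTED":
            continue
        key = (fields[col["index"]], int(fields[col["shard"]]))
        sizes[key][fields[col["prirep"]]] = int(fields[col["store"]])
    return dict(sizes)
```

Posting the resulting rows (or the raw `_cat/shards` text itself) makes the primary/replica gap easy for others to read and quote.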
That's possible. For example, the primary could contain a large number of deleted documents that you wouldn't see on the replica when you change the replica count from 0 to 1 after indexing. Because Lucene segments are immutable, deleted documents are only marked as deleted; they are physically removed when segments get merged.
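The point about immutable segments can be illustrated with a toy model. This is a simplified sketch, not Lucene's actual data structures: the `Segment` and `Shard` classes and the byte sizes are made up for illustration, and doc ids are assumed unique. It shows why a copy that still carries deleted documents reports a larger store size than a copy holding the same live documents after a merge.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Segment:
    # (doc_id, size_in_bytes) pairs; never modified after the segment is written
    docs: tuple

    def size(self) -> int:
        return sum(size for _, size in self.docs)


@dataclass
class Shard:
    segments: list = field(default_factory=list)
    deleted: set = field(default_factory=set)  # doc ids marked as deleted

    def flush(self, docs):
        # Each flush writes a new immutable segment.
        self.segments.append(Segment(tuple(docs)))

    def delete(self, doc_id):
        # Only mark the document; its bytes stay in the existing segment.
        self.deleted.add(doc_id)

    def store_size(self) -> int:
        return sum(seg.size() for seg in self.segments)

    def live_docs(self) -> int:
        return sum(1 for seg in self.segments for d, _ in seg.docs
                   if d not in self.deleted)

    def merge(self):
        # A merge rewrites only the surviving documents into a new segment,
        # which is when deleted documents are physically dropped.
        survivors = [(d, s) for seg in self.segments for d, s in seg.docs
                     if d not in self.deleted]
        self.segments = [Segment(tuple(survivors))]
        self.deleted.clear()


shard = Shard()
shard.flush([("a", 100), ("b", 100)])
shard.flush([("c", 100)])
shard.delete("a")
print(shard.store_size(), shard.live_docs())  # 300 2
shard.merge()
print(shard.store_size(), shard.live_docs())  # 200 2
```

The live document count never changes, only the bytes on disk, which matches the symptom of equal document counts with different store sizes.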
A few more thoughts: did you check the number of segments with the indices segments API? You could also run a force merge, but note that it causes a lot of I/O, so I wouldn't do that during peak hours.
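A rough sketch of both suggestions, assuming you save the text of `GET /_cat/segments/<index>?v` and that the output has the `docs.deleted` column. The helper names, the host `http://localhost:9200` and the index name `pdfs` are hypothetical placeholders. The second function only builds the `POST /<index>/_forcemerge?max_num_segments=1` request and does not send it, so you can choose when to run it (off-peak, given the I/O cost).

```python
import urllib.parse
import urllib.request
from collections import defaultdict


def summarise_segments(cat_output: str) -> dict:
    """Hypothetical helper: (index, shard, prirep) -> (segments, deleted docs)."""
    lines = cat_output.strip().splitlines()
    # Look columns up by name from the header row produced by ?v.
    col = {name: i for i, name in enumerate(lines[0].split())}
    counts = defaultdict(lambda: [0, 0])
    for line in lines[1:]:
        f = line.split()
        key = (f[col["index"]], int(f[col["shard"]]), f[col["prirep"]])
        counts[key][0] += 1                           # one row per segment
        counts[key][1] += int(f[col["docs.deleted"]])  # docs only marked deleted
    return {k: tuple(v) for k, v in counts.items()}


def force_merge_request(host: str, index: str, max_num_segments: int = 1):
    """Build (but do not send) a force merge request for one index."""
    query = urllib.parse.urlencode({"max_num_segments": max_num_segments})
    url = f"{host}/{urllib.parse.quote(index)}/_forcemerge?{query}"
    return urllib.request.Request(url, method="POST")
```

A primary with many segments and a large deleted-document total, next to a replica with few segments and none deleted, would be consistent with the explanation above; `urllib.request.urlopen(force_merge_request(...))` would then send the merge.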
Daniel
P.S.: Please don't post screenshots; copy the text here instead.
Thanks @danielmitterdorfer for the reply. I get your point.
After force merging the segments, the primary and replica now have the same size. Thank you very much.
Interesting, though: I am only indexing documents, not deleting any, and there was still a big difference.