I am indexing about 170,000 (17w) sentences using the bulk API. When I indexed with a single thread, the index ended up at about 150 MB, but when I tried multiple threads, it came to about 110 MB. Is this possible? I checked with the count API, and the total number of documents is the same.
Is there any way I can find the difference between these two indexes?
The size of the index can vary depending on how segments have or have not merged. If you are indexing at a higher speed using multiple threads, it is possible that the initial segments will be larger and therefore merge differently.
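One way to see where the two indexes differ is to compare their segment layout using the index segments API (`GET /<index>/_segments`). The following is only a rough sketch, not a definitive tool: the base URL and index names are placeholders, `fetch_segments`, `summarize_segments` and `diff_summaries` are hypothetical helper names, and the code assumes the usual `_segments` response shape (`indices` → index name → `shards` → shard copies → `segments`).

```python
import json
from urllib.request import urlopen


def fetch_segments(base_url, index):
    """Fetch the raw _segments response for one index.

    base_url is an assumption, e.g. "http://localhost:9200".
    """
    with urlopen(f"{base_url}/{index}/_segments") as resp:
        return json.load(resp)


def summarize_segments(response, index):
    """Collapse a _segments response into totals for one index."""
    summary = {"segments": 0, "docs": 0, "deleted_docs": 0, "size_in_bytes": 0}
    shards = response["indices"][index]["shards"]
    for copies in shards.values():
        for copy in copies:
            # Count primary shard copies only, so replicas do not inflate totals.
            if not copy["routing"]["primary"]:
                continue
            for seg in copy["segments"].values():
                summary["segments"] += 1
                summary["docs"] += seg["num_docs"]
                summary["deleted_docs"] += seg["deleted_docs"]
                summary["size_in_bytes"] += seg["size_in_bytes"]
    return summary


def diff_summaries(a, b):
    """Return only the fields whose values differ, as (a, b) pairs."""
    return {key: (a[key], b[key]) for key in a if a[key] != b[key]}
```

Calling `diff_summaries(summarize_segments(fetch_segments(url, "index_a"), "index_a"), summarize_segments(fetch_segments(url, "index_b"), "index_b"))` should show whether the two indexes differ in segment count or deleted documents while holding the same number of live documents. For a size comparison that is independent of merge history, you could also merge both indexes down to a single segment first (`POST /<index>/_forcemerge?max_num_segments=1` on recent versions; older releases exposed this as `_optimize`) and then compare the sizes again.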