I've noticed a slowdown in indexing speed going from ES v2.3 to 5.5.0.
It's early days, I have no precise figures, and there are a tonne of other things that may differ, but my first thought was that it may be a change I made to doc_values: more fields have doc_values now. I presume this will slow indexing, but I wonder by how much. I'm seeing at least a doubling of the time to index my corpus of text. Any suggestions?
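One way to test the doc_values theory would be to switch them off on fields that are never sorted or aggregated on, then re-run the same load. A rough sketch of such a mapping built as a Python dict, assuming a hypothetical type name `doc` and hypothetical fields `body` and `source_tag` (not my real mapping); `text` fields carry no doc_values in 5.x, so only the keyword field needs the flag:

```python
import json

def build_mapping(type_name="doc"):
    """Return an index-creation body with doc_values disabled on one field.

    The type and field names are hypothetical placeholders.
    """
    return {
        "mappings": {
            type_name: {
                "properties": {
                    # Analyzed full text: no doc_values to disable here.
                    "body": {"type": "text"},
                    # Keyword field that is never sorted or aggregated on.
                    "source_tag": {"type": "keyword", "doc_values": False},
                }
            }
        }
    }

# JSON body to send when creating the test index.
mapping_json = json.dumps(build_mapping(), indent=2)
```

Indexing the same corpus into an index with and without this setting should show how much of the slowdown, if any, comes from doc_values.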
I also observe a degradation over time. I start off loading tens of thousands of docs per minute, and after a while this rate drops tenfold. Stopping and restarting doesn't help, and I suspect it has more to do with random IO or seek times as the index grows.
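To put numbers on that degradation, the loader could log the elapsed time and running document count after each batch and turn those into a per-interval rate. A minimal sketch, assuming a hypothetical `samples` list of `(elapsed_seconds, total_docs_indexed)` pairs; the figures in the usage lines are made up for illustration, not measurements:

```python
def docs_per_minute(samples):
    """Return the indexing rate (docs/min) for each consecutive sample pair.

    samples: list of (elapsed_seconds, total_docs_indexed), in time order.
    """
    rates = []
    for (t0, n0), (t1, n1) in zip(samples, samples[1:]):
        if t1 <= t0:
            # Skip duplicate or out-of-order timestamps.
            continue
        rates.append((n1 - n0) * 60.0 / (t1 - t0))
    return rates

# Illustrative, invented samples: the rate falls as the index grows.
example = [(0, 0), (60, 30000), (120, 55000), (180, 58000)]
rates = docs_per_minute(example)  # [30000.0, 25000.0, 3000.0]
```

Running the same logging against both 2.3 and 5.5.0 would show whether the newer version starts slower or only falls behind once the index gets large.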
If you do not need to update documents or use the id to ensure you don't get duplicates, letting Elasticsearch assign the document id will give better performance as Elasticsearch knows the documents do not already exist. You can read about the impact of id formats on indexing speed in this blog post.
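As a sketch of what that looks like in practice: in a bulk request, leaving `_id` out of each `index` action line is what lets Elasticsearch generate the id itself. The helper below only builds the newline-delimited request body; the index name `corpus`, type `doc`, and the sample documents are hypothetical, and sending the body to the `_bulk` endpoint is left to whatever client you already use:

```python
import json

def bulk_body(docs, index, doc_type):
    """Build a bulk request body whose actions carry no _id.

    With no _id in the action line, Elasticsearch assigns one itself.
    """
    # The same action line is reused for every document.
    action = json.dumps({"index": {"_index": index, "_type": doc_type}})
    lines = []
    for doc in docs:
        lines.append(action)           # action/metadata line
        lines.append(json.dumps(doc))  # document source line
    # The bulk format requires a trailing newline.
    return "\n".join(lines) + "\n"

# Hypothetical sample documents and names.
body = bulk_body([{"text": "first"}, {"text": "second"}], "corpus", "doc")
```

If the ids are needed for de-duplication or updates, this does not apply and the action line has to keep its `_id`.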
The thing is, it's the difference in indexing speed between 2.3 and 5.5.0 that interests me. The ID style hasn't changed. But if there's nothing crazy about my process, that's cool; I can go and investigate more deeply.