I have a "products" index with ~750,000 documents. When displaying the products, I let users sort them by price (asc or desc) or by one or two other metrics.
Prices are constantly updated, and products are constantly being added and removed (2-4 times per day).
Is it beneficial to sort the index by one of the metrics, like price asc in this case? Or is the cost of rearranging the whole index with each new change too big for the benefits it provides?
I have read about "Index sorting" in Elasticsearch's "Tune for search speed" guide, but I'm not sure what the implications are for an index that is constantly updated.
What about a very basic index that stores only one field (name) as a keyword, but which is also constantly updated? Is it beneficial to sort it, so that Elasticsearch doesn't have to look through all documents each time a name is requested?
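For reference, index sorting is configured through index settings when the index is created, and the Elasticsearch docs state it cannot be added or changed on an existing index afterwards (changing it means creating a new index and reindexing). The sketch below only builds the request bodies as Python dicts; it makes no client calls. The field names `name` and `price` and their types are assumptions taken from the question. As far as the docs describe it, early termination only applies when the search sort matches the index sort, so a `price desc` query would not benefit from an index sorted by `price asc`.

```python
import json

# Assumed mapping: "name" (keyword) and "price" (double) are taken from the
# question; they are illustrative, not a confirmed schema.
def products_index_body(sort_field: str = "price", sort_order: str = "asc") -> dict:
    """Build a create-index request body that enables index sorting.

    The sort is fixed at index creation; changing it later requires a new
    index plus a reindex.
    """
    return {
        "settings": {
            "index": {
                "sort.field": sort_field,  # field each segment is sorted by
                "sort.order": sort_order,  # "asc" or "desc"
            }
        },
        "mappings": {
            "properties": {
                "name": {"type": "keyword"},
                "price": {"type": "double"},
            }
        },
    }


def cheapest_products_query(size: int = 10) -> dict:
    """Search body whose sort matches the index sort (price asc).

    With track_total_hits disabled, Elasticsearch can stop collecting in each
    segment once it has `size` candidates instead of visiting every match.
    """
    return {
        "size": size,
        "sort": [{"price": "asc"}],
        "track_total_hits": False,
    }


if __name__ == "__main__":
    print(json.dumps(products_index_body(), indent=2))
    # For the name-only index, the same settings would use sort_field="name".
    print(json.dumps(products_index_body("name"), indent=2))
    print(json.dumps(cheapest_products_query(), indent=2))
```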
Storing sorted values on disk requires much more work from Elasticsearch at index time than storing unsorted values. In some cases the performance overhead of index sorting can decrease write performance by as much as 40-50%. For this reason it is very important to determine whether the application should be optimized for query performance or write performance. Optimizing an application for write performance (and taking the hit on query performance) will most likely mean index sorting is not a good option.
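To make the trade-off concrete, here is a toy model in Python, not a description of how Lucene stores or merges segments: a plain list stands in for one segment's documents. Finding the 10 cheapest products in unsorted storage means visiting every document, while storage already sorted by price lets the query stop after the first 10. The up-front `sorted()` call plays the role of the extra index-time work; in Elasticsearch that sorting happens as segments are flushed and merged.

```python
import heapq
import random

# Toy model only: a Python list stands in for a single segment's documents.


def top_k_unsorted(prices: list[float], k: int) -> tuple[list[float], int]:
    """Unsorted storage: every document must be visited to find the k cheapest."""
    examined = 0
    heap: list[float] = []  # negated values, so heap[0] is the largest kept price
    for p in prices:
        examined += 1
        if len(heap) < k:
            heapq.heappush(heap, -p)
        elif -heap[0] > p:
            heapq.heapreplace(heap, -p)  # drop the most expensive kept price
    return sorted(-x for x in heap), examined


def top_k_presorted(sorted_prices: list[float], k: int) -> tuple[list[float], int]:
    """Storage already sorted by the query's sort: stop after the first k documents."""
    result = sorted_prices[:k]
    return result, len(result)


if __name__ == "__main__":
    random.seed(0)
    prices = [round(random.uniform(1, 1000), 2) for _ in range(750_000)]
    sorted_prices = sorted(prices)  # toy stand-in for the index-time sorting cost
    unsorted_top, n_unsorted = top_k_unsorted(prices, 10)
    sorted_top, n_sorted = top_k_presorted(sorted_prices, 10)
    assert unsorted_top == sorted_top
    # prints: unsorted: examined 750000 docs; presorted: examined 10 docs
    print(f"unsorted: examined {n_unsorted} docs; presorted: examined {n_sorted} docs")
```

With prices changing constantly, the sorting cost is paid again and again as new segments are written and merged, which is the write-side penalty the quote describes.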
Things might have changed since then, but I would recommend running a benchmark to see whether the indexing overhead is small compared with the potential query performance gain.
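A minimal timing harness for such a benchmark might look like the sketch below. It times any zero-argument callable and reports latency statistics in milliseconds. The operations you would actually pass in, for example a bulk request of price updates (write side) or a sorted search (read side) against a sorted and an unsorted copy of the index, are placeholders you would supply with your own client code; the stand-in workload in the `__main__` block is purely illustrative.

```python
import statistics
import time
from typing import Callable


def benchmark(operation: Callable[[], object], runs: int = 20, warmup: int = 3) -> dict:
    """Time a zero-argument callable and return latency statistics in milliseconds."""
    if runs < 2:
        raise ValueError("need at least 2 runs to compute percentiles")
    for _ in range(warmup):
        operation()  # discard warm-up runs (caches, connection setup)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "runs": runs,
        "median_ms": statistics.median(samples),
        # 9 cut points for n=10; the last one is the 90th percentile
        "p90_ms": statistics.quantiles(samples, n=10, method="inclusive")[-1],
        "max_ms": max(samples),
    }


if __name__ == "__main__":
    # Illustrative stand-in workload; replace it with your own indexing or
    # search call against each index variant and compare the results.
    print(benchmark(lambda: sorted(range(100_000), reverse=True)))
```

Running the same write and read workloads against both index variants, with data volumes and update rates close to production, gives numbers that can be compared directly.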
As far as I understand, nothing has changed since index sorting was introduced. I'll read the blog post you mentioned and run some benchmarks to see how this affects things. Thanks!