In our organization, we have a 750GB index (1.5TB with 1:1 replication, 5 nodes, 5 primary shards) set up on ES 6.2. The number of documents is ~90 million. Frequent updates are made to the index in the prod environment on a daily basis via automation. We are using function score queries to boost the user's relevant search results based on some machine-learned parameters. However, we are taking a huge hit in search performance and are considering moving to a bigger new cluster, hoping that might help. Some guidance on the node, shard and hardware requirements for such a huge index would really help. Currently a search on this index takes more than 5 seconds, making it completely unusable.
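For context, the boosting query is roughly of the following shape. This is a simplified sketch using the Python client, with a placeholder host, index and field names (the real query and ML parameters are more involved):

```python
from elasticsearch import Elasticsearch

# Placeholder host and index; the real cluster details differ.
es = Elasticsearch(["http://localhost:9200"])

# function_score query: the text match is combined with a per-document
# machine-learned boost stored in a (hypothetical) ml_score field.
query = {
    "query": {
        "function_score": {
            "query": {"match": {"title": "user search terms"}},
            "functions": [
                {
                    "field_value_factor": {
                        "field": "ml_score",  # hypothetical ML boost field
                        "factor": 1.2,
                        "missing": 1
                    }
                }
            ],
            "boost_mode": "multiply"
        }
    }
}

response = es.search(index="products", body=query)
print(response["hits"]["total"], "matches")
```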
Function score queries can result in a lot of processing if there are a lot of matching documents to score. When you run a query in Elasticsearch, each shard is processed by a single thread, although the shards are processed in parallel. As you only have 5 quite large primary shards, each query will essentially be served by just 5 threads across the cluster. Unless you have set up your index so that the split index API can be used, it may in this scenario very well make sense to reindex the data into a new index with a higher number of primary shards in order to increase concurrency.
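If split is not an option (in 6.x the source index has to have been created with `index.number_of_routing_shards` for the split index API to work), the reindex route could look roughly like the sketch below. Index names and the shard count are placeholders; 30 primaries would work out to roughly 25GB per shard for 750GB of primary data. The number of primaries can only be set at index creation time, which is why the target index is created explicitly before the reindex.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])  # placeholder host

# Create the target index with more primary shards before reindexing.
# 30 primaries is a placeholder (~25GB per shard for 750GB of primary data).
es.indices.create(
    index="products_v2",
    body={
        "settings": {
            "index.number_of_shards": 30,
            "index.number_of_replicas": 1
        }
    }
)

# Run the reindex as a background task; poll it via the tasks API.
task = es.reindex(
    body={"source": {"index": "products"}, "dest": {"index": "products_v2"}},
    wait_for_completion=False,
)
print("reindex task:", task["task"])
```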
Before you do so, I would however recommend verifying that you are not limited by disk I/O, as performance may not improve with an increased number of primary shards if that is the bottleneck.
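One quick way to get an indication from the cluster itself, in addition to OS-level tools like iostat on the nodes, is the node stats API. A minimal sketch with a placeholder host:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])  # placeholder host

# fs stats include io_stats on Linux; thread_pool stats show whether search
# requests are queueing up or being rejected on each node.
stats = es.nodes.stats(metric="fs,thread_pool")

for node in stats["nodes"].values():
    search_pool = node["thread_pool"]["search"]
    print(node["name"],
          "search queue:", search_pool["queue"],
          "rejected:", search_pool["rejected"])
```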
Thanks for replying! We are using virtual hard disks (VMDK) on nodes with 16GB RAM. We are also planning to reindex into a new cluster with 20-30GB of data per primary shard, to account for future data growth. I would prefer not to maintain a single huge 750GB index, but I have not been able to figure out how to control its size. Since our data is not time-based, log or event data, and we make frequent updates to it, rollover probably doesn't work? Is there any way to manage the size of an index that is writable?
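For reference, the back-of-the-envelope behind the 20-30GB per shard figure; the growth factor is just an assumption on our side:

```python
import math

# Rough primary shard count for the planned reindex (numbers illustrative).
primary_data_gb = 750   # current primary data size
target_shard_gb = 25    # middle of the 20-30GB per-shard target
growth_factor = 1.5     # assumed headroom for future data growth

shards = math.ceil(primary_data_gb * growth_factor / target_shard_gb)
print(shards)  # 45 primary shards with these assumptions
```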