Slow Elasticsearch search performance

I have an ECK-operator-based Elasticsearch cluster with 4 nodes, each with 6 CPU cores and 16 GB RAM.

Note:

  1. I have configured ILM with a rollover alias, and 3 indices have already been created, rolling over once the index size exceeds 30 GB.
  2. Each index has 1 primary shard and 3 replicas, so my shard size is roughly 30 GB. I am using persistent storage for the data. (A sketch of this configuration follows the list.)
  3. The data consists of PDF document contents and their corresponding embedding vectors, generated for a large number of PDF documents, with Elasticsearch as the back-end document store of a search application.
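
For reference, my configuration corresponds roughly to the following ILM policy and index template (the policy, template, and alias names here are placeholders rather than my real ones; with a single primary shard the primary shard size condition is the same as the index size):

```
// rollover once the single primary shard exceeds 30 GB
PUT _ilm/policy/pdf-docs-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_primary_shard_size": "30gb"
          }
        }
      }
    }
  }
}

// 1 primary shard, 3 replicas, managed by the policy above
PUT _index_template/pdf-docs-template
{
  "index_patterns": ["pdf-docs-*"],
  "template": {
    "settings": {
      "number_of_shards": 1,
      "number_of_replicas": 3,
      "index.lifecycle.name": "pdf-docs-policy",
      "index.lifecycle.rollover_alias": "pdf-docs"
    }
  }
}
```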

My issue is that search performance is very slow: queries take 15 to 20 seconds, and when a search contains a larger number of terms it takes 40 to 50 seconds.

Can you please suggest how to improve my search performance so that it is consistently under 5 seconds?

Note: for the same search term, say "application performance limitations", I may get results in 15 seconds on the first search and then in 6 seconds on a repeat query because the request cache is enabled. But after some time, searching the same term may take 30 seconds again.

What strategies can I apply? Should I reindex the data into an index with more, smaller shards, or reduce the number of segments using the force merge API? What is the best way to make my searches faster?

Which version of Elasticsearch are you using?

Is your data immutable?

Are you using rollover together with ILM?

What is the retention period? How much data do you envision to have in the cluster?

How much data do you index per day? What is the expected search load?

Thank you, Christian, for your prompt help.

Please find the answers to your questions below:

Which version of Elasticsearch are you using?
[Mohamed Habibulla] I am using Elasticsearch version 8.10.4.

Is your data immutable?
[Mohamed Habibulla] Yes, my data cannot be changed.

Are you using rollover together with ILM?

[Mohamed Habibulla] Yes, I am using rollover along with ILM; once an index reaches the 30 GB maximum size, a new index is created under the same alias.

What is the retention period? How much data do you envision to have in the cluster?

[Mohamed Habibulla] I have set a long retention period of more than 10 years. I expect a daily ingest of 100 to 200 PDF documents, and we envision about 0.5 million PDFs within another 5 years. This could be about 1.5 TB of data (content and embeddings stored in Elasticsearch).

How much data do you index per day? What is the expected search load?

[Mohamed Habibulla] I expect 100 to 200 PDF documents per day. The expected search load is about 40 concurrent users and about 2,000 search requests per day. The user base is more than 20,000 users.

I do not have a lot of experience optimizing vector search, but I will provide some suggestions based on what I have seen on the forum so far. If I am incorrect I am sure someone will correct me, but it will give you something to work with for now.

How long does it take for an index to fill up and roll over? If this is relatively long, it may make sense to lower the rollover size to maybe 10 GB. This will give a higher shard count, which allows more work to be done in parallel.
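
If you are using the rollover action in your ILM policy, that would be a change along these lines (the policy name is just a placeholder, and `max_primary_shard_size` is equivalent to the index size as long as you keep a single primary shard):

```
// roll over more frequently so each shard stays around 10 GB
PUT _ilm/policy/pdf-docs-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_primary_shard_size": "10gb"
          }
        }
      }
    }
  }
}
```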

Having lots of replicas is recommended when you have high query concurrency. Based on the number of searches you specified in your response this does not seem to apply to your use case, so I would recommend lowering it to 1 replica shard. This will result in each node holding less data, which means you may be able to make better use of your operating system page cache.
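
For the indices that already exist, the replica count can be lowered dynamically; new indices should instead pick it up from your index template (the index pattern below is a placeholder):

```
// drop existing indices from 3 replicas to 1
PUT pdf-docs-*/_settings
{
  "index": {
    "number_of_replicas": 1
  }
}
```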

I believe vector search performance improves with a lower segment count, so I would recommend force merging your indices down to a single segment as soon as possible after they have rolled over, ideally through your ILM policy.
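
One way to do that is a warm phase that kicks in immediately after rollover, plus a one-off force merge of the indices that have already rolled over (policy and index names are placeholders, and never force merge the current write index):

```
PUT _ilm/policy/pdf-docs-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_primary_shard_size": "10gb"
          }
        }
      },
      "warm": {
        "min_age": "0ms",
        "actions": {
          "forcemerge": {
            "max_num_segments": 1
          }
        }
      }
    }
  }
}

// one-off merge of an index that has already rolled over
POST pdf-docs-000001/_forcemerge?max_num_segments=1
```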

As your data set is already larger than your operating system page cache, disk I/O might become a bottleneck. To reduce this, make sure you are using very fast storage, ideally local SSDs. You can also increase the amount of RAM available to the Elasticsearch nodes while keeping the heap size as is, as this will increase the size of the page cache.
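
To see how each node's heap compares to its total RAM, and therefore how much is left over for the page cache, you can check:

```
GET _cat/nodes?v&h=name,heap.max,ram.max,heap.percent,ram.percent
```

In ECK terms that would, as far as I understand it, mean raising the memory limit of the node set while keeping the JVM heap pinned (for example via `ES_JAVA_OPTS`), since by default the heap is sized from the container memory limit.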

Thanks Christian. I’ll work on the suggestions and keep you posted.
