Elasticsearch Query optimization

shikjind · March 9, 2020, 7:00am

We have 780 GB of data in an Elasticsearch index, and we are running the query below:
{ "query": { "function_score": { "query": { "bool": { "must": [ { "query_string": { "query": "(coreStringCaseIns:\"Auto enhance photos with just a tap.\" OR locStringCaseIns:\"Auto enhance photos with just a tap.\")", "fields": [], "type": "best_fields", "default_operator": "or", "max_determinized_states": 10000, "enable_position_increments": true, "fuzziness": "AUTO", "fuzzy_prefix_length": 0, "fuzzy_max_expansions": 50, "phrase_slop": 0, "escape": false, "auto_generate_synonyms_phrase_query": true, "fuzzy_transpositions": true, "boost": 1 } } ], "filter": [ { "bool": { "should": [ { "term": { "field1": { "value": "45001", "boost": 1 } } }, { "term": { "field2": { "value": "45002", "boost": 1 } } }, { "term": { "field": { "value": "45003", "boost": 1 } } } ], "adjust_pure_negative": true, "boost": 1 } }, { "bool": { "adjust_pure_negative": true, "boost": 1 } }, { "bool": { "adjust_pure_negative": true, "boost": 1 } }, { "bool": { "adjust_pure_negative": true, "boost": 1 } }, { "bool": { "adjust_pure_negative": true, "boost": 1 } } ], "adjust_pure_negative": true, "boost": 1 } }, "functions": [ { "filter": { "bool": { "must": [ { "query_string": { "query": "(id:\"10426\")", "fields": [], "type": "best_fields", "default_operator": "or", "max_determinized_states": 10000, "enable_position_increments": true, "fuzziness": "AUTO", "fuzzy_prefix_length": 0, "fuzzy_max_expansions": 50, "phrase_slop": 0, "escape": false, "auto_generate_synonyms_phrase_query": true, "fuzzy_transpositions": true, "boost": 1 } } ], "adjust_pure_negative": true, "boost": 1 } }, "weight": 297630966000000 }, { "filter": { "bool": { "must": [ { "query_string": { "query": "(id:\"10110\")", "fields": [], "type": "best_fields", "default_operator": "or", "max_determinized_states": 10000, "enable_position_increments": true, "fuzziness": "AUTO", "fuzzy_prefix_length": 0, "fuzzy_max_expansions": 50, "phrase_slop": 0, "escape": false, "auto_generate_synonyms_phrase_query": true, "fuzzy_transpositions": 
true, "boost": 1 } } ], "adjust_pure_negative": true, "boost": 1 } }, "weight": 33242801500000 }, { "filter": { "bool": { "must": [ { "query_string": { "query": "(id:\"522\")", "fields": [], "type": "best_fields", "default_operator": "or", "max_determinized_states": 10000, "enable_position_increments": true, "fuzziness": "AUTO", "fuzzy_prefix_length": 0, "fuzzy_max_expansions": 50, "phrase_slop": 0, "escape": false, "auto_generate_synonyms_phrase_query": true, "fuzzy_transpositions": true, "boost": 1 } } ], "adjust_pure_negative": true, "boost": 1 } }, "weight": 1385116730000 }, { "filter": { "bool": { "must": [ { "query_string": { "query": "(locale:\"ja_JP\")", "fields": [], "type": "best_fields", "default_operator": "or", "max_determinized_states": 10000, "enable_position_increments": true, "fuzziness": "AUTO", "fuzzy_prefix_length": 0, "fuzzy_max_expansions": 50, "phrase_slop": 0, "escape": false, "auto_generate_synonyms_phrase_query": true, "fuzzy_transpositions": true, "boost": 1 } } ], "adjust_pure_negative": true, "boost": 1 } }, "weight": 9999999800000 }, { "filter": { "bool": { "must": [ { "range": { "modify": { "from": "now-30d", "to": null, "include_lower": true, "include_upper": true, "boost": 1 } } } ], "adjust_pure_negative": true, "boost": 1 } }, "weight": 9999999800000 } ], "score_mode": "multiply", "max_boost": 3.4028235e+38, "boost": 1 } } }

Elasticsearch takes 15.867 s to return the results. We have already tried most of the optimizations we know of, so I am posting this question to find out whether any further optimizations are possible.

theDor · March 9, 2020, 9:55am

How many shards does your index have?

shikjind · March 9, 2020, 12:51pm

Hi theDor,

There are 5 primary shards, each with 1 replica shard.

shikjind · March 11, 2020, 12:51pm

Sharing the system configuration:

Total size - 1.55 TB
Used - 793 GB

There are 4 nodes holding 5 primary shards, and each primary shard has 1 replica shard.

I hope this info helps. Right now performance is very bad: as mentioned, Elasticsearch takes about 15 seconds to return results for the query above.

I am eager to find any room for optimization in the query, or any other workaround that improves performance.

Please let me know if you need any more info.

Thanks!

theDor · March 13, 2020, 10:30pm

I would suggest a number of solutions:

Increase your cluster's resources (more memory).
Increase the number of shards in the index. (The best shard size is between 20 GB and 40 GB; with more shards, your query will be more distributed across your Elasticsearch cluster.)
Split the index data into several indices (by time series, etc.) and query only the necessary data. Then you will not query all 780 GB of data; you will query less and get better results (of course, it depends on your needs).
Try the Search Profiler in Kibana's Dev Tools, so that after each change you can see whether the results improve.
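As a rough back-of-the-envelope check of the shard-size suggestion, the sketch below works out how many primary shards 780 GB would need for each shard to land between 20 GB and 40 GB. The function name `primary_shard_range` is purely illustrative, and the calculation ignores replicas and future growth.

```python
import math

def primary_shard_range(index_size_gb, min_shard_gb=20, max_shard_gb=40):
    """Return (fewest, most) primary shards that keep each shard within the size range."""
    fewest = math.ceil(index_size_gb / max_shard_gb)   # larger shards -> fewer of them
    most = math.floor(index_size_gb / min_shard_gb)    # smaller shards -> more of them
    return fewest, most

current_shard_gb = 780 / 5            # 5 primary shards today
fewest, most = primary_shard_range(780)
print(f"current: {current_shard_gb:.0f} GB per primary shard")
print(f"target:  {fewest} to {most} primary shards")
```

Note that the primary shard count of an existing index cannot simply be edited in place; it takes a reindex into a new index or the split index API.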

Hope this helps.

shikjind · March 13, 2020, 11:20pm

Hi theDor,

Thank you for the response.

I would like to clarify a few things.

I have a cluster size of 1.55 GB with 4 data nodes. By increasing the cluster size, I guess you mean increasing the number of nodes as well. Can you please suggest the optimum cluster size and number of nodes for 750 GB of data?
If I don't change the number of nodes or the cluster size, will increasing the number of primary shards, so that each shard holds 20 GB to 40 GB of data, reduce the search time significantly?

theDor · March 16, 2020, 4:15pm

I would suggest increasing the index's number of shards to between 20 and 35 (depending on how many data nodes you can add).
In my experience, increasing the number of shards and making them smaller makes queries run faster, because every shard is an inverted index and the query becomes more distributed across the cluster.

itizir · March 16, 2020, 4:37pm

Just curious... have you tried the query without that range filter?

shikjind · March 17, 2020, 6:03am

Hi itizir,

I tried running the query after removing the range filter, but there is still no improvement.

itizir · March 17, 2020, 11:01am

Hey again.

Hm, I see. I was just suggesting because we've seen poor performance of range queries, and hadn't looked at the query closely.

I'm not particularly familiar with function_score and full text search, but can still try to help...

Why all the empty bool clauses in the filter? (probably not affecting things though)
Except for the time range, all the queries in the functions look like they should be simple term queries matching project IDs, etc. (or am I misunderstanding?). So I'm not sure why they are relegated to functions.
The first main query (looking for "Auto enhance photos with just a tap."): is that looking for an exact match? What's the mapping in these coreStringCaseIns and locStringCaseIns fields? If this is the expensive query, perhaps it should be moved?
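To make the points above concrete, here is a minimal sketch of what a slimmed-down request body could look like: the empty `bool` clauses are dropped, the `query_string` filters inside the functions become plain `term` queries, and the quoted phrase becomes two `match_phrase` clauses. It assumes `id` and `locale` are keyword-like fields and that a phrase match is what the original query intends; neither is confirmed in this thread, so the result is not guaranteed to score identically. The helper `weighted_term` is a made-up name.

```python
import json

PHRASE = "Auto enhance photos with just a tap."

def weighted_term(field, value, weight):
    """One function_score entry: apply `weight` when `field` equals `value`."""
    return {"filter": {"term": {field: value}}, "weight": weight}

query = {
    "query": {
        "function_score": {
            "query": {
                "bool": {
                    # Phrase match on either text field (assumed intent of the query_string).
                    "must": [{
                        "bool": {
                            "should": [
                                {"match_phrase": {"coreStringCaseIns": PHRASE}},
                                {"match_phrase": {"locStringCaseIns": PHRASE}},
                            ],
                            "minimum_should_match": 1,
                        }
                    }],
                    # Only the non-empty filter clause from the original is kept.
                    "filter": [{
                        "bool": {
                            "should": [
                                {"term": {"field1": "45001"}},
                                {"term": {"field2": "45002"}},
                                {"term": {"field": "45003"}},
                            ]
                        }
                    }],
                }
            },
            "functions": [
                weighted_term("id", "10426", 297630966000000),
                weighted_term("id", "10110", 33242801500000),
                weighted_term("id", "522", 1385116730000),
                weighted_term("locale", "ja_JP", 9999999800000),
                {"filter": {"range": {"modify": {"gte": "now-30d"}}},
                 "weight": 9999999800000},
            ],
            "score_mode": "multiply",
        }
    }
}

print(json.dumps(query, indent=2))
```

Comparing the profile output of this body against the original would show whether the rewrite changes anything measurable.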

Have you tried profiling the query, to get a sense of what takes time?
Does the query cache well, as in does it get faster quickly as you repeat the search?
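For the profiling question, adding `"profile": true` at the top level of the search body makes Elasticsearch return per-shard timing for each query component. The sketch below is one possible way to flatten such a response and list the slowest components. The function name `slowest_components` and the sample response are invented for illustration; the sample only mimics the documented shape (`profile.shards[].searches[].query[]` with `type`, `description`, `time_in_nanos`, `children`).

```python
def slowest_components(profile_response, top=5):
    """Flatten the per-shard query trees of a profiled search; return the slowest nodes."""
    rows = []

    def walk(node, shard_id):
        rows.append((node["time_in_nanos"], shard_id, node["type"], node["description"]))
        for child in node.get("children", []):
            walk(child, shard_id)

    for shard in profile_response["profile"]["shards"]:
        for search in shard["searches"]:
            for node in search["query"]:
                walk(node, shard["id"])
    return sorted(rows, reverse=True)[:top]

# Hand-written sample in the shape of a profile response; all values are invented.
sample = {
    "profile": {
        "shards": [{
            "id": "[node-1][my-index][0]",
            "searches": [{
                "query": [{
                    "type": "FunctionScoreQuery",
                    "description": "function score (...)",
                    "time_in_nanos": 9_000_000,
                    "children": [{
                        "type": "BooleanQuery",
                        "description": "+(...) #(...)",
                        "time_in_nanos": 7_500_000,
                    }],
                }]
            }]
        }]
    }
}

for nanos, shard_id, query_type, description in slowest_components(sample):
    print(f"{nanos / 1e6:8.1f} ms  {shard_id}  {query_type}  {description}")
```

A parent's time includes the time of its children, so the interesting entries are usually the deepest nodes that still account for most of the total.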

shikjind · March 19, 2020, 7:39am

Just a little typo :-

It's not 1.55 GB but 1.55 TB

shikjind · March 20, 2020, 5:14am

Hi theDor,

Thank you so much, that is quite crisp and clear!
Can you also suggest how many shards a node should ideally contain?

theDor · March 31, 2020, 7:07pm

I would suggest reading this blog post about shards:

And to answer your question, the suggested limit is 20 shards per 1 GB of heap on a node.
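As a quick illustration of that rule of thumb, the sketch below compares the per-node ceiling with the average number of shard copies per node for the current layout and for a larger one. The heap size is not given anywhere in this thread, so `HEAP_GB = 16` is a placeholder, and both function names are made up for the example.

```python
def max_shards_per_node(heap_gb, shards_per_gb_heap=20):
    """Upper bound on shards per node under the 20-shards-per-GB-of-heap rule of thumb."""
    return heap_gb * shards_per_gb_heap

def shards_per_node(primaries, replicas_per_primary, data_nodes):
    """Average shard copies per node, assuming an even spread across data nodes."""
    total = primaries * (1 + replicas_per_primary)
    return total / data_nodes

HEAP_GB = 16  # hypothetical value; the thread does not state the heap size

print("ceiling per node:", max_shards_per_node(HEAP_GB))
print("current layout:  ", shards_per_node(5, 1, 4))    # 5 primaries, 1 replica, 4 nodes
print("30 primaries:    ", shards_per_node(30, 1, 4))   # within the 20 to 35 suggested
```

With only one index of this size, either layout stays far below the ceiling, so the rule is a limit to stay under rather than a target to reach.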

system · April 28, 2020, 7:07pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
