Elasticsearch query is too slow

niraj_pandey · May 17, 2020, 4:14pm

Hi,

I am using a Python script to pull data from Elasticsearch and process it. The index has around 5000k records, but the script has been running for more than 24 hours and still hasn't finished. What can I do to make it faster?

Here is the query I am using in Python:

def job_query_acct2(starttime, endtime,cluster,queue):
    starttime = str(starttime)
    endtime = str(endtime)
    tzone = "America/Los_Angeles"
    begq = '{ "_source": {"includes":["max_mem","max_no_procs"]},"query": {"bool": {"must": ['
    boolq2 = '{"bool": {"must": [{"match": {"cluster.keyword": ' + '"' + cluster + '"}},{ "match_phrase": { "queue": { "query":' + '"' + queue +'" } } } ,{ "range": { "event_time": {"lte":' + '"'+endtime + '", "gte":' + '"' + starttime + '", "time_zone": ' + '"'+tzone + '","format": "strict_date_optional_time" } } }]}}'
    endq = ']}} }'
    jobqry = begq + boolq2 +  endq
    return jobqry


res = es.search(index=idx, scroll='2m', body=jobqry,timeout='1200s')

niraj_pandey · May 17, 2020, 4:15pm

I am trying to get one week of data (4-11 May).

polyfractal · May 18, 2020, 5:21pm

Since you're just trying to pull out data with scrolls, I would move those clauses into the filter portion of the boolean query. This will remove the scoring aspect and should help speed it up. Right now the query is scoring all the boolean components which is not necessary because you don't actually care about the score.
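As a rough sketch of what that could look like, here is the same query built as a Python dict with all three clauses under `filter` instead of `must`. The function name `job_query_filter` is made up for illustration, and the field names are copied from the original post; treat it as a starting point rather than a drop-in replacement.

```python
import json


def job_query_filter(starttime, endtime, cluster, queue,
                     tzone="America/Los_Angeles"):
    """Build the search body with every clause in filter context (no scoring).

    Hypothetical helper: same fields and values as the original query,
    only the bool clause type changes from "must" to "filter".
    """
    return {
        "_source": {"includes": ["max_mem", "max_no_procs"]},
        "query": {
            "bool": {
                "filter": [
                    {"match": {"cluster.keyword": cluster}},
                    {"match_phrase": {"queue": {"query": queue}}},
                    {"range": {"event_time": {
                        "gte": str(starttime),
                        "lte": str(endtime),
                        "time_zone": tzone,
                        "format": "strict_date_optional_time",
                    }}},
                ]
            }
        },
    }


# The dict can be passed as the search body directly, or serialized first.
# The dates, cluster, and queue below are placeholder values.
body = job_query_filter("2020-05-04", "2020-05-11", "cluster1", "normal")
body_json = json.dumps(body)
```

Building the body as a dict also avoids the quoting mistakes that are easy to make when concatenating JSON strings by hand.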

Otherwise, larger scroll sizes can help (by reducing the number of round trips), and sorting the scroll by _doc will allow it to run faster (see the end of here: https://www.elastic.co/guide/en/elasticsearch/reference/7.x/search-request-body.html#request-body-search-scroll)
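A minimal sketch of those two settings, assuming the search body is a dict: the helper name `with_scroll_tuning` and the page size of 5000 are my own choices for illustration, not values from this thread. The size has to stay within the index's `index.max_result_window` setting, which defaults to 10000.

```python
import copy


def with_scroll_tuning(query_body, page_size=5000):
    """Return a copy of the search body tuned for a full scroll export.

    Hypothetical helper. page_size is an assumed value; tune it for your
    document size and keep it within index.max_result_window.
    """
    body = copy.deepcopy(query_body)  # leave the caller's dict untouched
    body["sort"] = ["_doc"]           # index order, the cheapest order for scrolling
    body["size"] = page_size          # hits returned per scroll round trip
    return body


# Example with a trivial query; any search body dict works the same way.
tuned = with_scroll_tuning({"query": {"match_all": {}}})
```

The result would then go into the existing call, for example `es.search(index=idx, scroll='2m', body=tuned)`.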

polyfractal · May 18, 2020, 5:22pm

I would also check that there aren't any exceptions in the server log or client log. I've definitely run into slow batches before and realized afterwards that I had an error in my code and it was spewing exceptions for hours without making any progress. This can happen if you're not using the scroll ID correctly, for example.
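For reference, a scroll loop that handles the scroll ID correctly might look roughly like this. It is a sketch, assuming `es` is an `elasticsearch.Elasticsearch` client from the 7.x Python client (which provides `search`, `scroll`, and `clear_scroll`); the function name `scroll_all` is hypothetical.

```python
def scroll_all(es, index, body, keep_alive="2m"):
    """Yield every hit for a query, one scroll page at a time.

    Assumes `es` is an elasticsearch-py 7.x client. Each request passes the
    scroll ID from the most recent response, and the loop stops on the first
    empty page.
    """
    resp = es.search(index=index, body=body, scroll=keep_alive)
    scroll_id = resp.get("_scroll_id")
    try:
        while resp["hits"]["hits"]:
            for hit in resp["hits"]["hits"]:
                yield hit
            # Always continue from the latest scroll ID, not the first one.
            resp = es.scroll(scroll_id=scroll_id, scroll=keep_alive)
            scroll_id = resp.get("_scroll_id", scroll_id)
    finally:
        # Release the search context on the server, even after an error.
        if scroll_id:
            es.clear_scroll(scroll_id=scroll_id)
```

Because nothing is caught here, an exception from the client stops the export immediately instead of letting the script loop for hours without progress.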

niraj_pandey · May 19, 2020, 6:10am

I changed the line below and it's running very fast now. I am not sure whether there is any issue with using this. Any ideas?

