Elasticsearch query is too slow

Hi,

I am using a Python script to pull data from Elasticsearch and process it. The index has around 5000k records, but the script has been running for more than 24 hours and still has not finished. What can I do to make it faster?

Here is the query I am using in Python:

def job_query_acct2(starttime, endtime,cluster,queue):
    starttime = str(starttime)
    endtime = str(endtime)
    tzone = "America/Los_Angeles"
    begq = '{ "_source": {"includes":["max_mem","max_no_procs"]},"query": {"bool": {"must": ['
    boolq2 = '{"bool": {"must": [{"match": {"cluster.keyword": ' + '"' + cluster + '"}},{ "match_phrase": { "queue": { "query":' + '"' + queue +'" } } } ,{ "range": { "event_time": {"lte":' + '"'+endtime + '", "gte":' + '"' + starttime + '", "time_zone": ' + '"'+tzone + '","format": "strict_date_optional_time" } } }]}}'
    endq = ']}} }'
    jobqry = begq + boolq2 +  endq
    return jobqry


res = es.search(index=idx, scroll='2m', body=jobqry,timeout='1200s')

I am trying to get one week of data (4-11 May).

Since you're only pulling data out with a scroll, I would move those clauses into the filter portion of the bool query. That removes the scoring step and should help speed things up. Right now the query scores every clause in the bool, which is unnecessary because you don't actually care about the score.
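As a rough sketch of what moving the clauses into filter context could look like, here is the same body built as a dict and serialized with json.dumps. The function name job_query_filter and the tzone default argument are made up for illustration; the field names and clauses are copied from the question, and the redundant nested bool is flattened on the assumption that it was not intentional.

```python
import json


def job_query_filter(starttime, endtime, cluster, queue,
                     tzone="America/Los_Angeles"):
    """Build the search body with every clause in filter context (no scoring)."""
    body = {
        "_source": {"includes": ["max_mem", "max_no_procs"]},
        "query": {
            "bool": {
                # "filter" clauses must match, like "must", but are not scored.
                "filter": [
                    {"match": {"cluster.keyword": cluster}},
                    {"match_phrase": {"queue": {"query": queue}}},
                    {"range": {"event_time": {
                        "gte": str(starttime),
                        "lte": str(endtime),
                        "time_zone": tzone,
                        "format": "strict_date_optional_time",
                    }}},
                ]
            }
        },
    }
    # json.dumps handles quoting and escaping, unlike manual string concatenation.
    return json.dumps(body)
```

The returned string can be passed as body= to es.search in the same way as the original jobqry.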

Beyond that, a larger scroll size can help by reducing the number of round trips, and sorting the scroll by _doc will let it run faster (see the end of this section: https://www.elastic.co/guide/en/elasticsearch/reference/7.x/search-request-body.html#request-body-search-scroll)
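A minimal sketch of those two settings, assuming the query body is a JSON string like the one in the question. The helper name with_scroll_options and the page size of 5000 are illustrative guesses, not tuned values; with default index settings the size has to stay at or below 10000.

```python
import json


def with_scroll_options(query_json, page_size=5000):
    """Return the query body with a larger page size and _doc sort order."""
    body = json.loads(query_json)
    # Hits returned per scroll round trip (assumed value, tune for your cluster).
    body["size"] = page_size
    # Index order is the cheapest order for a scroll to iterate in.
    body["sort"] = ["_doc"]
    return json.dumps(body)
```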

I would also check that there aren't any exceptions in the server log or the client log. I've definitely run into slow batches before and realized afterwards that I had an error in my code and it had been spewing exceptions for hours without making any progress. This can happen if you're not using the scroll ID correctly, for example.
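For reference, a scroll loop that handles the scroll ID the expected way might look roughly like this. It is only a sketch: it assumes a 7.x-style elasticsearch-py client (search, scroll and clear_scroll called with the keyword arguments shown), and the name scroll_hits is made up here.

```python
def scroll_hits(es, index, body, keep_alive="2m"):
    """Yield every hit, passing the most recent scroll ID to each follow-up call."""
    resp = es.search(index=index, body=body, scroll=keep_alive)
    scroll_id = resp.get("_scroll_id")
    try:
        # An empty page means the scroll is exhausted.
        while resp["hits"]["hits"]:
            for hit in resp["hits"]["hits"]:
                yield hit
            resp = es.scroll(scroll_id=scroll_id, scroll=keep_alive)
            # The ID can change between pages, so always keep the latest one.
            scroll_id = resp.get("_scroll_id", scroll_id)
    finally:
        # Release the server-side scroll context even if processing fails.
        if scroll_id:
            es.clear_scroll(scroll_id=scroll_id)
```

Exceptions raised inside the loop propagate to the caller instead of being swallowed, so a failing batch shows up immediately. The client library also ships elasticsearch.helpers.scan, which wraps this same pattern.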

I changed the line below and it's running very fast now. I'm not sure whether there is any issue with using this. Any ideas?
