Very slow query


(Sasankvemuri) #1

I have been trying to search and retrieve results based on 3 user input values and find the closest value that satisfies these three conditions. The doc has about 30 million rows and the query takes about 10 seconds to execute.

def sgrna(factor, efficiency, specificity, chr):
    es = Elasticsearch()
    minimum = min(factor)
    request = []
    min(factor)
    for i in range(len(chr)):
        req_head = {'index': 'lab', 'type': 'sg_rna'}
        req_body = {
                      "query": {
                        "bool": {
                          "must": [
                            {
                              "term": {
                                "BS_CHR": chr[i]
                              }
                            },
                            {
                              "range": {
                                "Efficiency": {
                                  "gte": int(efficiency)
                                }
                              }
                            },
                            {
                              "range": {
                                "Specificity": {
                                  "gte": int(specificity)
                                }
                              }
                            }      ]
                        }
                      }
                    ,
                            "sort" : {
                                "_script" : {
                                    "type" : "number",
                                    "script" : {
                                        "lang": "painless",
                                        "params": {
                                      "factor": int(factor[i])
                                    },
                                        "inline": "def cur = 0; cur = (params.factor - doc['BS_START'].value); if (cur < 0) { cur = cur * -1 } else { cur = cur}" },
                                    "order" : "asc"
                                }
                            }

                        }
        request.extend([req_head, req_body])
    resp = es.msearch(body=request)
    print(resp)
    response = []
    for i in range(len(resp["responses"])):
        response.append(resp['responses'][i]['hits']["hits"][0]["_source"])
    return response

(David Pilato) #2

The code formatting is bugged here so external code link

WDYM?

Use </> icon as explained in this guide and not the citation button.

Or use markdown style like:

```
CODE
```

(Sasankvemuri) #3

I meant when I was on my laptop when I posted it. Not the website system (my bad I didn't explain) however the problem is the query runtime. I'm wondering if there's any better way to speed up the search?


(David Pilato) #4

I edited your post.

What is the took that elasticsearch is giving back?
What is the took when you remove the painless script?

BTW I'd move to a filter instead of must all your term and range queries.


(Sasankvemuri) #5

I assume you mean time and with the painless script it takes about 10 seconds to go through 100 iterations and without painless script there is no option that I could find in ES that will find the nearest number in an index to a given input.
I initially used filter, however, the filter was not being specific in retrieving range values and I didnt get desired result
for example when I gave "Specificity" should be greater than 50 it was not reflected in the output because the sort script took over the preference and hence I had to add everything to must.


(David Pilato) #6

I meant took field in elasticsearch response.

About using must/filter that should not change in the way you describe I think.

Could you provide a full recreation script as described in About the Elasticsearch category. It will help to better understand what you are doing. Please, try to keep the example as simple as possible.

A full reproduction script will help readers to understand, reproduce and if needed fix your problem. It will also most likely help to get a faster answer.


(Sasankvemuri) #7

This is a simplified version of the query. I basically want one field to match and then I want the matched results to return based on the nearest value from user input. In this case, user input is the "factor" which I have assigned under params.
The index has about 300 million rows and it 'took': 2364 to complete one search.

{
    "query": {
        "term": {"CHR": "chr1"}
    },
    "sort" : {
        "_script" : {
            "type" : "number",
            "script" : {
                "lang": "painless",
                "params": {
              "factor": 100000
            },
                "inline": "def cur = 0; cur = (params.factor - doc['POS'].value); if (cur < 0) { cur = cur * -1 } else { cur = cur}" },
            "order" : "asc"
        }
    }

}

(David Pilato) #8

I was asking for a reproduction for the problem you asked before about the bool must vs bool filter part.

Also can you please give me what the took value is with and without the sort part?


(system) #9

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.