How to increase efficiency of search queries in Elasticsearch

dadoonet · June 12, 2019, 4:20pm

70 MB in total?

What does this return:

GET /_cat/indices?v

avinash9999 · June 13, 2019, 6:48am

This is how it looks right now, and I am fetching documents from the second index.

[Image: Screenshot (2)]

dadoonet · June 13, 2019, 7:22am

Please don't post images of text, as they are hard to read and not searchable.

Instead, paste the text and format it with the </> icon. Check the preview window.

Can you share the code you are using to extract all the data?

avinash9999 · June 13, 2019, 9:22am

I am using it from Python.

First I make my connection:

es = ES(host=host, port=port, timeout=100)

Then I search for the first 10000 documents:

scroll_data = es.search(
  index = index,
  scroll = '1m',
  size = 10000,
  body = {
    "query": {
      "bool": {
        "must": [
          {
            "match": {
              "language" : "3"
            }
          },
          {
            "match": {
              "statusid" : "5"
            }
          },
          {
            "range": {
              "cp": {
                "gte": "1"
              }
            }
          },
          { "wildcard": { "list" : "*17*" }}
        ]
      }
    }
  }
)

Then I use scroll for the rest:

sid = scroll_data['_scroll_id']
# hits.total.value can be a lower bound (relation "gte"), so treat it as an estimate
scroll_size = scroll_data['hits']['total']['value'] - len(scroll_data['hits']['hits'])
final_scroll_data = []
final_scroll_data.extend(scroll_data['hits']['hits'])

# Keep requesting pages while documents are expected to remain
while scroll_size > 0:
    each_scroll = es.scroll(scroll_id = sid, scroll = '1m')
    sid = each_scroll['_scroll_id']
    scroll_size = scroll_size - len(each_scroll['hits']['hits'])
    final_scroll_data.extend(each_scroll['hits']['hits'])
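
Editor's sketch: the loop above relies on `hits.total.value`, which the responses later in this thread report as 10000 with relation "gte", meaning a lower bound rather than an exact count. A variant that may be more robust is to keep scrolling until a page comes back empty and then release the scroll context. The function name `scroll_all` and its parameters are hypothetical; `es` is assumed to be an elasticsearch-py client exposing `search`, `scroll` and `clear_scroll` as in the 7.x client.

```python
def scroll_all(es, index, body, page_size=10000, keep_alive="1m"):
    """Collect every hit for `body` by scrolling until a page comes back empty.

    `es` is assumed to be an elasticsearch-py style client (hypothetical helper,
    not part of the original post).
    """
    page = es.search(index=index, scroll=keep_alive, size=page_size, body=body)
    scroll_id = page["_scroll_id"]
    hits = []
    try:
        # An empty page signals that the scroll is exhausted,
        # regardless of what hits.total reported.
        while page["hits"]["hits"]:
            hits.extend(page["hits"]["hits"])
            page = es.scroll(scroll_id=scroll_id, scroll=keep_alive)
            scroll_id = page["_scroll_id"]
    finally:
        # Free the server-side scroll context instead of waiting for it to expire.
        es.clear_scroll(scroll_id=scroll_id)
    return hits
```

It would be called as `scroll_all(es, index, body)` with the same `body` as in the search above. The Python client also ships `elasticsearch.helpers.scan`, which wraps a similar pattern.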

Christian_Dahlqvist · June 13, 2019, 9:37am

You are using a wildcard query with wildcards at both ends, which is the most inefficient query you can run in Elasticsearch. What does CPU usage look like while the query is running? What is the specification of the host?

dadoonet · June 13, 2019, 9:41am

Yeah. This is weird:

{ "wildcard": { "list" : "*17*" }}

Look at the documentation: Wildcard query | Elasticsearch Guide [8.11] | Elastic

Avoid beginning patterns with * or ? . This can increase the iterations needed to find matching terms and slow search performance.

What does this return:

GET your_index/_search
{
  "size": 1,
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "language" : "3"
          }
        },
        {
          "match": {
            "statusid" : "5"
          }
        },
        {
          "range": {
            "cp": {
              "gte": "1"
            }
          }
        },
        { "wildcard": { "list" : "*17*" }}
      ]
    }
  }
}

avinash9999 · June 13, 2019, 9:49am

@dadoonet, @Christian_Dahlqvist Thanks for the suggestion.

But with or without the wildcard, it still takes a long time. It depends entirely on the data.

Right now I have set up two nodes on two different SSD machines, and there is no performance improvement.

dadoonet · June 13, 2019, 9:50am

Can you share the output of the query I asked to run?

Christian_Dahlqvist · June 13, 2019, 10:51am

Given the small data size, I would expect the data to be cached no matter what type of disk you have. What does CPU utilisation look like? Do you have swap enabled? Are there other processes running on the host that could be interfering?

avinash9999 · June 13, 2019, 11:58am

Output of the query:

{
  "took" : 40,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : 4.0,
    "hits" : [
      {
        "_index" : "sc-surveys2",
        "_type" : "surveys",
        "_id" : "140277170216",
        "_score" : 4.0,
        "_source" : {
          "@version" : "1",
          "qualificationanswerid" : 216,
          "description" : "Bio-Tech",
          "surveystatusid" : 5,
          "epc" : 0,
          "qualificationid" : 70,
          "qualificationanswerdesc" : "Bio-Tech",
          "id" : "140277170216",
          "supplierlist" : "17,603,554,28,27,623,581,307,101,126,30",
          "surveyid" : 1402771,
          "cpi" : 2.75,
          "languageid" : 3,
          "@timestamp" : "2019-06-13T09:35:59.449Z",
          "qualificationname" : "STANDARD_INDUSTRY_PERSONAL"
        }
      }
    ]
  }
}
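
Editor's sketch: the `supplierlist` value in this hit is a comma-separated string, so a substring pattern like `*17*` would also match a value containing 170 or 617. One possible alternative, assuming the data could be reindexed so that the field holds an array of keyword values, is an exact `term` match on a single ID. The helper names and the second sample string below are hypothetical, and the mapping change is an assumption, not something confirmed in this thread.

```python
def split_suppliers(supplierlist):
    """Split a comma-separated supplier string into individual ID tokens."""
    return [token.strip() for token in supplierlist.split(",") if token.strip()]


def supplier_term_query(supplier_id):
    """Exact-match clause; assumes `supplierlist` is indexed as an array of keywords."""
    return {"term": {"supplierlist": str(supplier_id)}}


sample = "17,603,554,28,27,623,581,307,101,126,30"  # value from the hit above
other = "617,170"                                    # hypothetical value without supplier 17

# Substring matching (what *17* does) cannot tell 17 apart from 617 or 170.
print("17" in other)                    # True
# Token matching only succeeds when 17 is a whole ID.
print("17" in split_suppliers(other))   # False
print("17" in split_suppliers(sample))  # True

print(supplier_term_query(17))          # {'term': {'supplierlist': '17'}}
```

With such a mapping, the `term` clause could replace the `wildcard` clause in the `bool` query, ideally under `filter` since no scoring is needed for it.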

dadoonet · June 13, 2019, 7:57pm

What are the first lines of the response, up to the hits, if you change the size to:

 "size": 10000

avinash9999 · June 14, 2019, 7:34am

When I run it with

"size": 10000

it gives a result like:

{
  "took" : 344,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : 4.0,
    "hits" : [
      {
        "_index" : "sc-surveys2",
        "_type" : "surveys",
        "_id" : "13943646059",

Pradyumna_Achar · June 14, 2019, 4:25pm

Just curious - the "took" represents the time in milliseconds right? So it is 344 milliseconds to retrieve 10,000 documents as opposed to the 9 seconds mentioned, or am I missing something?

dadoonet · June 14, 2019, 5:49pm

Exactly. That's what I wanted to see.
So the rest of the time is most likely spent on the network, I'd say.
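
Editor's sketch: one way to see this split from the client is to compare the `took` value in the response, which is the server-side time in milliseconds, with the wall-clock time of the whole call. The helper `timed_search` is hypothetical; `es` is assumed to be an elasticsearch-py style client whose `search` returns a dict-like response. The difference covers network transfer as well as client-side work such as JSON parsing.

```python
import time


def timed_search(es, **search_kwargs):
    """Run a search and compare server-side `took` with client wall-clock time.

    Hypothetical helper; all timings are returned in milliseconds.
    """
    start = time.perf_counter()
    response = es.search(**search_kwargs)
    wall_ms = (time.perf_counter() - start) * 1000.0
    took_ms = response["took"]
    return {
        "took_ms": took_ms,                 # time Elasticsearch spent on the query
        "wall_ms": wall_ms,                 # time the client waited in total
        "overhead_ms": wall_ms - took_ms,   # network transfer, parsing, and so on
    }
```

Running it once from the remote machine and once on the Elasticsearch host, with the same arguments as the search above, should show whether `overhead_ms` is what changes between the two.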

avinash9999 · June 16, 2019, 7:51pm

Okay, thanks. So is there any way I can overcome this situation?

dadoonet · June 16, 2019, 8:13pm

What if you run the extraction on the same machine Elasticsearch is running on?

avinash9999 · June 17, 2019, 12:00pm

When I run the extraction on the same machine, it is quick compared to when I run it on another machine. But in my scenario I want to run the extraction from a different machine.

dadoonet · June 17, 2019, 12:45pm

So we are now pretty sure that you have a network problem.
I don't think we can fix anything on the Elasticsearch side, but at least you should investigate what kind of network connection you have between your client and the Elasticsearch server...
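
Editor's sketch: independent of investigating the network itself, the client could try asking for fewer bytes per hit by listing only the fields it needs in `_source`. The helper `with_source_filter` is hypothetical, and which fields the extraction actually needs is an assumption; the names used in the usage note are taken from the hit shown earlier in the thread.

```python
import copy


def with_source_filter(body, fields):
    """Return a copy of a search body that requests only `fields` from _source.

    Hypothetical helper; the original body is left unmodified.
    """
    trimmed = copy.deepcopy(body)
    trimmed["_source"] = list(fields)
    return trimmed
```

For example, `with_source_filter(body, ["surveyid", "cpi", "supplierlist"])` would return a body that can be passed to `es.search` in place of the original one. Whether this helps enough depends on how much of each document is actually unused.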
