My data is only 700 MB, one index, 376824 documents.
I'm querying 100 words sequentially from Python, and it is taking over 2 seconds for all 100 words.
My data has 92 fields and 330k rows.
from elasticsearch import Elasticsearch

esclient = Elasticsearch()

for word in var:
    response = esclient.search(
        index='patientdb',
        body={
            "size": 2,
            "query": {
                "match": {
                    "_all": word
                }
            }
        }
    )
var has around 100 words.
So basically I'm matching every field against each word.
It is using only 55% of CPU and 40% of my memory. I have four cores.
Why is it taking so long?
So you are issuing around 100 queries in sequence from a single thread, meaning that the time it takes to execute each query, together with parsing and network round-trip time, is around 20 ms. What are you trying to achieve with this set of queries? Is there maybe some way to do it with fewer requests, e.g. by restructuring the query or simply sending it all in a single request using the multi-search API?
I'm just trying to gauge the performance. There is no correlation between the searches; I have just copied the first 100 first names from the database and am searching for them.
My end game is to create search-as-you-type using Elasticsearch. Still figuring out how to go about it.
I'm doing 100 queries at a time because once I build it, I'll be getting 100 requests per second, and I cannot afford a 2-second wait time. It is way too expensive. And right now I'm testing on just 330k rows; I'll be having around 5 million rows. Hence I have this issue.
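For the search-as-you-type goal mentioned above, one common starting point is a `match_phrase_prefix` query fired on each keystroke (edge-n-gram analyzers are the usual next step for better performance). A sketch under assumptions: `first_name` is a hypothetical field name, and `prefix` is whatever the user has typed so far.

```python
def suggest_query(prefix, field="first_name", size=5):
    # Hypothetical prefix query; field name and sizes are illustrative.
    # max_expansions caps how many terms the last word may expand to.
    return {
        "size": size,
        "query": {
            "match_phrase_prefix": {
                field: {"query": prefix, "max_expansions": 10}
            }
        },
    }

# response = esclient.search(index="patientdb", body=suggest_query("jo"))
```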
If you are serving search requests for multiple users, I assume these would be sent to Elasticsearch in parallel, not from a single thread as in your example. If you want to see how Elasticsearch performs in a certain production scenario, you need to simulate the load as accurately as possible. I would therefore recommend issuing queries from a number of threads and seeing how that performs. You can also use Rally for this kind of benchmark.
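To make the multi-threaded suggestion concrete, here is a minimal sketch using a thread pool, assuming the client and query from the question (`search_one` is a hypothetical helper wrapping the single-word query):

```python
from concurrent.futures import ThreadPoolExecutor

def search_one(esclient, word):
    # Same single-word query as in the question.
    return esclient.search(
        index="patientdb",
        body={"size": 2, "query": {"match": {"_all": word}}},
    )

def search_all(esclient, words, workers=4):
    # Several in-flight requests keep all four cores busy instead of
    # leaving the client idle during each network round trip.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda w: search_one(esclient, w), words))
```

This simulates concurrent users more faithfully than a sequential loop, since the server can process requests in parallel across shards and cores.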
So I had this question: how many indices should I have? How do I decide that? Because more indices means more shards, which should mean better performance when I query in parallel.
How do I form multiple indices when I have only one CSV file with 4 million rows?
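One detail worth noting on the shard question: parallelism comes from the number of primary shards, which is fixed per index at creation time via index settings, so a single index can already spread queries across cores. A minimal sketch (index name, shard count, and replica count here are illustrative assumptions, not recommendations):

```python
# Shard count is set once, when the index is created; changing it
# later requires reindexing. Values below are examples only.
settings_body = {
    "settings": {
        "number_of_shards": 4,    # e.g. one primary shard per core
        "number_of_replicas": 0,  # no replicas on a single-node test box
    }
}

# esclient.indices.create(index="patientdb", body=settings_body)
```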