Slow Query performance on small data

Hi,

My data is only 700 MB, one index, 376824 documents.
I'm querying 100 words sequentially from Python, and it takes over 2 seconds for all 100 words.
My data has 92 fields and 330k rows.

for word in var:
    response = esclient.search(
        index='patientdb',
        body={
            "size": 2,
            "query": {
                "match": {
                    "_all": word
                }
            }
        }
    )

var has around 100 words.
So basically I'm matching every field against each word.
It is using only 55% of CPU and 40% of my memory. I have four cores.
Why is it taking so long?

So you are issuing around 100 queries in sequence from a single thread, which means each query takes around 20 ms (2 seconds / 100 queries), including execution, parsing, and the network round trip. What are you trying to achieve with this set of queries? Is there a way to do it with fewer requests, e.g. by restructuring the query or by sending them all in a single request using the multi-search API?
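As a rough sketch of the multi-search idea: the request is a list that alternates a header (which index to search) with the query itself. The helper names `build_msearch_body` and `search_all` are made up for illustration, and the `msearch(body=...)` call assumes the official Python client with an already connected `esclient`, as in the loop above.

```python
def build_msearch_body(index, words, size=2):
    """Return the alternating header/query list used by the multi-search API."""
    body = []
    for word in words:
        body.append({"index": index})  # header: which index this query targets
        body.append({                  # query: same match query as in the loop
            "size": size,
            "query": {"match": {"_all": word}},
        })
    return body


def search_all(esclient, index, words):
    """Send every query in a single round trip (esclient is an existing client)."""
    result = esclient.msearch(body=build_msearch_body(index, words))
    return result["responses"]  # one entry per query, in request order
```

With this, `search_all(esclient, 'patientdb', var)` would replace the 100 separate round trips with one request, so the per-request network and parsing overhead is paid once.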

I'm just trying to gauge the performance. There is no correlation between the searches. I have just copied 100 first names from the database and searching them.

My end goal is to build search-as-you-type using Elasticsearch. I'm still figuring out how to go about it.

I'm running 100 queries at a time because once I build it, I'll be getting 100 requests per second, and I cannot afford a 2 s wait time. It is way too expensive. Right now I'm testing on just 330k rows, but I'll have around 5 million rows. That's why this is an issue for me.

If you are serving search requests for multiple users, I assume these would be sent to Elasticsearch in parallel, and not from a single thread as in your example. If you want to see how Elasticsearch performs in a certain production scenario, you need to simulate the load as accurately as possible. I would therefore recommend issuing queries from a number of threads and seeing how that performs. You can also use Rally for this kind of benchmark.
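A minimal sketch of such a multi-threaded test, assuming Python 3. The names `timed_call` and `run_benchmark` are hypothetical, and `search_fn` stands for whatever function issues one query (for example, a wrapper around the `esclient.search` call from the first post). It is not a substitute for a proper benchmarking tool.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def timed_call(search_fn, word):
    """Run one search and return how long it took, in seconds."""
    start = time.perf_counter()
    search_fn(word)
    return time.perf_counter() - start


def run_benchmark(search_fn, words, num_threads):
    """Issue one search per word from num_threads workers.

    Returns (wall-clock time for the whole run, list of per-query latencies).
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        latencies = list(pool.map(lambda w: timed_call(search_fn, w), words))
    total = time.perf_counter() - start
    return total, latencies
```

Running it with different `num_threads` values on the same word list shows how total time and per-query latency change as concurrency goes up.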

Makes sense. I'll do that. Thanks for helping out.
I'll try with multiple threads and get back to you.

I also have a question: how many indices should I have?
How do I decide that?
My assumption is that more indices means more shards, which means better performance when I query in parallel.
How do I create multiple indices when I have only one CSV file with 4 million rows?

The best way is to benchmark with data and queries that are as realistic as possible. Have a look at this talk about cluster sizing.

Hi,

So now I am querying ES using 100 threads, each thread running one search query.

I am working on a quad-core processor, so my thread pool size is 7, if I am not wrong.
The whole run now takes 5.5 s, and the average for one thread is 4.31 s.

Is there any way I can make it better?

What happens if you use 5 threads with 20 queries each?

I tried 5 threads with 1 query each:
total: 0.301993846893
avg: 0.206694364548
each: [0.06244087219238281, 0.1454310417175293, 0.24949097633361816, 0.2855041027069092, 0.290604829788208]

I'm not concerned with multiple queries.
I'm building a search-as-you-type feature. I want to check how many users can use it in parallel.

Have you considered using the completion suggester?
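For reference, a rough sketch of what that could look like, written as Python dicts. The field name `name_suggest` and the suggestion name `name_suggestions` are placeholders I made up; the field would have to be mapped with type `completion` and populated at index time.

```python
def completion_mapping(field="name_suggest"):
    """Mapping fragment declaring a completion field (field name is hypothetical)."""
    return {"properties": {field: {"type": "completion"}}}


def suggest_body(prefix, field="name_suggest", size=5):
    """Search request body asking the completion suggester for a typed prefix."""
    return {
        "suggest": {
            "name_suggestions": {          # arbitrary label for this suggestion
                "prefix": prefix,          # what the user has typed so far
                "completion": {"field": field, "size": size},
            }
        }
    }


def extract_suggestions(response):
    """Pull the suggested texts out of a search response."""
    options = response["suggest"]["name_suggestions"][0]["options"]
    return [option["text"] for option in options]
```

The body would be sent with the normal search call, e.g. `esclient.search(index=..., body=suggest_body("jo"))`.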

I'm using this index configuration:

PUT patient-python
{
  "settings": {
    "analysis": {
      "analyzer": {
        "autocomplete": {
          "tokenizer": "autocomplete",
          "filter": [
            "lowercase"
          ]
        },
        "autocomplete_search": {
          "tokenizer": "lowercase"
        }
      },
      "tokenizer": {
        "autocomplete": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 10,
          "token_chars": [
            "letter","digit","symbol","whitespace"
          ]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "FirstName": {
          "type": "text",
          "analyzer": "autocomplete",
          "search_analyzer": "autocomplete_search"
        },
        "LastName": {
          "type": "text",
          "analyzer": "autocomplete",
          "search_analyzer": "autocomplete_search"
        },
        "PreferredName": {
          "type": "text",
          "analyzer": "autocomplete",
          "search_analyzer": "autocomplete_search"
        },
        "DateOfBirth": {
          "type": "text",
          "analyzer": "autocomplete",
          "search_analyzer": "autocomplete_search"
        },
        "PMRecordNumber": {
          "type": "text",
          "analyzer": "autocomplete",
          "search_analyzer": "autocomplete_search"
        },
        "MaidenName": {
          "type": "text",
          "analyzer": "autocomplete",
          "search_analyzer": "autocomplete_search"
        },
        "Suffix": {
          "type": "text",
          "analyzer": "autocomplete",
          "search_analyzer": "autocomplete_search"
        }
      }
    }
  }
}
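To make it easier to see what this configuration indexes, here is a small pure-Python imitation of the `autocomplete` analyzer for a single value. It is only an approximation that treats the input as one token, which seems reasonable here because `whitespace` is listed in `token_chars`. The function names are made up for illustration.

```python
def edge_ngrams(token, min_gram=2, max_gram=10):
    """Imitate the index-time analyzer: lowercased prefixes of 2 to 10 characters."""
    token = token.lower()
    longest = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, longest + 1)]


def prefix_matches(typed, indexed_value):
    """True if the lowercased typed text equals one of the indexed prefixes."""
    return typed.lower() in edge_ngrams(indexed_value)


print(edge_ngrams("Jonathan"))
print(prefix_matches("jon", "Jonathan"))
print(prefix_matches("christopher", "Christopher"))
```

Two things follow from the settings: a single typed character matches nothing because `min_gram` is 2, and typed text longer than 10 characters matches nothing because `max_gram` is 10. Also, as far as I know the `lowercase` tokenizer used by `autocomplete_search` splits on anything that is not a letter, so digits typed for `DateOfBirth` or `PMRecordNumber` may be dropped at search time; that is worth checking with the `_analyze` API.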
