How do I speed up this query?

rgelb · May 11, 2019, 5:23am

I am trying to speed up a query and not sure if I am going about it the right way (being a newcomer to ES).

I am trying to find an address in an index (that contains 200 million records) by it's constituent parts. And I am looking for specific values. So for instance, I have street number (1118), street name (Tower), street type (Rd) and zip code (90210). And I am trying to find the property with that address. So my query:

POST properties/_search
{
  "from": 0,
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "streetNumber": {
              "query": "1118"
            }
          }
        },
        {
          "match": {
            "streetName": {
              "query": "Tower"
            }
          }
        },
        {
          "match": {
            "zip": {
              "query": "90210"
            }
          }
        },
        {
          "match": {
            "streetSuffix": {
              "query": "Rd"
            }
          }
        }
      ]
    }
  },
  "size": 10
}

It finds the record fine, but it takes roughly half a second.

Schema for fields I am searching is defined as follows:

"streetName": {
"type": "string",
"analyzer": "citysearch_analyzer"
},
"streetNumber": {
"type": "string",
"analyzer": "keyword_analyzer"
},
"streetSuffix": {
"type": "string",
"analyzer": "keyword_analyzer"
},
"zip": {
"type": "string",
"index": "not_analyzed",
"doc_values": true
}

Can this query be improved?

dadoonet · May 11, 2019, 5:48am

What is the output of:

GET /_cat/health?v
GET /_cat/indices?v
GET /_cat/shards?v

Did you try to profile the query?
Could you share the response?
Does it still "slow" after some runs?

rgelb · May 11, 2019, 6:50am

The cluster is healthy. The indices are green. And the shards started.

Your question reminded me that I omitted an important piece of information. I can't profile it because we are on an ancient version of Elastic (1.7.6), which doesn't seem to have a profiler from what I can tell. So the only hope here is that I screwed up the query somehow

Posted here.

Running the same query is inconsistent. Majority of responses will be around 400-800 ms. Then some will be like 6 to 20 ms. There is probably caching at play here that rapidly expires. But it doesn't matter - I don't run the same query over and over. The address lookups are different every time.

dadoonet · May 11, 2019, 9:23am

That was not my question. I was asking for the output of those commands.

A lot, like A LOT, of improvements have been done over the past years. You should really upgrade to 7.0. Specifically if you don't need the exact total number of hits you can benefit from many improvements. Also a newer JVM can help.

It does matter a lot even if you are searching for other terms.

BTW what kind of hardware do you have? Ssd drives?

rgelb · May 11, 2019, 4:59pm

GET /_cat/health?v
GET /_cat/indices?v
GET /_cat/shards?v

Upgrade to 6.x is ongoing, but very slowly. Meanwhile I got to deal with what I have.

I am not sure. I'll have to follow up on Monday.

Here is the index definition.

dadoonet · May 13, 2019, 10:34pm

I don't see an index named properties. Is it an alias?

rgelb · May 13, 2019, 11:25pm

Yes. Aliased to properties_web04_20180118.

dadoonet · May 14, 2019, 7:56am

Did you get the answer?

rgelb · May 14, 2019, 10:55pm

3 boxes. Each one has 252GB of RAM. And each one has 2 CPUs. Which are 8 cores each and then each core is hyperthreaded giving it poor man's 32 cores per box.

Each box has 252GB of RAM, but the usage is around 30-40GB. The hard drives are all SSD.

dadoonet · May 15, 2019, 5:12am

As you have plenty of resources on your machines, would it be possible to start a 7.0.1 cluster on the same hardware, use reindex from remote to read from the 1.7 cluster?

Also maybe running hot threads API while a "long" query is running could tell something?

system · June 12, 2019, 5:15am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Need help with query performance issue Elasticsearch	1	533	July 5, 2017
Why is my query slow? Elasticsearch	9	7294	July 5, 2017
What else can I do to improve performance? Elasticsearch	2	438	July 5, 2017
Simple Query (1 cardinality, 19,000 docs) takes ~100k ms to complete Elasticsearch	4	312	July 21, 2021
Need advise on increasing the elastic search performance Elasticsearch	1	451	December 26, 2016

How do I speed up this query?

Related topics