Improving Query Performance

emrealtan · July 16, 2019, 1:04pm

Hello all,

I'm new in Elasticsearch and trying to find out why my query performance is so slow.

I have an index with 1M documents called "variantlist". I'm searching in "VariantIds" values with this query:

{"from":0,"query":{"terms":{"VariantIds.8":["382"]}},"size":10000,"_source":{"includes":["VariantIds"]}}

2193 documents have returned and it took 6.5 seconds! Isn't it too slow for this case? I mean 1M isn't a big number. My index has 185 MB size and system is running on single node. Index has 1 shard. JVM is running on 3072 MB memory and server has 16GB memory.

I've just included "VariantIds" field to improve performance but no luck.

Result image of my example query is here:

I can provide further information if it is needed. Any suggestions?

Ignacio_Vera · July 16, 2019, 1:31pm

Could you share the mapping please? That tell us how you are indexing your data.

dadoonet · July 16, 2019, 1:51pm

What happens with "size":1?

Christian_Dahlqvist · July 16, 2019, 1:51pm

What size are your documents?

emrealtan · July 16, 2019, 1:51pm

Hi @Ignacio_Vera ,
my mapping is here.
This index is for testing purpose- we're trying to see the elasticsearch performance. So this is a bulk data, I'm creating VariantIds and VariantValues arrays randomly.

I've also a prodlist index but it isn't important for now.

emrealtan · July 16, 2019, 1:53pm

Hi @dadoonet,

It took 2 milliseconds with size=1

dadoonet · July 16, 2019, 1:54pm

So that's probably the reason.
You need to fetch 10000 documents per shard, send all of them to the coordinating node, then sort them and filter out the fields to send back.

You probably have big documents.

emrealtan · July 16, 2019, 1:57pm

Hi @Christian_Dahlqvist,

There isn't any specific size of my documents. I'm creating it randomly.

emrealtan · July 16, 2019, 1:58pm

Thanks, but I didn't understand your suggestion you mean that we should divide the index by 10000 and run each in a different server?

dadoonet · July 16, 2019, 1:59pm

I was just explaining all what elasticsearch has to do when you are running your query. And why it looks slow.

emrealtan · July 16, 2019, 2:03pm

So, how can i improve this? Will this get faster if i delete all other fields except VariantIds field?

dadoonet · July 16, 2019, 2:05pm

The first question to ask is why do you need to set size=10000. Why is that?

emrealtan · July 16, 2019, 2:29pm

There will be lots of response documents and I need all VariantIds values of those documents. Actually I'll use NEST and use ScrollAll method for this. But this is necessary.

cthibodeau · July 16, 2019, 4:14pm

I've found that the more documents you are asking for, the more ES memory you will need (we do query with size: 1.000.000)

Christian_Dahlqvist · July 16, 2019, 4:53pm

If you have large documents and are always interested in the same field you might want to try mapping this as a stored field.

emrealtan · July 17, 2019, 7:06am

Hi @Christian_Dahlqvist, thanks for the reply. I've used "includes" for not gathering all other fields but isn't that enough? And I saw that stored fields is not recommended in the reference. What does it do actually?

emrealtan · July 17, 2019, 7:19am

Ok, let me explain our case. I want to filter all variant types and values when a variant value selected in an E-Commerce catalog page. So I should filter variations and get all variant keys and values of those variations. Then I'll regroup them by types eg. Color, Size etc.

That's why i need all of the VariantIds.

We've this mechanism already in our Sql Server based system. But I thought that we can query this in ElasticSearch more faster than sql. But I couldn't prove it yet.

Ignacio_Vera · July 17, 2019, 10:40am

What struck me here is that in the mapping the variantIds are mapped to long. This datatype is optimise for range searches, but you are using a terms query. I think you should try to map those fields as keyword if you don't need range searches.

emrealtan · July 22, 2019, 8:12am

Hi again,

I've tested your suggestion, I have created another index which have VariantIds field with keyword type. I've also removed VariantValues field for a smaller document size. Took measurement of Elasticsearch has been decreased dramatically (it's about 200-300ms). But response time was still about 4 seconds because of the downloaded data size. So I've also changed my query for removing the repeating records:

{
"from": 0,
"size": 10000,
"query":{
	"nested": {
		"path": "VariantIds",
		"query": {
			"bool" : {
				"must" : [
				{"term":{"VariantKey":"8"}},
				{"term":{"VariantValue":"382"}}
				]
			}
		}
	}
},
"_source":{
	"excludes":["*"]
}
,"aggs": {
    "topics_agg": {
      "nested": {
        "path": "VariantIds"
      },
      "aggs": {
        "name": {
          "terms": {
            "field": "VariantIds.VariantKey"
          },
          "aggs": {
            "name": {
              "terms": {
                "field": "VariantIds.VariantValue"
              }
            }
          }
        }
      }
    }
  }
}

And the response is here.

Do you think that this solution is reasonable?

Ignacio_Vera · July 22, 2019, 10:57am

I am glad to hear that this approach improve the query performance.

My understanding is that you are only interested in the variantIds that match the query. Therefore I think you can set the size of the request to 0. Aggregations will run always in all documents that match the query regardless of the values in size.

Topic		Replies	Views
General tips on performance tuning Elasticsearch	5	389	July 6, 2017
How to achieve Query Performance Elasticsearch	11	735	July 6, 2017
4-5 second query time. Only 50 documents. Need help Elasticsearch	11	460	July 6, 2017
Slow query performance with a large size value Elasticsearch	6	1559	July 6, 2017
Elasticsearch 5 - Large (size) query performance with complex mapping Elasticsearch	2	1086	July 18, 2017

Improving Query Performance

Related topics