Improving Query Performance

Hello all,

I'm new in Elasticsearch and trying to find out why my query performance is so slow.

I have an index with 1M documents called "variantlist". I'm searching in "VariantIds" values with this query:

{"from":0,"query":{"terms":{"VariantIds.8":["382"]}},"size":10000,"_source":{"includes":["VariantIds"]}}

2193 documents have returned and it took 6.5 seconds! Isn't it too slow for this case? I mean 1M isn't a big number. My index has 185 MB size and system is running on single node. Index has 1 shard. JVM is running on 3072 MB memory and server has 16GB memory.

I've just included "VariantIds" field to improve performance but no luck.

Result image of my example query is here:

I can provide further information if it is needed. Any suggestions?

Could you share the mapping please? That tell us how you are indexing your data.

What happens with "size":1?

What size are your documents?

1 Like

Hi @Ignacio_Vera ,
my mapping is here.
This index is for testing purpose- we're trying to see the elasticsearch performance. So this is a bulk data, I'm creating VariantIds and VariantValues arrays randomly.

I've also a prodlist index but it isn't important for now.

Hi @dadoonet,

It took 2 milliseconds with size=1

So that's probably the reason.
You need to fetch 10000 documents per shard, send all of them to the coordinating node, then sort them and filter out the fields to send back.

You probably have big documents.

Hi @Christian_Dahlqvist,

There isn't any specific size of my documents. I'm creating it randomly.

Thanks, but I didn't understand your suggestion :slight_smile: you mean that we should divide the index by 10000 and run each in a different server?

I was just explaining all what elasticsearch has to do when you are running your query. And why it looks slow.

So, how can i improve this? Will this get faster if i delete all other fields except VariantIds field?

The first question to ask is why do you need to set size=10000. Why is that?

There will be lots of response documents and I need all VariantIds values of those documents. Actually I'll use NEST and use ScrollAll method for this. But this is necessary.

I've found that the more documents you are asking for, the more ES memory you will need (we do query with size: 1.000.000)

If you have large documents and are always interested in the same field you might want to try mapping this as a stored field.

Hi @Christian_Dahlqvist, thanks for the reply. I've used "includes" for not gathering all other fields but isn't that enough? And I saw that stored fields is not recommended in the reference. What does it do actually?

Ok, let me explain our case. I want to filter all variant types and values when a variant value selected in an E-Commerce catalog page. So I should filter variations and get all variant keys and values of those variations. Then I'll regroup them by types eg. Color, Size etc.

That's why i need all of the VariantIds.

We've this mechanism already in our Sql Server based system. But I thought that we can query this in ElasticSearch more faster than sql. But I couldn't prove it yet.

What struck me here is that in the mapping the variantIds are mapped to long. This datatype is optimise for range searches, but you are using a terms query. I think you should try to map those fields as keyword if you don't need range searches.

2 Likes

Hi again,

I've tested your suggestion, I have created another index which have VariantIds field with keyword type. I've also removed VariantValues field for a smaller document size. Took measurement of Elasticsearch has been decreased dramatically (it's about 200-300ms). But response time was still about 4 seconds because of the downloaded data size. So I've also changed my query for removing the repeating records:

{
"from": 0,
"size": 10000,
"query":{
	"nested": {
		"path": "VariantIds",
		"query": {
			"bool" : {
				"must" : [
				{"term":{"VariantKey":"8"}},
				{"term":{"VariantValue":"382"}}
				]
			}
		}
	}
},
"_source":{
	"excludes":["*"]
}
,"aggs": {
    "topics_agg": {
      "nested": {
        "path": "VariantIds"
      },
      "aggs": {
        "name": {
          "terms": {
            "field": "VariantIds.VariantKey"
          },
          "aggs": {
            "name": {
              "terms": {
                "field": "VariantIds.VariantValue"
              }
            }
          }
        }
      }
    }
  }
}

And the response is here.

Do you think that this solution is reasonable?

I am glad to hear that this approach improve the query performance.

My understanding is that you are only interested in the variantIds that match the query. Therefore I think you can set the size of the request to 0. Aggregations will run always in all documents that match the query regardless of the values in size.

1 Like