Performance on query with nested objects

Hi,
I have a performance problem related to documents that have many documents on nested relation.
We've recently realized that we have some slow queries that depends on the data itself and not the query.
Ex: I query for all items by a simple relation field:

GET index/_search
{"from":0,"size":10,
"query":{
"bool":{
"filter":[
{"term":{
"retailer.id":{"value":123,"boost":1.0}
}
}],
"adjust_pure_negative":true,"boost":1.0
}
},"_source":{"includes":["id"],"excludes":["nested_relations"]}
}

and it takes 5 ms, but I do the same with a doc that has 6200 nested objects and it takes 240ms.
I tried excluding fields, returning only id, it's more or less the same.
In the query is nothing related to that nested object relation, no filtering no ordering, just the documents existing. If I remove the nested objects, then it goes well for the same doc.

I've tried also denormalizing and duplicating content, as create every root doc with every object on the nested field, so the data of the root doc is repeated n times per object relation and then I do aggs. This way performance is a lot greater but it complicates the query and pagination can't be done.

So the question is, there is a limitation of size for performance related to nested objects?
Why is affected on queries that do not use this relation at all?

The mapping index:

PUT test_index
{
"settings" : {
"index" : {
"number_of_shards" : 1,
"number_of_replicas" : 1
}
},
"mappings": {
"doc": {
"dynamic": "strict",
"properties": {
"id": {"type": "long"},
"name": {"type": "text"},
"retailer": {
"type": "object",
"properties": {
"id": {"type": "long"}
}
},
"stores": {
"type": "nested",
"properties": {
"id": {"type": "long"},
"normalized_name": {"type": "keyword"},
"area": {"type": "geo_shape", "tree": "quadtree", "precision": "2km"},
"location": {"type": "geo_point"},
"retailer_id": {"type": "long"},
"address": {
"type": "object",
"properties": {
"location": {"type": "keyword"},
"locality": {"type": "keyword"},
"post_code": {"type":"keyword"},
"municipality_normalized_name": {"type":"keyword"}
}
},
"time_zone_difference": {"type": "integer"},
"opening_times": {
"type": "nested",
"properties": {
"day_of_week": {"type":"integer"},
"ini_hour": {"type":"integer"},
"ini_minute": {"type":"integer"},
"end_hour": {"type":"integer"},
"end_minute": {"type":"integer"}
}
}
}
}

  }
}

}
}

I'm using ES version 6.5.

Thank you

Unless you totally exclude the _source, elasticsearch still has to load for each of the 10 documents per shard the full _source and then parse it to remove the excluded keys.
I guess that's where the difference is coming from.

You can alternatively store explicitly in the mapping the few fields you want to retrieve and use stored_fields to only load them but not the _source. See https://www.elastic.co/guide/en/elasticsearch/reference/7.0/search-request-stored-fields.html

Hi David,

Thanks for your quick response, I'm checking this.

I've tried to remove the source and add the store fields, and of course, the query filtering by retailer.id improved a lot.
But when filtering by the store.area with a geo_shape query it takes 229 ms to process, with the 6200 nested objects.
This query takes long because I retrieve fields from the nested objects, if I delete the _source on the nested query it takes 30ms.
Is there any way to return the fields from an inner_hit that is not from _source (like stored_fields)?
I also checked that can't do the stored_field on inner hit result.

Great. So we prove that ready a BIG document from disk is costing more than reading a smaller one.

That's a different question. The previous question was related to "how much it costs to read a doc from disk to generate the response for the 10 hits that we need to retrieve".
The new question is more about "the response time when searching with a term query vs with a geo_shape query.

That's like comparing oranges and apples to me. There is a price to pay to make that kind of filtering, whatever the size of the fields you will need to read after having been able to select 10 of them.

Is there any way to return the fields from an inner_hit that is not from _source

I don't know. I don't think so. The only thing I can see but that will probably not answer to your needs, is to use a copy_to option in the nested object to copy values to the parent object in a stored field. But as a result you won't get only the internal match.

That could be better to provide a full recreation script as described in About the Elasticsearch category. It will help to better understand what you are doing. Please, try to keep the example as simple as possible.

Thanks for your response.

Excuse me if I explained incorrectly, I didn't want to compare one query with the other.

At the end I solved it accessing to the nested fields through scriptfields and not using the source on root doc.

1 Like

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.