I have the following mapping:
{
  "yellows" : {
    "aliases" : { },
    "mappings" : {
      "yellow" : {
        "properties" : {
          "ranges" : {
            "type" : "nested",
            "properties" : {
              "geometry" : {
                "type" : "geo_shape"
              },
              "id" : {
                "type" : "long"
              },
              "other1" : {
                "type" : "keyword"
              },
              "other2" : {
                "type" : "long"
              },
              "other3" : {
                "type" : "long"
              }
            }
          }
          ...
        }
      }
    }
  }
}
Queries get slower and slower as size grows. For example:
curl 'https://path/to/elastic/yellows/_search?_source_exclude=ranges&from=0&size=50' --data-binary '{"query":{"bool":{"must":[],"filter":{"bool":{"filter":[{"terms":{"...":["1"]}},{"terms":{"...":["..."]}}],"should":[]}}}},"sort":[{"...":{"order":"asc"}}]}'
# size 50 -> "took":71
curl 'https://path/to/elastic/yellows/_search?_source_exclude=ranges&from=0&size=100' --data-binary '{"query":{"bool":{"must":[],"filter":{"bool":{"filter":[{"terms":{"...":["1"]}},{"terms":{"...":["..."]}}],"should":[]}}}},"sort":[{"...":{"order":"asc"}}]}'
# size 100 -> "took":1421
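To repeat this comparison for several sizes without retyping the request, a small helper along these lines could be used. This is only a minimal sketch: the function names took_ms and bench_sizes and the file name query.json are hypothetical, and it assumes the query body shown above has been saved to that file.

```shell
# Hypothetical helper: print the first "took" value (milliseconds) found
# in a search response read from stdin. Tolerates spaces around the colon.
took_ms() {
  grep -oE '"took"[[:space:]]*:[[:space:]]*[0-9]+' | head -n 1 | grep -oE '[0-9]+$'
}

# Hypothetical helper: send the query body stored in the file "$2" to the
# search URL "$1" once per size listed after it, and print each "took".
# Assumption: the same _source_exclude=ranges parameter as in the examples.
bench_sizes() {
  url="$1"
  body_file="$2"
  shift 2
  for size in "$@"; do
    took=$(curl -s "${url}?_source_exclude=ranges&from=0&size=${size}" \
      --data-binary "@${body_file}" | took_ms)
    echo "size ${size} -> took ${took}"
  done
}
```

It would be called as bench_sizes 'https://path/to/elastic/yellows/_search' query.json 0 50 100. Running each size a few times would help separate caching effects from the real cost.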
At the same time, queries with size=0 or with _source=false are fast. For example:
curl 'https://path/to/elastic/yellows/_search?_source_exclude=ranges&from=0&size=0' --data-binary '{"query":{"bool":{"must":[],"filter":{"bool":{"filter":[{"terms":{"...":["1"]}},{"terms":{"...":["..."]}}],"should":[]}}}},"sort":[{"...":{"order":"asc"}}]}'
# size 0 -> "took":32
curl 'https://path/to/elastic/yellows/_search?_source=false&from=0&size=100' --data-binary '{"query":{"bool":{"must":[],"filter":{"bool":{"filter":[{"terms":{"...":["1"]}},{"terms":{"...":["..."]}}],"should":[]}}}},"sort":[{"...":{"order":"asc"}}]}'
# _source=false -> "took":167
That means that queries retrieving the _source (i.e. without _source=false or size=0) are slower. Also, it seems that the more ranges there are in the retrieved documents, the slower the response. In the following I'm using wc -c as a proxy for how many ranges are in the retrieved documents. It's not the best measure, but it should suffice:
curl 'https://path/to/elastic/yellows/_search?from=0&size=50' --data-binary '{"query":{"bool":{"must":[],"filter":{"bool":{"filter":[{"terms":{"...":["1"]}},{"terms":{"...":["..."]}}],"should":[]}}}},"sort":[{"...":{"order":"asc"}}]}' | wc -c
# 2.332.822
curl 'https://path/to/elastic/yellows/_search?from=50&size=50' --data-binary '{"query":{"bool":{"must":[],"filter":{"bool":{"filter":[{"terms":{"...":["1"]}},{"terms":{"...":["..."]}}],"should":[]}}}},"sort":[{"...":{"order":"asc"}}]}' | wc -c
# 38.591.502
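A somewhat more direct proxy than the byte count would be to count the range entries themselves. The sketch below does that by counting "geometry" keys in the response. The name count_ranges is hypothetical, and it rests on two assumptions: every entry in ranges carries a geometry field, and no other field in the document (the mapping above is abbreviated) is also called geometry.

```shell
# Hypothetical helper: count "geometry" keys in a search response read
# from stdin, as a rough count of the range entries it contains.
# Assumption: one geometry field per range, and no other field with
# that name elsewhere in the documents.
count_ranges() {
  grep -oE '"geometry"[[:space:]]*:' | wc -l | tr -d '[:space:]'
}
```

It would replace wc -c at the end of the pipelines above, i.e. curl ... | count_ranges.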
As you can see, the first 50 documents have far fewer ranges than the second 50 of the first 100. Also, notice that in the first snippet, fetching the first 50 takes 71 ms while fetching the first 100 takes 1421 ms, so the second 50 account for most of the time even though both requests have _source_exclude=ranges.
It seems to me that the query itself is not the bottleneck: with size=0 or with _source=false the response time is small. So I suspect the cause is that ranges is a nested field and Elastic still takes it into consideration even when the request excludes it (i.e. _source_exclude=ranges).
Is there any other way to make the queries faster without changing the mapping, or should I change the mapping so that ranges is not nested?