This might seem a bit complicated, but please stick with me.
At the moment we have an immense mapping (230,000+ lines), most of which consists of fields used only for sorting. We save a score in a field with a key like this:
{
  "ranking": {
    "foo": {
      "bar": 42,
      "bla": 16,
      ...
    }
  }
}
... and we sort on ranking.foo.bar. And yes, yes, before you tell me I should do it like this...
{
  "ranking": [
    {
      "name": "foo.bar",
      "score": 42
    },
    {
      "name": "foo.bla",
      "score": 16
    },
    ...
  ]
}
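For the nested_filter sort shown further down to work, the ranking field would need to be mapped as a nested type. A minimal sketch of such a field mapping (the keyword/integer types are my assumption, not from the original setup):

```json
"ranking": {
  "type": "nested",
  "properties": {
    "name": { "type": "keyword" },
    "score": { "type": "integer" }
  }
}
```

The nested type keeps each name/score pair as its own hidden document, so a filter on name only matches scores from the same object.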
... sure, we realise that now, so that's what we want to do. We would then sort by ranking.score, which would look something like this:
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "ranking.score": {
        "nested_filter": {
          "term": {
            "ranking.name": "foo.bar"
          }
        },
        "nested_path": "ranking",
        "order": "desc"
      }
    }
  ]
}
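For completeness, migrating documents from the old shape to the new one is a simple recursive flatten. A sketch (the function name is mine, not from our actual code):

```python
def flatten_ranking(ranking, prefix=""):
    """Turn {"foo": {"bar": 42, "bla": 16}} into
    [{"name": "foo.bar", "score": 42}, {"name": "foo.bla", "score": 16}]."""
    out = []
    for key, value in ranking.items():
        # Build the dotted key, e.g. "foo" + "bar" -> "foo.bar"
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            out.extend(flatten_ranking(value, name))
        else:
            out.append({"name": name, "score": value})
    return out

old_doc = {"ranking": {"foo": {"bar": 42, "bla": 16}}}
new_doc = {"ranking": flatten_ranking(old_doc["ranking"])}
```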
The thing is, this way our ranking array will be quite big; there could be hundreds or thousands of objects in there. We were thinking we could split it up into "buckets"/partitions by hashing the key of the ranking. That way it would look like this:
{
  "ranking": {
    "partition1": [
      {
        "name": "foo.bar",
        "score": 42
      },
      ...
    ],
    "partition2": [
      {
        "name": "foo.bla",
        "score": 16
      },
      ...
    ]
  }
}
with the sorting query looking like this:
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "ranking.partition1.score": {
        "nested_filter": {
          "term": {
            "ranking.partition1.name": "foo.bar"
          }
        },
        "nested_path": "ranking.partition1",
        "order": "desc"
      }
    }
  ]
}
... our thinking being that Elasticsearch would then not have to loop through the whole array, just one specific partition. And that's really the question: would this partitioning save time on Elasticsearch's end, or is this already solved in a different manner (for instance via doc values or fielddata)?