Partitioning a mapping for sort or not?

This might seem a bit complicated but stick with me please :slight_smile:

At the moment we have an immense mapping (230,000+ lines), most of which consists of fields used for sorting. We save a score in a field with a key like this:

{
    "ranking": {
        "foo": {
            "bar": 42,
            "bla": 16,
            ...
        }
    }
}

... and we sort on ranking.foo.bar. And yes, yes, before you tell me I should do it like this...

{
    "ranking": [
        {
            "name": "foo.bar",
            "score": 42
        },
        {
            "name": "foo.bla",
            "score": 16
        },
        ...
    ]
}

... sure, we realise that now, so that's what we want to do :slight_smile: We would then sort by ranking.score, which would look something like this:

{
    "query": {
        "match_all": {}
    },
    "sort": [
        {
            "ranking.score": {
                "nested_filter": {
                    "term": {
                        "ranking.name": "foo.bar"
                    }
                },
                "nested_path": "ranking",
                "order": "desc"
            }
        }
    ]
}
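For what it's worth, the restructuring itself is mechanical. A minimal sketch (in Python, with the field names taken from the examples above; the helper name is ours, not anything Elasticsearch provides):

```python
def restructure(ranking):
    """Flatten a keyed ranking dict like {"foo": {"bar": 42}} into a
    list of {"name": ..., "score": ...} objects suitable for nested docs."""
    return [
        {"name": f"{group}.{key}", "score": score}
        for group, scores in ranking.items()
        for key, score in scores.items()
    ]

doc = {"ranking": {"foo": {"bar": 42, "bla": 16}}}
print(restructure(doc["ranking"]))
# [{'name': 'foo.bar', 'score': 42}, {'name': 'foo.bla', 'score': 16}]
```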

The thing is, this way our ranking array will be quite big; there could be hundreds or thousands of objects in there. We were thinking that we could split it up into "buckets/partitions" by hashing the key of the ranking. That way it would look like this:

{
    "ranking": {
        "partition1": [
            {
                "name": "foo.bar",
                "score": 42
            },
            ...
        ],
        "partition2": [
            {
                "name": "foo.bla",
                "score": 16
            },
            ...
        ]
    }
}
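The partition key here could be derived at index time by hashing the ranking name. A minimal sketch of what we have in mind (the bucket count and naming scheme are our own assumptions, not anything Elasticsearch dictates):

```python
import hashlib

NUM_PARTITIONS = 16  # assumed bucket count; would be tuned to our data

def partition_for(name):
    """Map a ranking name to a stable partition key via md5.
    A cryptographic hash keeps the assignment stable across processes,
    unlike Python's builtin hash(), which is salted per process."""
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return f"partition{int(digest, 16) % NUM_PARTITIONS + 1}"

# The same name always lands in the same partition:
print(partition_for("foo.bar") == partition_for("foo.bar"))  # True
```

Both the indexer and the query builder would call the same function, so a sort on foo.bar knows which partition's nested path to target.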

with the sorting query looking like this:

{
    "query": {
        "match_all": {}
    },
    "sort": [
        {
            "ranking.partition1.score": {
                "nested_filter": {
                    "term": {
                        "ranking.partition1.name": "foo.bar"
                    }
                },
                "nested_path": "ranking.partition1",
                "order": "desc"
            }
        }
    ]
}

... our thinking being that Elasticsearch would then not have to loop through the whole array, just a specific partition. And that's really the question: would this partitioning save time on ES's end, or is this already solved in a different manner (for instance doc values or fielddata)?

If JSON is key: value pairs then yes, shifting your variable data over to the right-hand side of that : makes sense to avoid inflating your mappings.
The approach you outline of using nested docs helps with that problem (or looks like it might), but comes at a cost in the overhead of creating nested documents on disk.
The other "right-hand-side" approach is to concatenate names and scores into single tokens, e.g. rank_score: foo.bar.0026, and sort on these tokens. It keeps the volume of Lucene docs low and the mappings stable, but your tokens may need zero-padding to sort correctly and cleansing before display.
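The zero-padding matters because these tokens sort lexicographically, not numerically. A quick illustration (the rank_score token format and pad width are just examples, not a fixed convention):

```python
def make_token(name, score, width=4):
    """Concatenate a ranking name and a zero-padded score into one
    sortable token, e.g. ("foo.bar", 26) -> "foo.bar.0026"."""
    return f"{name}.{score:0{width}d}"

tokens = [make_token("foo.bar", s) for s in (26, 7, 113)]
print(sorted(tokens))
# ['foo.bar.0007', 'foo.bar.0026', 'foo.bar.0113']

# Without padding, lexicographic order breaks numeric order:
print(sorted(f"foo.bar.{s}" for s in (26, 7, 113)))
# ['foo.bar.113', 'foo.bar.26', 'foo.bar.7']
```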

I think that wouldn't work with multiple names, because then the name of the ranking would determine which record comes first (if I understand you correctly). We would be working with a lot of names.

Could you tell me a bit more about the overhead of nested docs on disk, specifically for sorting? Also, would our "partition/bucket" approach work to alleviate some of that pain?

My mistake. I saw your nested_filter clause (which was news to me!) and assumed there might be an equivalent filter clause for when the multiple values you want to filter on are in a top-level array rather than held in nested docs. I was wrong.

Ordinarily a single Elasticsearch JSON doc is mapped to one physical Lucene document on disk.
If you use nested docs, a single Elasticsearch JSON doc is mapped to multiple Lucene documents: one for the root doc and one for every nested array element. This avoids the "cross-matching" search problem you would have if you indexed only a single Lucene doc, but it comes with a cost. There are some data structures whose size is a direct multiple of how many Lucene documents you have, e.g. filters, which use a bit per Lucene doc.
I say "direct multiple" because we used to use structures like BitSets, where this is certainly true, but there are now some optimisations that target the issues of sparse data structures. However, I wouldn't rely on these optimisations to solve extreme cases of inefficient storage design.
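To make that cost concrete, here is a back-of-the-envelope calculation under the bit-per-Lucene-doc model described above (the index size and per-doc ranking count are assumed figures for illustration only):

```python
# Assumed figures for illustration only.
es_docs = 1_000_000        # JSON docs in the index
rankings_per_doc = 1_000   # nested array elements per doc

# Each JSON doc becomes 1 root Lucene doc + 1 per nested element.
lucene_docs = es_docs * (1 + rankings_per_doc)
print(lucene_docs)  # 1001000000

# A BitSet-style filter at one bit per Lucene doc:
bitset_mb = lucene_docs / 8 / 1024 / 1024
print(f"{bitset_mb:.0f} MB per cached filter")  # 119 MB per cached filter
```

So with 1,000 nested elements per doc, the Lucene doc count (and any structure sized by it) grows roughly a thousandfold compared to flat docs.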

Thanks for that Mark!