Index sorting on low cardinality fields

spinscale · September 18, 2025, 8:14am

Hey,

I have been playing around with index sorting and cannot make sense of my results. Basically I have a keyword field named is_existing that is either true or false as possible values, or may not be set at all. Basically every search has a is_existing: true filter.

My assumption here was that by using index sorting on this field, I would be able to speed up all those searches, because they are touching less than 40% of the data, meaning we could basically skip 60% of the data with every search.

What I am seeing however after reindexing the data with index sorting enabled is, that searches are actually slower than without index sorting (i.e. a 70ms search now takes 85ms).

I have already played around with the sort order and putting missing fields first/last and there does not seem to be a difference. This is on Elasticsearch 8.18.

What am I missing here that could cause this?

Thanks for any pointers.

–Alex

john-wagster · September 18, 2025, 1:30pm

@spinscale that’s interesting. Can you share an example of the mapping and query that you are seeing this behavior on. I can probably take a look at it and give you some suggestions or dig into the code a bit at the very least to better explain what’s going on.

spinscale · September 19, 2025, 9:18am

Hey John,

thanks for replying!

Unfortunately I cannot share a minimal example, because this runs against a big live production dataset and the query itself is pretty complex.

Fast variant:

No index sorting. is_existing is of type keyword and contains only true or false or nothing as possible values.

Slower variant: Same mapping as above, but index sorting is configured like this:

index": {
      "sort.field": "is_existing",
      "sort.order": "desc",
      "sort.missing" : "_last"
}

I also tested when changing the sort.order to asc with no difference. My base assumption here was that true should be stored before false and before missing, so that Lucene only has to scan the first part of a segment.

Hope that helps!

–Alex

RainTown · September 19, 2025, 1:42pm

In such a scenario I might consider putting the data into different indices depending on value of that field. If, as you state, the queries are almost always for a specific value.

john-wagster · September 19, 2025, 2:46pm

I think your assumptions are good about how this would work.

Docs would agree as well: Index sorting settings | Reference

What I was hoping to get out of an example was to validate a couple of things. Sorting will only show a performance improvement when both the index and the query are sorted. It’s not a general purpose filtering mechanism.

So a modified example from the docs is something like this:

PUT events
{
  "settings": {
    "index": {
      "sort.field": "is_existing",
      "sort.order": "desc"
    }
  },
  "mappings": {
    "properties": {
      "timestamp": {
        "type": "keyword"
      }
    }
  }
}

AND

GET /events/_search
{
  "size": 10,
  "sort": [
    { "is_existing": "desc" }
  ],
  "track_total_hits": false
}

Are you setting up the search request appropriately? Note that track_total_hits set to false is necessary so we aren’t looking to count everything that matches outside of the first 10 in this example.

Also just because it pings in my head. Have you considered using a boolean field there instead of a keyword it wouldn’t surprise me if that’s much more efficient (but to be fair that’s a completely separate optimization).

spinscale · September 23, 2025, 10:02am

Hey John,

you figured out where my assumptions are broken, thanks so much!

My assumption was, that with index sorting and the ability to look this mapping configuration up at query time, index sorting was automatically enabled, regardless of sorting - as I thought this would do always result an early termination and thus is always enabled.

I’ll try to add that sorting before the score and see what happens, might take a couple of days until I get to that, but will report back!

Thanks for all the time invested and explanations, much appreciated!

–Alex

Victor-Lee · September 23, 2025, 1:34pm

Index sorting won’t always speed things up for a true/false field. Since it’s low variety, the extra sorting work can actually slow searches a bit. You might get better results by relying on the filter cache or checking your mappings instead.

john-wagster · September 23, 2025, 1:46pm

I’ll try to add that sorting before the score and see what happens, might take a couple of days until I get to that, but will report back!

Awesome! I’ll keep an eye out and looking forward to seeing how it goes.

Topic		Replies	Views
Will index sorting improve performance on fuzzy queries? Elasticsearch	1	371	June 7, 2018
Difference between sorting in index level and inside query Elasticsearch	5	791	December 2, 2019
Elasticsearch Sorting before applying filters? Elasticsearch	8	492	July 6, 2017
Index sorting on boolean fields Elasticsearch	1	55	July 16, 2025
Unbelievable performance improvement with new Index Sorting feature Elasticsearch	6	5794	April 24, 2018

Index sorting on low cardinality fields

Related topics