Multiple sorting fields performance issues


(Bogdan Stefan) #1

Hello,

I'm trying to implement pagination for a service that pulls data out of Elasticsearch using the search_after functionality. The data needs to be sorted by only one field but, as the docs mention, a field with unique values must be used as a tiebreaker.
I first tried _uid, as the docs suggest, but it carries a significant performance penalty because it does not have doc_values enabled.
I then switched to another keyword field, which helped, but sorting still takes considerably more time and memory than with a single sort field.
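For anyone unfamiliar with why the tiebreaker matters, here is a minimal in-memory sketch of search_after semantics with a two-field sort. It is an illustration under simplifying assumptions, not how Elasticsearch implements it internally: each page returns the next `size` documents whose key `(sort_value, tiebreaker)` is strictly greater than the key of the last hit on the previous page. The field names `price` and `doc_id` are hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple

Doc = Dict[str, Any]


def sort_key(doc: Doc, sort_field: str, tiebreaker: str) -> Tuple[Any, Any]:
    # The user-chosen sort value comes first; the unique tiebreaker resolves ties.
    return (doc[sort_field], doc[tiebreaker])


def search_after_page(
    docs: List[Doc],
    sort_field: str,
    tiebreaker: str,
    size: int,
    after: Optional[Tuple[Any, Any]] = None,
) -> List[Doc]:
    """Return one ascending page that starts strictly after the `after` key."""
    ordered = sorted(docs, key=lambda d: sort_key(d, sort_field, tiebreaker))
    if after is not None:
        ordered = [d for d in ordered if sort_key(d, sort_field, tiebreaker) > after]
    return ordered[:size]


def paginate(docs: List[Doc], sort_field: str, tiebreaker: str, size: int) -> List[List[Doc]]:
    """Walk all pages, feeding the last hit's sort key into the next request."""
    pages: List[List[Doc]] = []
    after = None
    while True:
        page = search_after_page(docs, sort_field, tiebreaker, size, after)
        if not page:
            return pages
        pages.append(page)
        after = sort_key(page[-1], sort_field, tiebreaker)


docs = [
    {"doc_id": "a", "price": 10},
    {"doc_id": "b", "price": 10},
    {"doc_id": "c", "price": 10},
    {"doc_id": "d", "price": 5},
    {"doc_id": "e", "price": 20},
]
pages = paginate(docs, "price", "doc_id", size=2)
print([[d["doc_id"] for d in p] for p in pages])  # [['d', 'a'], ['b', 'c'], ['e']]
```

In this toy model, if the key were `price` alone, the second request would resume strictly after `10` and silently skip `b` and `c`; the unique `doc_id` is what lets the cursor land between documents that share a sort value.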

Does anyone have a suggestion for which second field to use to minimize the overhead of the second sort? Or is there another way to make search_after paging work?
Also, is there any way to measure how much time the second sort takes? As far as I can see, the Profile API only shows query statistics, not sorting data.
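One rough way to estimate the tiebreaker's cost, offered here as a suggestion rather than anything confirmed in this thread, is to run the same query with and without the second sort field and compare the `took` value (in milliseconds) that Elasticsearch reports in each search response. Note that `took` covers the whole search, not sorting alone, so the difference is only an approximation. The sketch only builds request bodies and summarizes timings; `run_search` is a hypothetical callable standing in for whatever client call sends a body to the index and returns the parsed JSON response, and `timestamp` and `doc_id` are hypothetical field names.

```python
import statistics
from typing import Any, Callable, Dict, List, Optional

Body = Dict[str, Any]


def build_body(sort_field: str, order: str, size: int,
               tiebreaker: Optional[str] = None,
               search_after: Optional[List[Any]] = None) -> Body:
    """Search body sorted by a user-chosen field, with an optional tiebreaker."""
    sort: List[Body] = [{sort_field: {"order": order}}]
    if tiebreaker is not None:
        # Assumed: a keyword field with doc_values and a unique value per document.
        sort.append({tiebreaker: {"order": "asc"}})
    body: Body = {"size": size, "query": {"match_all": {}}, "sort": sort}
    if search_after is not None:
        # Sort values of the last hit on the previous page, one per sort clause.
        body["search_after"] = search_after
    return body


def median_took(run_search: Callable[[Body], Body], body: Body, runs: int = 5) -> float:
    """Median `took` (ms) over several runs, to dampen warm-up and caching noise."""
    return statistics.median(run_search(body)["took"] for _ in range(runs))


def tiebreaker_overhead(run_search: Callable[[Body], Body], sort_field: str,
                        tiebreaker: str, size: int = 100, runs: int = 5) -> float:
    """Estimated extra milliseconds attributable to adding the tiebreaker sort."""
    single = build_body(sort_field, "desc", size)
    double = build_body(sort_field, "desc", size, tiebreaker=tiebreaker)
    return median_took(run_search, double, runs) - median_took(run_search, single, runs)


# Stand-in for a real client call, returning fixed timings for illustration.
def fake_search(body: Body) -> Body:
    return {"took": 40 if len(body["sort"]) == 2 else 25}


print(tiebreaker_overhead(fake_search, "timestamp", "doc_id"))  # 15
```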

I'm using Elasticsearch 5.5 on a cluster with 3 data nodes, 3 coordinating nodes and 3 master nodes. The data is split using one index per time frame (index configuration: 9 shards, 2 replicas); each index holds two weeks' worth of data, about 30 million documents.

Thanks!


(ddorian43) #2

Can you upgrade to 6.x and try index sorting? https://www.elastic.co/guide/en/elasticsearch/reference/6.3/index-modules-index-sorting.html
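For reference, index sorting is configured when an index is created, through the `index.sort.field` and `index.sort.order` settings, and it cannot be added to or changed on an existing index. The sketch below only builds such a create-index body as a Python dict; the field names `timestamp` and `doc_id`, the `_doc` type name, and the chosen orders are assumptions for illustration. Presorted segments mainly pay off when a query's sort matches the index sort, which is why the question of how the sort field is chosen matters here.

```python
import json
from typing import Any, Dict, List


def index_sort_body(fields: List[str], orders: List[str],
                    field_types: Dict[str, str]) -> Dict[str, Any]:
    """Create-index body that presorts every segment by `fields` (6.x syntax)."""
    if len(fields) != len(orders):
        raise ValueError("index.sort.field and index.sort.order need the same length")
    return {
        "settings": {
            "index": {
                # Arrays sort by several fields, in the order given.
                "sort.field": fields,
                "sort.order": orders,
            }
        },
        "mappings": {
            # 6.x indices still declare a single mapping type; `_doc` is assumed here.
            "_doc": {
                "properties": {name: {"type": t} for name, t in field_types.items()}
            }
        },
    }


body = index_sort_body(["timestamp", "doc_id"], ["desc", "asc"],
                       {"timestamp": "date", "doc_id": "keyword"})
print(json.dumps(body, indent=2))  # body for PUT <index-name>
```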

Can you try combining both values into a single field? (To be honest, I don't know whether it will help.)
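Combining the two values into one field could look like the sketch below, assuming the sort value is a non-negative integer (for example, epoch milliseconds) and the tiebreaker is a unique string ID. A keyword field sorts lexicographically, so the number is zero-padded to a fixed width; otherwise "9" would sort after "10". The helper name `composite_key` and the padding width are hypothetical, and such a field would have to be populated at index time for every sort field you want to page on.

```python
from typing import List, Tuple

PAD_WIDTH = 20  # enough digits for any non-negative 64-bit integer


def composite_key(value: int, doc_id: str) -> str:
    """Join a numeric sort value and a unique ID into one lexicographically sortable string."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    # Zero-padding makes string order agree with numeric order; because the
    # number has a fixed width, ties fall through to comparing the IDs.
    return f"{value:0{PAD_WIDTH}d}|{doc_id}"


rows: List[Tuple[int, str]] = [(10, "b"), (9, "z"), (10, "a"), (100, "c")]
by_tuple = sorted(rows)
by_string = sorted(rows, key=lambda r: composite_key(*r))
print(by_string == by_tuple)  # True
```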


(Bogdan Stefan) #3

Hi, and thanks for the answer.

Unfortunately, neither of these solutions is possible, since the first sort field is not always the same: it is sent with each request. As a result, any field can potentially be a sort field.


(ddorian43) #4

How are you filtering the documents? Maybe you could index-sort by the fields you filter on, so that some segments don't need to be queried at all?


(ddorian43) #5

How are the sort fields distributed? If 80% of sorts are on field a, you can index-sort by that field and accept that the remaining, less frequent sorts stay slow.
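One way to answer that question is to count primary sort fields over a sample of incoming requests. The sketch assumes a hypothetical list of logged request bodies using the usual `sort` array form (entries are either a field name or a `{field: options}` object) and uses the 80% figure from the example above as a hypothetical threshold; it reports the most common field, its share, and whether it clears the threshold.

```python
from collections import Counter
from typing import Any, Dict, List, Optional, Tuple


def primary_sort_field(body: Dict[str, Any]) -> Optional[str]:
    """First field in the request's sort clause, or None if the request is unsorted."""
    sort = body.get("sort") or []
    if not sort:
        return None
    first = sort[0]
    # A sort entry may be a plain field name or a {field: options} object.
    return first if isinstance(first, str) else next(iter(first))


def index_sort_candidate(bodies: List[Dict[str, Any]],
                         threshold: float = 0.8) -> Tuple[Optional[str], float, bool]:
    """Most common primary sort field, its share of sorted requests, and whether it clears the threshold."""
    counts = Counter(f for f in map(primary_sort_field, bodies) if f is not None)
    total = sum(counts.values())
    if total == 0:
        return None, 0.0, False
    field, hits = counts.most_common(1)[0]
    share = hits / total
    return field, share, share >= threshold


bodies = ([{"sort": [{"timestamp": {"order": "desc"}}]}] * 8
          + [{"sort": [{"price": {"order": "asc"}}]}] * 2)
print(index_sort_candidate(bodies))  # ('timestamp', 0.8, True)
```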


(Bogdan Stefan) #6

The sort field distribution is random; it's chosen by the user.
Also, upgrading to 6 is not really possible at the moment.

