Multiple sorting fields performance issues

bogdanjsx · May 17, 2018, 2:10pm

Hello,

I'm trying to implement pagination for a service that pulls data out of Elasticsearch, using the search_after functionality. The data only needs to be sorted by one field, but, as the docs mention, a field with unique values must be added as a tiebreaker.
I first tried using _uid, as suggested in the docs, but that carries a pretty big performance penalty, since _uid does not have doc_values enabled.
I then switched to another keyword-type field, which improved things, but the query still takes a lot more time and memory than with a single sort field.

Does anyone have a suggestion for which second field I should use to minimize the overhead of the second sort, or any other way to make search_after paging work?
Also, is there any way to measure how much time the second sort takes? The Profile API only shows query stats, but no sorting data, as far as I can see.
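
For readers following along, here is a minimal sketch of the request shape described above: one user-chosen sort field plus a unique tiebreaker, with the sort values of the last hit fed back as search_after. The field names `timestamp` and `event_id`, the helper names, and the sample hits are all hypothetical; only the `sort` / `search_after` body structure comes from the Elasticsearch docs.

```python
import json
from typing import Any, Dict, List, Optional


def build_page_request(
    sort_field: str,
    tiebreaker: str = "_uid",
    order: str = "asc",
    size: int = 100,
    search_after: Optional[List[Any]] = None,
) -> Dict[str, Any]:
    """Build a search body sorted by one field plus a unique tiebreaker."""
    body: Dict[str, Any] = {
        "size": size,
        "query": {"match_all": {}},
        "sort": [{sort_field: order}, {tiebreaker: order}],
    }
    if search_after is not None:
        # One value per sort clause, in the same order as "sort".
        body["search_after"] = list(search_after)
    return body


def next_search_after(hits: List[Dict[str, Any]]) -> Optional[List[Any]]:
    """Return the sort values of the last hit, or None for an empty page."""
    return hits[-1]["sort"] if hits else None


# Hypothetical field names, for illustration only.
first_page = build_page_request("timestamp", tiebreaker="event_id")

# Made-up hits, shaped like the "hits.hits" array of a search response.
sample_hits = [
    {"_id": "1", "sort": [1526566200000, "a-1"]},
    {"_id": "2", "sort": [1526566200000, "a-2"]},
]
second_page = build_page_request(
    "timestamp",
    tiebreaker="event_id",
    search_after=next_search_after(sample_hits),
)
print(json.dumps(second_page, indent=2))
```

The two sample hits share the same `timestamp`, which is exactly the case the tiebreaker exists for: without it, the second page could skip or repeat documents.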

I'm using Elasticsearch version 5.5, on a cluster with 3 data nodes, 3 coordinating nodes and 3 master nodes. The data is split using an index-per-time-frame approach (index configuration: 9 shards, 2 replicas), with each index holding 2 weeks' worth of data, about 30 million docs.

Thanks!

ddorian43 · May 19, 2018, 8:17pm

Can you upgrade to 6 and try index sorting? https://www.elastic.co/guide/en/elasticsearch/reference/6.3/index-modules-index-sorting.html
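
A rough sketch of what index sorting looks like at index-creation time in 6.x, built here as a plain Python dict. The field name `timestamp` and the helper name are assumptions for illustration; the `sort.field` / `sort.order` settings are the ones described on the linked page. Index sorting has to be set when the index is created and the sort field needs doc_values.

```python
import json
from typing import Any, Dict


def index_sorted_settings(sort_field: str, order: str = "desc") -> Dict[str, Any]:
    """Body for creating an index whose segments are sorted by one date field."""
    return {
        "settings": {
            "index": {
                "sort.field": sort_field,
                "sort.order": order,
            }
        },
        # Assumes the sort field is a date; "_doc" is the mapping type name
        # used in the 6.3 docs examples.
        "mappings": {
            "_doc": {"properties": {sort_field: {"type": "date"}}}
        },
    }


# Hypothetical field name.
create_body = index_sorted_settings("timestamp")
print(json.dumps(create_body, indent=2))
```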

Can you try joining both values into one field? (I don't know if it will help, to be honest.)
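
One way to read "joining both values", as a sketch only: at index time, write a single keyword value made of the zero-padded primary sort value followed by a unique id, so that a single-field sort is already unique. The function name, the `#` separator, and the padding width are assumptions, and this only works as-is for non-negative integers (e.g. epoch millis).

```python
def combined_sort_key(value: int, doc_id: str, width: int = 20) -> str:
    """Join a non-negative integer sort value and a unique id into one string.

    Zero-padding makes lexicographic order match numeric order, and the
    trailing id breaks ties between documents with equal values.
    """
    if value < 0:
        raise ValueError("only non-negative values are supported in this sketch")
    if len(str(value)) > width:
        raise ValueError("value does not fit in the padding width")
    return f"{value:0{width}d}#{doc_id}"


# Made-up (value, id) pairs; two of them tie on the value.
docs = [(1526566200000, "b"), (99, "z"), (1526566200000, "a")]
keys = sorted(combined_sort_key(value, doc_id) for value, doc_id in docs)
for key in keys:
    print(key)
```

The trade-off is that each sortable field would presumably need its own combined field, which costs index space and has to be decided before indexing.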

bogdanjsx · May 21, 2018, 7:49am

Hi and thanks for the answer.

Unfortunately, neither of these solutions is workable, since the first sort field is not fixed but is sent with each request. Because of that, any field can potentially be the sort field.

ddorian43 · May 21, 2018, 8:18am

How are you filtering the documents? Maybe you can index-sort by the fields you filter on, so that some segments don't get queried at all?

ddorian43 · May 21, 2018, 8:19am

How are the sort fields distributed? If 80% of sorts are on field a, you can index-sort by that and leave the less common ones slow.

bogdanjsx · May 21, 2018, 9:57am

Sort field distribution is random; it's chosen by the user.
Also, an upgrade to 6 is not really possible at the moment.

system · June 18, 2018, 9:57am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
