Terms aggregation bucket Sorting is not working when using with partition

truptir · March 1, 2022, 12:08pm

I am trying to do TermAggregation + Sort order(by _count) + pagination(using partition). But resulting bucket partitions are not sorted properly.

Without using partitions, sorting is working properly.

Here is my elastic Query,

GET inventory/_search?filter_path=aggregations
{
 "aggs":  {
  "totalPacks": {
    "terms": {
      "field": "sellerId.keyword",
      "min_doc_count": 1,
      "shard_min_doc_count": 0,
      "show_term_doc_count_error": false,
      "include": {
         "partition": 0,
         "num_partitions": 10
      },
      "size": 10,
      "order": {
         "_count": "asc"
      }
    }
  }
}
}

this gives output ,
partition: 0

"buckets" : [
        {
          "key" : "305860",
          "doc_count" : 3
        },
        {
          "key" : "13101",
          "doc_count" : 10
        }

partition: 2

"buckets" : [
        {
          "key" : "05702",
          "doc_count" : 4
        },
        {
          "key" : "17188",
          "doc_count" : 42
        }
      ]

doc_count: 4 is coming in 1st but should come in 0th partition
doc_count: 10 is coming on 0th but should come in 1ST partition.

Mark_Harwood · March 1, 2022, 12:59pm

This is working as designed.
Sort order is valid within partitions, not across them.
It’s a coping strategy for when your data is distributed across lots of machines which makes it difficult to perform certain operations in a single search. It relies on your client application stitching together results from multiple partitions where appropriate.

truptir · March 1, 2022, 1:07pm

it is partitioning in which order? And if we need to apply sort first and then partition the data. Is there any approach?

Mark_Harwood · March 1, 2022, 2:36pm

This comment with links may be of use

truptir · March 2, 2022, 1:07pm

Hi @Mark_Harwood, I understood the issue. But Is there a way to sort data across all partitions?

Or can u suggest any other way using that we can achieve Aggregation + sorting across all data+ pagination?

We are wondering how other big apps that use ES, manage the sort order with pagination.

Mark_Harwood · March 2, 2022, 1:59pm

The challenge is data locality.
Unless you have all data related to a seller kept on the same shard it’s harder to reason about each seller. Time-based indices make this harder. The fact you’re sorting by lowest doc count makes this even harder.
If you can put all docs related to the same seller in one shard there’s hope. “Custom routing” or the “transform” api are 2 ways to ensure information related to each seller is available in only one shard.

Tomo_M · March 3, 2022, 12:56pm

If you need "_count": "asc", what will happen if using rare terms aggregation?
Do you still need pagination even with rare terms aggregation??

truptir · March 22, 2022, 12:36pm

Hi @Mark_Harwood, I have 1 shard and 1 replica for my index then it should work right? Descending order on count is also not working even I have only one shard for my index.

Index Setting:

"number_of_shards": "1",
"number_of_replicas": "1"

Mark_Harwood · March 22, 2022, 5:49pm

In what way is it “not working”?
Can you share some outputs

truptir · March 23, 2022, 5:31am

I meant Descending sorting on count is not working across all data. I m looking for a way that returns descending sorted data on _count across all partitions.

As per your comment, I understood Sorting may not work across all data if data resides in different shards/machines. But here my index configuration is 1 shard. please suggest if this is possible by changing any configuration.

Output: For DESC order on _count
Partition: 0

 "buckets" : [
        {
          "key" : "34950",
          "doc_count" : 11
        },
        {
          "key" : "28747",
          "doc_count" : 10
        }
     ]

Partition:1

"buckets" : [
        {
          "key" : "29194",
          "doc_count" : 65
        },
        {
          "key" : "16518",
          "doc_count" : 36
        }
      ]

Mark_Harwood · March 23, 2022, 5:00pm

That could be done by just not using partitions, no?

truptir · March 25, 2022, 6:35am

But I want a Paginated bucket. result query may return more than 1 lakhs bucket but at a time it should return 15 buckets this is the reason I am using partitions.

Mark_Harwood · March 25, 2022, 7:47am

Deep pagination on a derived value is tough because the deeper you go into the smaller values you have to hold all the other keys and derived values in memory just to figure out where you are in the global order.
Maybe the solution is turn the derived value into a concrete one using the “transform” api to create a derived index holding the totals for each key. This can be queried and sorted on the total field using search_after param. This can be implemented efficiently because it requires less memory but does not support direct indexing into the sort position (eg randomly picking a page number from the results )

system · April 22, 2022, 7:47am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
During terms aggregation with partition we get sum_other_doc_count > 0 in between partitions Elasticsearch	4	330	January 10, 2023
Bucket sort aggregation with actual documents Elasticsearch	3	581	October 27, 2020
The problem about ordering in terms Elasticsearch	1	360	April 4, 2018
Disable order over Terms Aggregation Elasticsearch	3	1979	June 7, 2018
Result Sets of Aggregation Partitioning Elasticsearch	5	2446	June 23, 2017

Terms aggregation bucket Sorting is not working when using with partition

Related topics