How to sort aggregated based on sub aggregation's doc_count?

lessless · November 12, 2020, 10:25am

Hi,

I'm completely stuck trying to figure out how to sort "users" based on "status_codes" in the following query:

{
  
  "size": 0,
  "aggs": {
    "users": {
      "terms": {
        "field": "user_id"
      },
      "aggs": {
        "status_codes": {
          "filters": {
            "filters": {
              "200s": {
                "range": {
                  "status_code": {
                    "gte": 200,
                    "lte": 299
                  }
                }
              },
              "400s": {
                "range": {
                  "status_code": {
                    "gte": 400,
                    "lte": 499
                  }
                }
              },
              "500s": {
                "range": {
                  "status_code": {
                    "gte": 500,
                    "lte": 599
                  }
                }
              }
            }
          }
        },
        "bucket_truncate": {
          "bucket_sort": {
            "sort": [
           
                ],
           
            "from": 0,
            "size": 10
          }
        }
      }
    }
  }
}

I want "users", that have the most "400s", for example, to come first.

Mark_Harwood · November 12, 2020, 11:41am

It's probably easier to flip things the other way - ie status codes at the top level aggregation and users as the aggregation underneath.

lessless · November 12, 2020, 12:49pm

Will results answer the same question: give me the next 100 users starting from 400th user with their status_codes ordered by 200s/400s/500s ?

Mark_Harwood · November 12, 2020, 1:07pm

When you add deep pagination as a requirement and you’re talking about a distributed system that becomes a problem.

You may me better suited creating an entity-centric index from the log data to do user behaviour analysis. This can be done using transforms.

lessless · November 12, 2020, 7:56pm

Understood! Can you please give an example of the query if we throw pagination out of the equation?

This is a non-trivial thing and I'm willing to learn how it's done. Maybe it will help me understand Elastic better.

lessless · November 13, 2020, 9:13am

what puzzles me out is the idea of flipping an aggregation. it seems counter-intuitive to me and I would love to understand why it's the advisable way to go.

Mark_Harwood · November 13, 2020, 9:41am

It’s about working more easily with the natural sort order of ‘ terms’ aggs. They pick the most popular. With users at the top level you’ll find the most prolific user first and for each the most common status code for that user. With the status codes up top up you’ll get the most common code first (likely 200) and for each status code the most common user. So for 404s you’ll naturally get the users with the most 404s
This might highlight a user who has a lot of 404s but they also have many more 200s to make up for it (ie 404s are only 1% of their total traffic). To find users who have an unusually high mix of a status code (eg 90% of their traffic is 404s) simply use a ‘significant_terms’ agg instead of a ‘terms’ agg for the users.

system · December 11, 2020, 9:41am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
How do I sort buckets by Term Aggregation's doc_count? Elasticsearch	2	13981	September 18, 2018
Sorting search results into buckets, and then sorting again Elasticsearch	2	519	February 22, 2018
Sorting on the sub-aggregation fields and applying Pagination on the same Elasticsearch	1	153	April 25, 2023
Sort buckets by the number of sub-buckets? Elasticsearch	2	907	July 5, 2017
Pagination + Sorted Aggregations: Efficiently Retrieve Sorted List of Values? Elasticsearch	2	185	August 23, 2023

How to sort aggregated based on sub aggregation's doc_count?

Related topics