Can't create histogram for summed values

Can't create histogram for summed values

I'cant figure out how to create histogram for sumemd values. This query gives me sum of events time per user:
GET /events/_search

{
  "size": 0,
  "aggregations": {
    "by_user": {
      "terms": {
        "field": "user_id"
      },
      "aggregations": {
        "time_per_user": {
          "sum": {
            "field": "event_time"
          }
        }
      }
    }
  }
}

No i want to create a histogram / get range information like this:

  • 0 - xxx seconds: zzz users
  • xxx - yyy seconds: aaa users
    ...

Is there any way to achieve this in Elasticsearch?

  • simple histogram does not work for me, because i need to summarize time per user first.
  • I tried also to use range functionality ( i can predict time buckets easily), but none works for me.
  • I can't do the histogram, from data received by query above, because number of buckets is easilly into 100k
  • it would be similar to double grouping in mysql:
SELECT count(1),time_sum FROM (
	SELECT SUM(time) as time_sum, user_id FROM users GROUP BY user_id
)
GROUP BY time_sum

Any help would be greatly appreciated.

I believe that it is not possible. I understand that you want to firstly calculate the sums of times from all users and then you want to make a histogram of those sums.

This would be a type of pipeline histogram, and there is currently no pipeline aggregation that does histograms available. I also think there is an inherent difficulty because you say that users has a large cardinality, so the initial sum of times for all users would also need a very high "size" value in order to capture the sums of all the users in the data.

So the only way for you to do what you want is to calculate in two stages -

  • calculate sums for all users (using a composite aggregation which will let you page through all the users)
  • index these values into a "user_totals" index
  • run a regular histogram on the user_totals

Sorry i can't offer anything better, unless i missed something here.

1 Like

@Matt, thank you for your prompt response. It appears that achieving my goal isn't straightforward, but I've devised an effective solution tailored to my requirements – obtaining histogram data for values in buckets.

Here's the approach I took:

  1. Implemented the filter_path parameter to exclude buckets exceeding 100,000 from the response.
  2. Utilized the percentiles_bucket aggregation to retrieve values at specified percentile points.
  3. Opted for using only 20 percentile points, considering that this might suffice when bucket values are evenly distributed.
  4. In my case, where bucket values exhibited significant skewness beyond 90%, I extended the percentiles above 90 (e.g., 90.1, 90.2, etc.).
  5. Noted that Elastic's performance wasn't impacted even when passing an array of 1000 percentile points.
  6. Upon receiving the Elasticsearch response in my backend, extracted:
  • The value at each specified percentile point.
  • The total count of buckets (obtained from statistical aggregations).
  1. Utilized this data to:
  • Calculate the number of targets falling within each percentile point.
  • Normalize percentile points into 20 buckets, summarizing the target counts.

This process has provided me with a precise histogram data array, meeting my specific needs.

Query used:

{
  "size": 0,
  "aggregations": {
    "by_user": {
      "terms": {
        "field": "user_id",
        "size": 250000,
        "order": {
          "eng_per_target": "desc"
        }
      },
      "aggregations": {
        "time_per_user": {
          "sum": {
            "field": "event_time"
          }
        },
        "cleared": {
          "bucket_script": {
            "buckets_path": {
              "time_per_user": "time_per_user"
            },
            "script": "Math.min(params.time_per_user , 3600)"
          }
        }
      }
    },
    "percentage": {
      "percentiles_bucket": {
        "buckets_path": "by_user>cleared",
        "percents": [ 0, 2.5, 5, 7.5 ... ]
        ]
      }
    },
    "stats": {
      "stats_bucket": {
        "buckets_path": "by_user>cleared"
      }
    }
  }
}

Actual percents array:
"percents": [0, 2.5, 5, 7.5, 10, 12.5, 15, 17.5, 20, 22.5, 25, 27.5, 30, 32.5, 35, 37.5, 40, 42.5, 45, 47.5, 50, 52.5, 55, 57.5, 60, 62.5, 65, 67.5, 70, 72.5, 75, 77.5, 80, 82.5, 85, 87.5, 90, 90.1, 90.2, 90.3, 90.4, 90.5, 90.6, 90.7, 90.8, 90.9, 91, 91.1, 91.2, 91.3, 91.4, 91.5, 91.6, 91.7, 91.8, 91.9, 92, 92.1, 92.2, 92.3, 92.4, 92.5, 92.6, 92.7, 92.8, 92.9, 93, 93.1, 93.2, 93.3, 93.4, 93.5, 93.6, 93.7, 93.8, 93.9, 94, 94.1, 94.2, 94.3, 94.4, 94.5, 94.6, 94.7, 94.8, 94.9, 95, 95.02, 95.04, 95.06, 95.08, 95.1, 95.12, 95.14, 95.16, 95.18, 95.2, 95.22, 95.24, 95.26, 95.28, 95.3, 95.32, 95.34, 95.36, 95.38, 95.4, 95.42, 95.44, 95.46, 95.48, 95.5, 95.52, 95.54, 95.56, 95.58, 95.6, 95.62, 95.64, 95.66, 95.68, 95.7, 95.72, 95.74, 95.76, 95.78, 95.8, 95.82, 95.84, 95.86, 95.88, 95.9, 95.92, 95.94, 95.96, 95.98, 96, 96.02, 96.04, 96.06, 96.08, 96.1, 96.12, 96.14, 96.16, 96.18, 96.2, 96.22, 96.24, 96.26, 96.28, 96.3, 96.32, 96.34, 96.36, 96.38, 96.4, 96.42, 96.44, 96.46, 96.48, 96.5, 96.52, 96.54, 96.56, 96.58, 96.6, 96.62, 96.64, 96.66, 96.68, 96.7, 96.72, 96.74, 96.76, 96.78, 96.8, 96.82, 96.84, 96.86, 96.88, 96.9, 96.92, 96.94, 96.96, 96.98, 97, 97.02, 97.04, 97.06, 97.08, 97.1, 97.12, 97.14, 97.16, 97.18, 97.2, 97.22, 97.24, 97.26, 97.28, 97.3, 97.32, 97.34, 97.36, 97.38, 97.4, 97.42, 97.44, 97.46, 97.48, 97.5, 97.52, 97.54, 97.56, 97.58, 97.6, 97.62, 97.64, 97.66, 97.68, 97.7, 97.72, 97.74, 97.76, 97.78, 97.8, 97.82, 97.84, 97.86, 97.88, 97.9, 97.92, 97.94, 97.96, 97.98, 98, 98.02, 98.04, 98.06, 98.08, 98.1, 98.12, 98.14, 98.16, 98.18, 98.2, 98.22, 98.24, 98.26, 98.28, 98.3, 98.32, 98.34, 98.36, 98.38, 98.4, 98.42, 98.44, 98.46, 98.48, 98.5, 98.52, 98.54, 98.56, 98.58, 98.6, 98.62, 98.64, 98.66, 98.68, 98.7, 98.72, 98.74, 98.76, 98.78, 98.8, 98.82, 98.84, 98.86, 98.88, 98.9, 98.92, 98.94, 98.96, 98.98, 99, 99.02, 99.04, 99.06, 99.08, 99.1, 99.12, 99.14, 99.16, 99.18, 99.2, 99.22, 99.24, 99.26, 99.28, 99.3, 99.32, 99.34, 99.36, 99.38, 99.4, 99.42, 99.44, 99.46, 99.48, 99.5, 99.52, 99.54, 99.56, 99.58, 99.6, 99.62, 99.64, 99.66, 99.68, 99.7, 99.72, 99.74, 99.76, 99.78, 99.8, 99.82, 99.84, 99.86, 99.88, 99.9, 99.92, 99.94, 99.96, 99.98]

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.