Hi,
Here is a basic overview of the user events query below: we bucket by week, then by user, and then, for each user, by day, which gives events per day per user per week. The goal is to return a count of users who have events on multiple days within the given interval.
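To make the intended result concrete, here is a minimal pure-Python sketch of the same week, user, day bucketing. The function name and the `(user_id, timestamp)` tuple shape are hypothetical, chosen only for illustration, and the Monday week start is an assumption about how the weekly buckets are aligned.

```python
from collections import defaultdict
from datetime import timedelta


def multi_day_users_per_week(events):
    """Count, per week, the users with events on more than one distinct day.

    `events` is an iterable of (user_id, datetime) pairs (hypothetical shape).
    Weeks are assumed to start on Monday.
    """
    # week start date -> user_id -> set of calendar days that have events
    days_seen = defaultdict(lambda: defaultdict(set))
    for user_id, ts in events:
        day = ts.date()
        week_start = day - timedelta(days=day.weekday())  # Monday of that week
        days_seen[week_start][user_id].add(day)

    # A user qualifies when they have more than one day bucket in the week.
    return {
        week: sum(1 for days in users.values() if len(days) > 1)
        for week, users in sorted(days_seen.items())
    }
```

A user with several events on a single day does not count; only distinct days matter, which is what the count of day buckets is meant to capture.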
I use a stats_bucket pipeline aggregation in the middle layer to count the day buckets per user (I'm on 2.3, so I don't have access to bucket_count). I then want to use a bucket_selector aggregation to limit the user buckets to only those users whose stats_bucket count is greater than 1.
Basically, the stats_bucket outputs a count of the number of buckets in the "events_per_day" date histogram. I only want to include users that have more than one bucket, so I want to run a bucket_selector on the "count" from the stats_bucket.
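For reference, the per-user layer described above can be sketched in isolation as a plain dict. The helper name `users_layer` is hypothetical. The script string is left as a parameter because the variable syntax depends on the script language: the 2.x documentation examples reference buckets_path variables by bare name (for example `test > 1`), while `params.test` is the form shown for later, Painless-based versions. Whether that difference is what triggers the exception here is not confirmed.

```python
def users_layer(script):
    """Build the per-user terms aggregation with its three sub-aggregations.

    `script` is the bucket_selector script, e.g. "test > 1" (bare variable
    name, as in the 2.x docs) or "params.test > 1" (as in the full query).
    """
    return {
        "terms": {"field": "user_id", "size": 0},
        "aggs": {
            # One bucket per day on which the user has at least one event.
            "events_per_day": {
                "date_histogram": {
                    "field": "timestamp",
                    "interval": "day",
                    "min_doc_count": 1,
                }
            },
            # stats_bucket over the day buckets; its "count" is the number of days.
            "total_days": {
                "stats_bucket": {"buckets_path": "events_per_day._count"}
            },
            # Keep only the user buckets for which the script returns true.
            "bucket_filter": {
                "bucket_selector": {
                    "buckets_path": {"test": "total_days.count"},
                    "script": script,
                }
            },
        },
    }
```

Building the layer from a function like this makes it easy to send the same query with either script form and compare the responses.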
But something's up with my query (which I'm running in Python using elasticsearch-dsl). Everything works fine if I take out the bucket_selector, so there must be something wrong with my script or buckets_path, which is raising the exception TransportError(503, u'reduce_search_phase_exception').
    {'aggs': {
        'histogram': {
            'aggs': {
                'users': {
                    'aggs': {
                        'bucket_filter': {
                            'bucket_selector': {
                                'buckets_path': {
                                    'test': 'total_days.count'},
                                'script': "params.test > 1"}},
                        'events_per_day': {
                            'date_histogram': {
                                'field': 'timestamp',
                                'interval': 'day',
                                'min_doc_count': 1}},
                        'total_days': {
                            'stats_bucket': {
                                'buckets_path': 'events_per_day._count'}}},
                    'terms': {
                        'field': 'user_id',
                        'size': 0}}},
            'date_histogram': {
                'field': 'timestamp',
                'interval': '1w',
                'min_doc_count': 0,
                'time_zone': 'UTC'}}},
     'query': {
        'bool': {
            'filter': [{
                'exists': {
                    'field': 'event_type'}}, {
                'range': {
                    'timestamp': {
                        'lte': datetime.datetime(2017, 9, 30, 0, 0),
                        'time_zone': 'UTC'}}}, {
                'range': {
                    'timestamp': {
                        'gte': datetime.datetime(2017, 9, 1, 0, 0),
                        'time_zone': 'UTC'}}}]}},
     'size': 0}
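Since the query runs fine without the bucket_selector, one stopgap is to apply the same "more than one day" filter on the client. The sketch below assumes the response is available as a plain dict (for example the raw elasticsearch-py result, or `to_dict()` on an elasticsearch-dsl response) and that the aggregation names match the query above; the function name and the `min_days` parameter are hypothetical.

```python
def count_multi_day_users(response, min_days=2):
    """Return {week label: number of users with at least `min_days` day buckets}.

    Expects the response of the query run without `bucket_filter`, so that
    `histogram` -> `users` -> `total_days` are all still present.
    """
    counts = {}
    for week in response["aggregations"]["histogram"]["buckets"]:
        # Prefer the formatted date when present, else the epoch-millis key.
        label = week.get("key_as_string", week["key"])
        user_buckets = week.get("users", {}).get("buckets", [])
        counts[label] = sum(
            1
            for user in user_buckets
            if user["total_days"]["count"] >= min_days
        )
    return counts
```

This moves the filtering out of Elasticsearch, so every user bucket still travels over the wire; it is a workaround for checking the numbers, not a replacement for a working bucket_selector.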