Can you run a bucket selector aggregation using the output of a stats bucket agg?


Here's a basic overview of the user events query below: we bucket by week, then by user, and then by day for each user, which gives events per day per user per week. The goal is to return a count of users who have events on multiple days within the given interval.

I use a stats_bucket pipeline aggregation in the middle layer to get a count of the number of day buckets per user (I'm on Elasticsearch 2.3, so I don't have access to bucket_count). I then want to use a bucket_selector agg to limit the results of the user bucketing to only users that have a stats_bucket count greater than 1.

Basically, the stats_bucket outputs a count of the number of buckets in the "events_per_day" date histogram. I only want to include users that have more than one bucket, so I want to run a bucket_selector on the "count" from the stats_bucket.
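To make the intended result concrete, here is a minimal client-side sketch of the same filter, applied to one weekly bucket of the response with the bucket_selector left out. The function name and the sample data are made up for illustration; only the aggregation names (users, events_per_day) come from the query.

```python
def users_with_multiple_days(week_bucket):
    """Return the user keys in one weekly bucket that have events on more than one day."""
    selected = []
    for user in week_bucket["users"]["buckets"]:
        # Count only day buckets that actually contain events.
        active_days = [
            day for day in user["events_per_day"]["buckets"]
            if day.get("doc_count", 0) > 0
        ]
        if len(active_days) > 1:
            selected.append(user["key"])
    return selected


# Hypothetical response fragment shaped like the aggregation output.
sample_week = {
    "key_as_string": "2017-09-04T00:00:00.000Z",
    "users": {"buckets": [
        {"key": "u1", "events_per_day": {"buckets": [
            {"key_as_string": "2017-09-04", "doc_count": 3},
            {"key_as_string": "2017-09-06", "doc_count": 1}]}},
        {"key": "u2", "events_per_day": {"buckets": [
            {"key_as_string": "2017-09-05", "doc_count": 7}]}},
    ]},
}

print(users_with_multiple_days(sample_week))  # ['u1']
```

This only works as a fallback when the number of users per week is small enough to pull back to the client, which is why doing the filtering inside Elasticsearch is preferable.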

But something's up with my query (which I'm running in Python using elasticsearch-dsl). Everything works fine if I take out the bucket_selector, so there must be something wrong with my script or buckets_path. The query raises TransportError(503, u'reduce_search_phase_exception').

    {'aggs': {'histogram': {
         'aggs': {
             'users': {
                 'aggs': {
                     'bucket_filter': {
                         'bucket_selector': {
                             'buckets_path': {
                                 'test': 'total_days.count'},
                             'script': "params.test > 1"}},
                     'events_per_day': {
                         'date_histogram': {
                             'field': 'timestamp',
                             'interval': 'day',
                             'min_doc_count': 1}},
                     'total_days': {
                         'stats_bucket': {
                             'buckets_path': 'events_per_day._count'}}},
                 'terms': {
                     'field': 'user_id',
                     'size': 0}}},
         'date_histogram': {
             'field': 'timestamp',
             'interval': '1w',
             'min_doc_count': 0,
             'time_zone': 'UTC'}}},
     'query': {
         'bool': {
             'filter': [{
                 'exists': {
                     'field': 'event_type'}}, {
                 'range': {
                     'timestamp': {
                         'lte': datetime.datetime(2017, 9, 30, 0, 0),
                         'time_zone': 'UTC'}}}, {
                 'range': {
                     'timestamp': {
                         'gte': datetime.datetime(2017, 9, 1, 0, 0),
                         'time_zone': 'UTC'}}}]}},
     'size': 0}
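One thing that may be worth checking, offered as a guess rather than a confirmed diagnosis: the "params.test" form is how later (5.x, Painless) versions expose buckets_path variables, whereas on 2.x the default Groovy scripts refer to them as bare variables. The sketch below builds the per-user sub-aggregations with either script style; the function name and the script_style parameter are hypothetical. On 2.x, inline Groovy scripting may also need to be enabled on the cluster before a bucket_selector script will run.

```python
def multi_day_user_aggs(script_style="groovy"):
    """Build the sub-aggregations that sit under the 'users' terms aggregation."""
    # Assumption: 2.x (Groovy) exposes buckets_path keys as bare variables,
    # while 5.x+ (Painless) exposes them under `params`.
    if script_style == "groovy":
        script = "test > 1"
    else:
        script = "params.test > 1"
    return {
        "events_per_day": {
            "date_histogram": {
                "field": "timestamp",
                "interval": "day",
                "min_doc_count": 1,
            }
        },
        "total_days": {
            # stats_bucket over the per-day doc counts; its "count" is the
            # number of day buckets for this user.
            "stats_bucket": {"buckets_path": "events_per_day._count"}
        },
        "bucket_filter": {
            "bucket_selector": {
                "buckets_path": {"test": "total_days.count"},
                "script": script,
            }
        },
    }


aggs_2x = multi_day_user_aggs("groovy")
print(aggs_2x["bucket_filter"]["bucket_selector"]["script"])  # test > 1
```

The returned dict can be dropped in as the 'aggs' value of the 'users' terms aggregation in the query above.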

Do you have a stack trace either in the response or in the Elasticsearch logs?

Yes, here's the full error:

 TransportError                            Traceback (most recent call last)
<ipython-input-387-e2858b7217af> in <module>()
----> 1 result = search.execute().aggregations.histogram.buckets

/data/virtualenv/79626b6a4c46d478421d1fba4153c0da/lib/python2.7/site-packages/elasticsearch_dsl/search.pyc in execute(self, ignore_cache)
    577                     doc_type=self._doc_type,
    578                     body=self.to_dict(),
--> 579                     **self._params
    580                 ),
    581                 callbacks=self._doc_type_map

/data/virtualenv/79626b6a4c46d478421d1fba4153c0da/lib/python2.7/site-packages/elasticsearch/client/utils.pyc in _wrapped(*args, **kwargs)
     67                 if p in kwargs:
     68                     params[p] = kwargs.pop(p)
---> 69             return func(*args, params=params, **kwargs)
     70         return _wrapped
     71     return _wrapper

/data/virtualenv/79626b6a4c46d478421d1fba4153c0da/lib/python2.7/site-packages/elasticsearch/client/__init__.pyc in search(self, index, doc_type, body, params)
    546             index = '_all'
    547         _, data = self.transport.perform_request('GET', _make_path(index,
--> 548             doc_type, '_search'), params=params, body=body)
    549         return data

/data/virtualenv/79626b6a4c46d478421d1fba4153c0da/lib/python2.7/site-packages/elasticsearch/transport.pyc in perform_request(self, method, url, params, body)
    328             try:
--> 329                 status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
    331             except TransportError as e:

/data/virtualenv/79626b6a4c46d478421d1fba4153c0da/lib/python2.7/site-packages/elasticsearch/connection/http_urllib3.pyc in perform_request(self, method, url, params, body, timeout, ignore)
    107         if not (200 <= response.status < 300) and response.status not in ignore:
    108             self.log_request_fail(method, url, body, duration, response.status, raw_data)
--> 109             self._raise_error(response.status, raw_data)
    111         self.log_request_success(method, full_url, url, body, response.status,

/data/virtualenv/79626b6a4c46d478421d1fba4153c0da/lib/python2.7/site-packages/elasticsearch/connection/base.pyc in _raise_error(self, status_code, raw_data)
    106             pass
--> 108         raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)

TransportError: TransportError(503, u'reduce_search_phase_exception')

That's the Python stack trace; I can dig into the Elasticsearch logs instead if you prefer.
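The Python traceback only shows the top-level error type. The response body usually carries more detail, and elasticsearch-py exposes it on the exception's info attribute. Below is a minimal sketch for walking the "caused_by" chain in that body; the helper name is made up, and the error layout (a dict under "error" with "type", "reason" and nested "caused_by") is an assumption about what the server returns.

```python
def summarize_es_error(exc):
    """Return one 'type: reason' line per level of the error's caused_by chain."""
    info = getattr(exc, "info", None)
    error = info.get("error") if isinstance(info, dict) else None
    if not isinstance(error, dict):
        # No structured body available; fall back to the exception text.
        return [str(exc)]
    lines = []
    node = error
    while isinstance(node, dict):
        lines.append("{}: {}".format(node.get("type"), node.get("reason")))
        node = node.get("caused_by")
    return lines


class FakeTransportError(Exception):
    """Stand-in for elasticsearch.TransportError so the sketch runs offline."""
    def __init__(self, info):
        super(FakeTransportError, self).__init__("fake transport error")
        self.info = info


# Made-up error body, only to show the shape being walked.
fake = FakeTransportError({
    "error": {
        "type": "reduce_search_phase_exception",
        "reason": "[reduce]",
        "caused_by": {"type": "script_exception", "reason": "made-up reason"},
    },
    "status": 503,
})
for line in summarize_es_error(fake):
    print(line)
```

With the real client, the same helper would be called from an "except TransportError as exc:" block wrapped around search.execute().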

Any thoughts?
