Aggregations stop working silently on elasticsearch 1.4.4


(Anton Bogdanovich) #1

The following aggregation starts returning empty buckets, even though the same search still returns the matching documents.
{ "aggs": { "contact_ids": { "terms": { "field": "contact_id" } }}}

It returns the buckets properly again after clearing the cache this way:
curl -XPOST 'http://cluster.host:9200/_cache/clear' -d '{ "fielddata": "true" }'

But the same thing happens again after a while. Why does it fail silently, and how can we fix it?
We have 6 nodes with 32 GB of memory each (16 GB for ES), 6 indexes with 5 shards each,
and around 200 million documents.


(Mark Walkom) #2

Check your logs; you may see something about the circuit breaker there.
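
Besides the logs, the breaker counters can be read from the node stats. The following is only a sketch: it assumes the JSON returned by `GET /_nodes/stats/breaker` has a `breakers` object per node with a `tripped` counter, and the `sample` data is hypothetical, not taken from the cluster in this thread.

```python
def tripped_breakers(nodes_stats):
    """Return (node_name, breaker_name, tripped_count) for each breaker that has tripped."""
    tripped = []
    for node_id, node in nodes_stats.get("nodes", {}).items():
        name = node.get("name", node_id)
        for breaker_name, stats in node.get("breakers", {}).items():
            count = stats.get("tripped", 0)
            if count > 0:
                tripped.append((name, breaker_name, count))
    return sorted(tripped)


# Hypothetical node stats; the field names are an assumption about the response shape.
sample = {
    "nodes": {
        "node-a": {
            "name": "es-1",
            "breakers": {
                "fielddata": {"estimated_size_in_bytes": 900,
                              "limit_size_in_bytes": 1000, "tripped": 3},
                "request": {"estimated_size_in_bytes": 0,
                            "limit_size_in_bytes": 400, "tripped": 0},
            },
        },
        "node-b": {
            "name": "es-2",
            "breakers": {
                "fielddata": {"estimated_size_in_bytes": 100,
                              "limit_size_in_bytes": 1000, "tripped": 0},
            },
        },
    }
}

print(tripped_breakers(sample))
```

A non-empty result would point at the circuit breaker even if nothing shows up in the log files.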


(Anton Bogdanovich) #3

We don't have any exceptions in our logs.

Could this mapping be an issue?
One contact_id has doc_values: true, the other one does not.
If contact_id has a single fielddata entry shared by both types,
how will Elasticsearch decide whether it needs to use doc_values?

    contact: {
      properties: {
        contact_id: { type: 'long', index: 'not_analyzed' }
      }
    },
    event: {
      _parent: { type: :contact },
      properties: {
        user_id: { type: 'long', index: 'not_analyzed' },
        contact_id: { type: 'long', index: 'not_analyzed', doc_values: true },
        data: { type: 'object', index: 'not_analyzed' }
      }
    }
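
To make the mismatch explicit, here is a small sketch that compares the settings of same-named fields across the types of one index. The helper name `conflicting_field_settings` is hypothetical, and the `mappings` dict is just the mapping above rewritten as plain Python. Whether this mismatch is what causes the empty buckets is not established here; the sketch only detects it.

```python
def conflicting_field_settings(mappings):
    """Return {field: {type_name: settings}} for fields whose settings differ between types."""
    seen = {}
    for type_name, type_mapping in mappings.items():
        for field, settings in type_mapping.get("properties", {}).items():
            seen.setdefault(field, {})[type_name] = settings

    conflicts = {}
    for field, by_type in seen.items():
        settings_list = list(by_type.values())
        # A field is in conflict when any type defines it differently from the first one.
        if any(s != settings_list[0] for s in settings_list[1:]):
            conflicts[field] = by_type
    return conflicts


# The mapping from this post, transcribed into a Python dict.
mappings = {
    "contact": {
        "properties": {
            "contact_id": {"type": "long", "index": "not_analyzed"},
        }
    },
    "event": {
        "_parent": {"type": "contact"},
        "properties": {
            "user_id": {"type": "long", "index": "not_analyzed"},
            "contact_id": {"type": "long", "index": "not_analyzed", "doc_values": True},
            "data": {"type": "object", "index": "not_analyzed"},
        },
    },
}

for field, by_type in conflicting_field_settings(mappings).items():
    print(field, by_type)
```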

(Colin Goodheart-Smithe) #4

Could you show us what the response you get looks like?


(Anton Bogdanovich) #5

Query:

POST events_production_2015_04/contact/_search
{
    "query": {
        "filtered": {
            "filter": {
                "bool": {
                    "must": []
                }
            }
        }
    },
    "aggs": {
        "contact_ids": {
            "terms": { "field": "contact_id" }
        }
    }
}

Response:

{
   "took": 240,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 3614637,
      "max_score": 1,
      "hits": [
         {
            "_index": "events_production_2015_04",
            "_type": "contact",
            "_id": "12094631113617",
            "_score": 1,
            "_source": {
               "contact_id": 12094631113617
            }
         },
         ...
         {
            "_index": "events_production_2015_04",
            "_type": "contact",
            "_id": "12094629244489",
            "_score": 1,
            "_source": {
               "contact_id": 12094629244489
            }
         }
      ]
   },
   "aggregations": {
      "contact_ids": {
         "doc_count_error_upper_bound": 0,
         "sum_other_doc_count": 0,
         "buckets": []
      }
   }
}

Then I do:

curl -XPOST 'http://cluster.host:9200/_cache/clear' -d '{ "fielddata": "true" }'

Immediately after that I run the same query and get the following result:

{
   "took": 2776,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 3614639,
      "max_score": 1,
      "hits": [
         {
            "_index": "events_production_2015_04",
            "_type": "contact",
            "_id": "12094631113617",
            "_score": 1,
            "_source": {
               "contact_id": 12094631113617
            }
         },
         ...
         {
            "_index": "events_production_2015_04",
            "_type": "contact",
            "_id": "12094629244489",
            "_score": 1,
            "_source": {
               "contact_id": 12094629244489
            }
         }
      ]
   },
   "aggregations": {
      "contact_ids": {
         "doc_count_error_upper_bound": 5,
         "sum_other_doc_count": 3614629,
         "buckets": [
            {
               "key": 5497558141098,
               "doc_count": 1
            },
            ...
            {
               "key": 5497558191959,
               "doc_count": 1
            }
         ]
      }
   }
}

I don't get any CircuitBreakingException at all. I checked the log files on all nodes, and the query response does not contain any exceptions either.
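
Since nothing is reported as an error, the only way I can see to catch this on the client side is to check the response itself. A minimal sketch, assuming the response layout shown above; the helper name `empty_aggregations` is hypothetical, and `before` and `after` are trimmed copies of the two responses in this post.

```python
def empty_aggregations(response):
    """Names of aggregations with no buckets although the query matched documents."""
    if response.get("hits", {}).get("total", 0) == 0:
        # No hits at all, so empty buckets are expected.
        return []
    return sorted(
        name
        for name, agg in response.get("aggregations", {}).items()
        if agg.get("buckets") == []
    )


# Trimmed from the response returned before clearing the cache.
before = {
    "hits": {"total": 3614637},
    "aggregations": {
        "contact_ids": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [],
        }
    },
}

# Trimmed from the response returned after clearing the cache.
after = {
    "hits": {"total": 3614639},
    "aggregations": {
        "contact_ids": {
            "doc_count_error_upper_bound": 5,
            "sum_other_doc_count": 3614629,
            "buckets": [
                {"key": 5497558141098, "doc_count": 1},
                {"key": 5497558191959, "doc_count": 1},
            ],
        }
    },
}

print(empty_aggregations(before))
print(empty_aggregations(after))
```

This does not distinguish a field that is genuinely missing from all matching documents, so it is only a hint that something is wrong.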

