Hi folks,
I'm running into a lot of heap circuit breaker trips based on fielddata size. I've been reading through the advice on how to improve performance, but I still don't have a good mental model for predicting when I'm going to have trouble.
Some numbers: I've got ~220M docs in Elasticsearch right now, and I'm adding about 14M a day. They're primarily Apache access logs. Each node has a fielddata limit of roughly 10GB of heap.
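For context, I believe that ~10GB limit comes from settings along these lines in elasticsearch.yml (the values below are illustrative, not copied from my actual config):

    # Hard ceiling enforced by the fielddata circuit breaker
    indices.breaker.fielddata.limit: 60%

    # How much fielddata the cache is allowed to hold before evicting old entries
    indices.fielddata.cache.size: 30%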
I add a field called 'site' when I create the doc; it's hard-coded to the site that created the log entry. There are a very limited number of unique values (<50, let's say).
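To make that concrete, an indexed document looks roughly like this (the index/type names and values here are made up for illustration):

    POST /logs-2015.06.02/apache_access
    {
      "@timestamp" : "2015-06-02T14:03:27Z",
      "site" : "store-frontend",
      "message" : "<raw apache access log line>"
    }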
Here's what the _mapping looks like:

    "site" : {
      "type" : "string",
      "norms" : {
        "enabled" : false
      },
      "fields" : {
        "raw" : {
          "type" : "string",
          "index" : "not_analyzed",
          "ignore_above" : 256
        }
      }
    }
So here's the thing: if I use Kibana 4 to visualize the top 5 sites with a "terms" aggregation on site.raw over the last 4 hours, I get a circuit breaker exception on site.raw fielddata. (I get one warning per shard, for the more recent shards.)
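As far as I can tell, that visualization boils down to roughly this query (my reconstruction, not captured from Kibana, and the logstash-* index pattern is just what I assume it hits):

    GET /logstash-*/_search
    {
      "size" : 0,
      "query" : {
        "filtered" : {
          "filter" : {
            "range" : { "@timestamp" : { "gte" : "now-4h" } }
          }
        }
      },
      "aggs" : {
        "top_sites" : {
          "terms" : { "field" : "site.raw", "size" : 5 }
        }
      }
    }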
If I set the time window to a four-hour period ten days ago, I still get the warnings that the more recent shards have failed, but the indices holding that older data work fine and I get sensible results.
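For what it's worth, this is the kind of check I can run to see how much fielddata site.raw and @timestamp are actually taking per node (not sure it tells the whole story):

    # Per-node fielddata usage, broken down by field
    GET /_cat/fielddata?v&fields=site.raw,@timestamp

    # More detailed view of the same numbers
    GET /_nodes/stats/indices/fielddata?fields=site.raw,@timestamp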
The only changes I've made to the indices are that I went down to 1-hour indices a few days ago to see if that helped, and very recently I set doc_values: true on @timestamp. All the 1-hour index shards are failing, despite having 1/24th as many documents as the big daily indices from earlier.
(As an aside, doc_values on @timestamp has let me do basic time sorting again, but should I really be using doc_values on all my non-analyzed fields?)
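If the answer is yes, I'm assuming the change for site would look something like this (picked up by new indices via the template or however the mapping gets created; existing indices would presumably need a reindex):

    "site" : {
      "type" : "string",
      "norms" : { "enabled" : false },
      "fields" : {
        "raw" : {
          "type" : "string",
          "index" : "not_analyzed",
          "ignore_above" : 256,
          "doc_values" : true
        }
      }
    }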
So what determines when the circuit breaker is going to fire? It's not the number of docs in the index. It's not the number of distinct values. It's not the total docs in the system. So... a race condition? A misleading error message? I'm sort of stumped.
Any insight appreciated! I'd love to be able to put 1B docs in here, but performance has been steadily degrading since I hit 100M or so.
Jeff