Elasticsearch question. I'm not sure what background data is most helpful, so I'll start with the basics:
ES version: 1.3.1
Lucene: 4.9
We periodically see our ES latency spike dramatically. The most interesting attribute of this pathology is that every node in the cluster spikes in CPU and load average at the same time.
What might cause that behavior? Here are some observations we've made to rule things out:
- Memory and I/O look good
- No significant change/spike in requests (though we do seem to see this happen mid-day when our load is higher than nights/weekends)
- The segment merge log doesn't appear to show an increase in merges or in merge latency
- I can't imagine garbage collection is to blame, since each node runs its own JVM and we see the spike on every node simultaneously
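The GC point is an assumption we haven't measured yet. A minimal sketch of how it could be checked, assuming the nodes stats endpoint (`/_nodes/stats/jvm`) is reachable and reports cumulative `collection_time_in_millis` per collector; the base URL and the function names here are made up for illustration:

```python
import json
from urllib.request import urlopen


def fetch_jvm_stats(base_url="http://localhost:9200"):
    """Fetch per-node JVM stats. base_url is an assumed address."""
    with urlopen(base_url + "/_nodes/stats/jvm") as resp:
        return json.loads(resp.read().decode("utf-8"))


def gc_millis_by_node(stats):
    """Sum cumulative GC time (ms) over all collectors, keyed by node name."""
    totals = {}
    for node_id, node in stats.get("nodes", {}).items():
        collectors = node.get("jvm", {}).get("gc", {}).get("collectors", {})
        name = node.get("name", node_id)  # fall back to the node id
        totals[name] = sum(
            c.get("collection_time_in_millis", 0) for c in collectors.values()
        )
    return totals


def gc_delta(before, after):
    """GC time (ms) each node accumulated between two stats snapshots."""
    earlier = gc_millis_by_node(before)
    later = gc_millis_by_node(after)
    return {name: later[name] - earlier.get(name, 0) for name in later}
```

Taking one snapshot with `fetch_jvm_stats()` before a spike and one during it, then passing both to `gc_delta`, would show whether GC time jumps on every node at once. If it does, GC could still be a symptom of whatever load is hitting all nodes rather than the cause.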
Does anyone have suggestions as to what to look at next? Any ideas about what sorts of problems tend to manifest as this behavior?
Thanks in advance.