I've been on and off dealing with some ES issues, and I think I've
resolved a LOT of them, but there's still one thing that bugs me.
Setup: two nodes, ten shards, one replica.
Each node is a quad-core machine with 32gb of RAM; ES heap is min=max=16gb.
Our data is extremely uniform and extremely small, but it comes in at a
high rate of around 100 index actions per second.
ES is at 0.18.6 (I know, we'll be upgrading this week).
Currently, all of the shards are marked as primary on node 1.
Node 1:
uptime is 9 hours.
bigdesk shows 11399 garbage collection actions, and 2000 merges.
cpu has been uniform at under 5% for pretty much forever.
memory is at used=30gb, actual=14gb.
Node 2:
uptime is 3 hours.
bigdesk shows 104000 gc actions, and 800 merges.
cpu has been uniform at 80-90% for pretty much forever.
memory is at used=21gb, actual=13gb.
Node 1 is also running a few python scripts (Server Density agent, mongo
sync tool, Geckoboard agent).
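To put some numbers behind the first question: normalizing the bigdesk figures above by uptime makes the gap obvious (a quick back-of-the-envelope sketch; the uptimes are approximate):

```python
# GC actions per hour of uptime, from the bigdesk numbers above.
node1_gc_per_hour = 11399 / 9     # 9 hours up
node2_gc_per_hour = 104000 / 3    # 3 hours up

ratio = node2_gc_per_hour / node1_gc_per_hour
print(round(node1_gc_per_hour))   # ~1267 GC actions/hour on node 1
print(round(node2_gc_per_hour))   # ~34667 GC actions/hour on node 2
print(round(ratio, 1))            # node 2 is GCing ~27x more often
```

So node 2 is doing roughly 27x the garbage collection per hour, which lines up with its 80-90% cpu.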
- Why is cpu so high on node 2?
- I was thinking that the 'primary' shards are the ones that do the
merging, and the replicas just receive the completed index, but in fact
each copy's Lucene index has to do its own merging. Not so much a
question as making sure I'm understanding everything.
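On the routing point in the next question: with routing, every document for a given routing value lands on one specific shard, so a few hot routing keys concentrate indexing (and the resulting merge work) on a few shards. A toy sketch of the mechanism, using md5 as a stand-in for ES's internal hash (the real hash differs, so actual shard assignments will too):

```python
import hashlib

NUM_SHARDS = 10  # matches our setup

def shard_for(routing_value: str) -> int:
    # Stand-in hash; ES uses its own hash function internally.
    # This only illustrates the hash-mod mechanism.
    h = int(hashlib.md5(routing_value.encode()).hexdigest(), 16)
    return h % NUM_SHARDS

# Three hot routing keys, 1000 docs each -- all 3000 docs land on
# at most 3 of the 10 shards, however many docs come in.
counts = {}
for key in ["customer-1", "customer-2", "customer-3"]:
    for _ in range(1000):
        s = shard_for(key)
        counts[s] = counts.get(s, 0) + 1
print(counts)
```

The upshot: routing trades even load distribution for cheaper targeted queries, so uneven per-shard work is expected.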
- My data is routed, which in theory should mean that some shards
are doing a lot more work than others. Does ES account for 'work'
and move shards from machine to machine? Or does it just account for