We have 3 ES servers in a cluster, each with 52 GB RAM and 8 cores.
Each server has 4 local SSD disks attached to it.
ES version is 1.5.2
We are using bulk inserts through logstash with flush_size of 2000.
The servers are constantly at 85%+ CPU, while disk utilization is only around 5%.
Is there something we can check to see why it's taking so much CPU?
It's probably just the indexing process; it is intentionally CPU-intensive, because that work is what makes the resulting searches fast.
Is it causing issues?
I am afraid it's taking more CPU than it should. Is there any way I can see why it's using that much?
In my experience, indexing is often quite CPU-intensive, and CPU generally gets saturated before disk I/O does. Exactly how CPU-intensive it is depends on the indexing throughput, the structure and size of the documents, and the mappings used. If you need to reduce CPU usage, e.g. in order to serve queries within a certain latency, you may need to reduce the indexing throughput or scale out the cluster.
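To see what the CPU is actually spending its time on, you can use the nodes hot threads API, and compare indexing time against merge and refresh time in the node stats. A minimal sketch, assuming ES is reachable on localhost:9200 (adjust the host to one of your nodes):

```shell
# Dump the hottest threads on every node; during heavy bulk indexing
# you would typically see Lucene indexing and merge threads at the top.
curl -XGET 'http://localhost:9200/_nodes/hot_threads'

# Per-node indices stats: look at indexing, merges, and refresh sections
# to see where the time goes.
curl -XGET 'http://localhost:9200/_nodes/stats/indices?pretty'
```

If hot_threads shows mostly indexing and merge threads, the CPU usage is the expected cost of your ingest rate rather than something misbehaving.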
The more CPU ES can use, the faster it is. So burning CPU cycles is exactly what you want, and 85% is nothing! It means the system load is still below 1.0 per core. (Note that my bulk sessions can push the system load up to 8.0-12.0.)
You can look at your client and decrease the bulk request size and frequency, or even pause between bulk requests. This is very easy to do and will slow down indexing.
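On the Logstash side this comes down to the elasticsearch output settings. A sketch for the Logstash 1.x era that matches ES 1.5.2; the host name is a placeholder, and the values are illustrative, not tuned:

```
output {
  elasticsearch {
    host => "es-node-1"      # placeholder: one of your ES servers
    flush_size => 500        # smaller bulks than the current 2000
    idle_flush_time => 5     # seconds to wait before flushing a partial bulk
  }
}
```

Smaller, less frequent bulks spread the indexing work out over time at the cost of lower overall throughput.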
If you want more advanced control, in 1.5.2 you can use store throttling to slow down the ES nodes even if the clients do not cooperate.
The original motivation was to solve issues where slow disks block CPU on ES servers; on modern hardware this is rarely the case anymore. Be aware that store throttling is removed in ES 2.x. It is an advanced setting and must be handled carefully; for example, all nodes should throttle at the same rate.
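For reference, store throttling in ES 1.x is applied through the cluster settings API. A minimal sketch, assuming ES on localhost:9200; the 20mb limit is just an example value:

```shell
# Transient cluster-wide store throttle (ES 1.x only; removed in 2.x).
# type "merge" throttles only segment merges; "all" throttles every
# store write, which slows indexing more aggressively.
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient": {
    "indices.store.throttle.type": "all",
    "indices.store.throttle.max_bytes_per_sec": "20mb"
  }
}'
```

Using a transient setting means the throttle disappears after a full cluster restart, which is convenient for experimentation.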
It got to 100% now.
Can I somehow scale just the CPU out to more machines, without having them store anything on disk?