Data node high CPU

Hi Guys (@Christian_Dahlqvist / @Igor_Motov),

Setup:
Elasticsearch 1.4.7 (legacy)
3 master nodes - 2GB RAM, 1 vCPU each.
4 data nodes - 30GB RAM, 8 vCPUs, 500GB EBS SSD, spread across two availability zones.
2 client nodes - 15GB RAM, 8 vCPUs, in two different availability zones.

We have ~750 indices. Each one contains between 10K and 3M documents.
One shard per index, replicated 3 times (so each data node holds one copy of every shard).
The cluster should handle ~500 requests per second.
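(For reference, that is roughly 750 primaries plus three replicas each, i.e. about 3,000 shard copies in total, or around 750 shards per data node.)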

We are having CPU load issues on our data node machines.
Any idea why this happens?
Any idea how to better utilize this setup?

P.S. - we found the following legacy configuration in elasticsearch.yml:

indices.fielddata.cache.size: 10%
indices.memory.index_buffer_size: 15%
indices.cache.filter.size: 10% 

Also, on all instances, ES_HEAP_SIZE is set to 8gb.
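
For completeness, this is roughly how we check what each node actually picked up (standard node info APIs; localhost is just a placeholder for one of the nodes):

curl 'localhost:9200/_nodes/settings?pretty'   # effective settings per node, including the cache sizes above
curl 'localhost:9200/_nodes/jvm?pretty'        # JVM info per node, confirms the 8gb heap max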

Thanks,
Itay

You have a lot of shards per node considering the amount of heap you have allocated. Is there any indication through logs or monitoring that you are suffering from heap pressure and extensive GC? What do disk I/O and iowait look like? What does the hot threads API show?
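
In case it helps, these are the kinds of commands I would use to collect that information (hostname is a placeholder; run iostat on the data nodes themselves):

curl 'localhost:9200/_nodes/hot_threads?threads=3'   # top CPU-consuming threads per node
curl 'localhost:9200/_nodes/stats/jvm?pretty'        # heap usage plus GC collection counts and times
iostat -x 5                                          # device utilisation and iowait over 5-second intervals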

Thanks,
I'll check and come back with more information.

Just to clarify, these machines are dedicated only for that purpose.
What would you recommend for ES_HEAP_SIZE? 31gb?
And what about the cache settings? I saw 75% mentioned here.

Another question: this cluster is currently serving traffic. Can I change the configuration while it is serving - stopping one data node at a time, increasing its heap size and starting it again?
Or should I restart the whole cluster at the same time?
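
Concretely, this is the kind of one-node-at-a-time procedure I have in mind (a sketch only; the service commands depend on how Elasticsearch is installed):

# 1. temporarily disable shard allocation so shards are not rebalanced while the node is down
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient": { "cluster.routing.allocation.enable": "none" }
}'

# 2. stop the node, change ES_HEAP_SIZE (e.g. in /etc/default/elasticsearch), start it again
sudo service elasticsearch stop
sudo service elasticsearch start

# 3. re-enable allocation and wait for green before moving on to the next data node
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient": { "cluster.routing.allocation.enable": "all" }
}'
curl 'localhost:9200/_cluster/health?wait_for_status=green'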

::: [data-srv-rcom-es-node-29bbe67][PSFf4qIRSGq9nGvwb16qEQ][data-srv-rcom-es-node-29bbe67.use.dy.com][inet[/172.30.0.95:9300]]{availability_zone=us-east-1a, aws_availability_zone=us-east-1a, max_local_storage_nodes=1, zone=SERVER, master=false}

60.7% (303.7ms out of 500ms) cpu usage by thread 'elasticsearch[data-srv-rcom-es-node-29bbe67][search][T#7]'
10/10 snapshots sharing following 2 elements
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
java.lang.Thread.run(Thread.java:748)

31.0% (155.1ms out of 500ms) cpu usage by thread 'elasticsearch[data-srv-rcom-es-node-29bbe67][search][T#13]'
10/10 snapshots sharing following 10 elements
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
java.util.concurrent.LinkedTransferQueue.awaitMatch(LinkedTransferQueue.java:735)
java.util.concurrent.LinkedTransferQueue.xfer(LinkedTransferQueue.java:644)
java.util.concurrent.LinkedTransferQueue.take(LinkedTransferQueue.java:1137)
org.elasticsearch.common.util.concurrent.SizeBlockingQueue.take(SizeBlockingQueue.java:162)
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1075)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1137)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
java.lang.Thread.run(Thread.java:748)

7.0% (35.2ms out of 500ms) cpu usage by thread 'elasticsearch[data-srv-rcom-es-node-29bbe67][search][T#4]'
 10/10 snapshots sharing following 10 elements
   sun.misc.Unsafe.park(Native Method)
   java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
   java.util.concurrent.LinkedTransferQueue.awaitMatch(LinkedTransferQueue.java:735)
   java.util.concurrent.LinkedTransferQueue.xfer(LinkedTransferQueue.java:644)
   java.util.concurrent.LinkedTransferQueue.take(LinkedTransferQueue.java:1137)
   org.elasticsearch.common.util.concurrent.SizeBlockingQueue.take(SizeBlockingQueue.java:162)
   java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1075)
   java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1137)
   java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
   java.lang.Thread.run(Thread.java:748)

::: [master-rcom-es-node-9ce2daf][Rvt86XPWTQazt4C0e6kOlg][master-rcom-es-node-9ce2daf.use.dy.com][inet[/172.30.1.123:9300]]{aws_availability_zone=us-east-1b, data=false, max_local_storage_nodes=1, master=true}

0.1% (296.1micros out of 500ms) cpu usage by thread 'elasticsearch[master-rcom-es-node-9ce2daf][[transport_server_worker.default]][T#2]{New I/O worker #5}'
 10/10 snapshots sharing following 15 elements
   sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
   sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
   sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
   sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
   sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
   org.elasticsearch.common.netty.channel.socket.nio.SelectorUtil.select(SelectorUtil.java:68)
   org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.select(AbstractNioSelector.java:415)
   org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:212)
   org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
   org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
   org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
   org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
   java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1152)
   java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
   java.lang.Thread.run(Thread.java:748)

0.0% (109.8micros out of 500ms) cpu usage by thread 'elasticsearch[master-rcom-es-node-9ce2daf][transport_client_timer][T#1]{Hashed wheel timer #1}'
 10/10 snapshots sharing following 5 elements
   java.lang.Thread.sleep(Native Method)
   org.elasticsearch.common.netty.util.HashedWheelTimer$Worker.waitForNextTick(HashedWheelTimer.java:445)
   org.elasticsearch.common.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:364)
   org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
   java.lang.Thread.run(Thread.java:748)

0.0% (89.7micros out of 500ms) cpu usage by thread 'elasticsearch[master-rcom-es-node-9ce2daf][[timer]]'
 10/10 snapshots sharing following 2 elements
   java.lang.Thread.sleep(Native Method)
   org.elasticsearch.threadpool.ThreadPool$EstimatedTimeThread.run(ThreadPool.java:568)

::: [data-wrk-rcom-es-node-36448bf][xyTt6RnTQwWXyxKZedZDmg][data-wrk-rcom-es-node-36448bf.use.dy.com][inet[/172.30.0.27:9300]]{availability_zone=us-east-1a, aws_availability_zone=us-east-1a, max_local_storage_nodes=1, zone=WORKER, master=false}

37.4% (186.8ms out of 500ms) cpu usage by thread 'elasticsearch[data-wrk-rcom-es-node-36448bf][bulk][T#2]'
10/10 snapshots sharing following 10 elements
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
java.util.concurrent.LinkedTransferQueue.awaitMatch(LinkedTransferQueue.java:735)
java.util.concurrent.LinkedTransferQueue.xfer(LinkedTransferQueue.java:644)
java.util.concurrent.LinkedTransferQueue.take(LinkedTransferQueue.java:1137)
org.elasticsearch.common.util.concurrent.SizeBlockingQueue.take(SizeBlockingQueue.java:162)
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1075)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1137)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
java.lang.Thread.run(Thread.java:748)

32.3% (161.4ms out of 500ms) cpu usage by thread 'elasticsearch[data-wrk-rcom-es-node-36448bf][bulk][T#3]'
10/10 snapshots sharing following 10 elements
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
java.util.concurrent.LinkedTransferQueue.awaitMatch(LinkedTransferQueue.java:735)
java.util.concurrent.LinkedTransferQueue.xfer(LinkedTransferQueue.java:644)
java.util.concurrent.

Adding a screenshot from BigDesk:

It looks like it is primarily busy with search and indexing load. Has the CPU load changed lately, or is this something that has grown over time? What type of queries are you running? How many shards do these queries typically address? Has the request and/or indexing volume increased over time? Has the total data volume in the cluster changed?

We had a cluster of 6 nodes with no role separation - each of them was client & data & master.
We separated the roles after reading some "Elasticsearch in production" posts. We also split our cluster into two zones (I described only the serving zone), so we also have a working zone (where we bake our indices). Nothing changed from the serving perspective - same load, same queries - only the node role separation.
Other things that changed: the number of shards per index, which was 4 and is now 1, and the replication, which was 5 and is now 3.

There is an indexing operation in the serving zone: we reallocate an index once it's done (moving it from the working zone to the serving zone) and flip an alias to the new index.
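
Roughly, the move looks like this (index and alias names are made up for illustration; we use shard allocation filtering plus the aliases API):

# move the finished index from the worker zone to the server zone
curl -XPUT 'localhost:9200/products_v42/_settings' -d '{
  "index.routing.allocation.include.zone": "SERVER"
}'

# once relocation finishes, flip the alias atomically to the new index
curl -XPOST 'localhost:9200/_aliases' -d '{
  "actions": [
    { "remove": { "index": "products_v41", "alias": "products" } },
    { "add":    { "index": "products_v42", "alias": "products" } }
  ]
}'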

Did you perform indexing into the serving nodes before as well? If not, indexing can use a good amount of CPU, so it is possible that you may need to scale out to accommodate this extra load.

No.
We had two different clusters before, and we used snapshots to move indices between them. We took your advice and started working with one cluster, separated into two zones: worker & server. Each zone has its own client nodes and its own data nodes. Only the master nodes are shared between the zones. Thus, I'm sure that only the nodes of the worker zone are indexing.
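
For reference, the zone split is just a node attribute in each node's elasticsearch.yml, roughly like this (matching the zone values visible in the hot threads output above):

# worker (indexing) data nodes
node.zone: WORKER
node.master: false
node.data: true

# server (serving) data nodes
node.zone: SERVER
node.master: false
node.data: true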

Is there any difference in query and/or indexing load on the data nodes in the 'serving' zone compared to when there were separate clusters? Has the indexing or query volume increased? Has data volume increased?

Same queries.
Same data volume per data node, except for the shard partitioning (currently 1 shard per index, versus 4 shards per index before).
In the new cluster indices arrive via the reallocation API (from the worker zone); in the old cluster indices arrived via the snapshot API (the entire index each time).

Another change is that I'm using the optimize API to merge the index segments (down to a single segment) before moving an index to the serving zone. What are the disadvantages of this approach? Do you think it might be related to the problem? We see a ~20% improvement in latency, but I'm not sure whether that is due to this, the node role separation, the shard count (1 instead of 4), or something else.
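
For clarity, this is the optimize call we run in the worker zone before the move (index name is just an example):

curl -XPOST 'localhost:9200/products_v42/_optimize?max_num_segments=1'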

@Christian_Dahlqvist, by the way, we don't have any aggregation queries at all.

How much has CPU usage increased?

It reached 100% usage.

How high was it before the changes?

Not sure, but it was stable.

I unfortunately have limited knowledge about the exact state before and after the change, and it is hard for me to tell what could have changed and what potential impact it has had.
