Unable to fully utilize resources in Elasticsearch cluster

Hi,

I'm running tests to size a cluster and find the best tuning parameters for my requirements: index documents at a rate of 10k per second (each document being ~2k in size) across 10 indices, each with 3 shards and 1 replica, using async replication.

There is a 4-node cluster of c3.xlarge instances running ES 1.6.0, with data stored on 2 instance store SSD drives. I'm generating bulk indexing requests from a Storm cluster of 3 nodes using NodeClient. So far I have been able to reach up to 2.5k documents per second, but I seem to have hit a point where I can't figure out the bottleneck.
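
To put the target in numbers, here is a rough back-of-the-envelope sketch. It assumes "~2k" means about 2,000 bytes per document and that every replica copy indexes each document as well; the class and method names are only illustrative.

```java
// Back-of-the-envelope sizing for the indexing target described in this post.
// Assumptions (not measured): ~2,000 bytes per document, replicas index every document too.
final class IndexingTarget {
    // Raw bytes per second arriving at the primary shards.
    static long primaryBytesPerSecond(long docsPerSecond, long bytesPerDoc) {
        return docsPerSecond * bytesPerDoc;
    }

    // Total shard copies in the cluster: primaries plus their replicas.
    static int totalShardCopies(int indices, int primariesPerIndex, int replicas) {
        return indices * primariesPerIndex * (1 + replicas);
    }

    public static void main(String[] args) {
        long primary = primaryBytesPerSecond(10_000, 2_000);
        int copies = totalShardCopies(10, 3, 1);
        int nodes = 4;
        System.out.println("primary bytes/s:       " + primary);
        System.out.println("bytes/s incl. replica: " + primary * 2);
        System.out.println("shard copies total:    " + copies);
        System.out.println("shard copies per node: " + copies / nodes);
    }
}
```

So the target works out to roughly 20 MB/s of raw JSON on the primaries (about 40 MB/s including the replica copies), spread over 60 shard copies, or 15 per node if they are balanced evenly.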

At a rate of 2.5k documents per second, the CPU utilization on the nodes is only around 25%. But any increase in the indexing rate results in EsRejectedExecutionException (TransportShardReplicationOperationAction$PrimaryPhase).

The logs seem to suggest a lot of GC activity, but I can't seem to get any more out of this cluster at this point. Any help will be appreciated.

Here is some more info to help troubleshoot:
iostat of all nodes
hot threads - node 1
hot threads - node 2
hot threads - node 3
hot threads - node 4

Let me know if anything else would help in giving pointers.

Can you please share some additional information about your cluster configuration, e.g. heap size and any non-default settings? What bulk size are you using? What type of data are you indexing? In addition to the indexing, how much querying is going on at the same time?

Hi Christian,

Thanks for looking into this. I have uploaded the elasticsearch.yml.

Here are the JVM args used:
-server -Djava.net.preferIPv4Stack=true
-Xms4479m -Xmx4479m -Xss256k -XX:NewRatio=1
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly
-XX:+HeapDumpOnOutOfMemoryError
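
For reference, a rough sketch of how these flags divide the heap. It assumes the standard HotSpot meaning of -XX:NewRatio (old generation size divided by young generation size) and ignores the JVM's internal size alignment, so the real figures will differ slightly. The class and method names are only illustrative.

```java
// Approximate generation sizes implied by -Xmx4479m -XX:NewRatio=1
// and -XX:CMSInitiatingOccupancyFraction=75. Ignores JVM size alignment.
final class HeapLayout {
    // NewRatio = old / young, so young = heap / (NewRatio + 1).
    static long youngGenMb(long heapMb, int newRatio) {
        return heapMb / (newRatio + 1);
    }

    static long oldGenMb(long heapMb, int newRatio) {
        return heapMb - youngGenMb(heapMb, newRatio);
    }

    // Old-generation occupancy at which CMS starts a concurrent cycle.
    static long cmsTriggerMb(long oldGenMb, int occupancyPercent) {
        return oldGenMb * occupancyPercent / 100;
    }

    public static void main(String[] args) {
        long heap = 4479;
        long old = oldGenMb(heap, 1);
        System.out.println("young gen MB:   " + youngGenMb(heap, 1));
        System.out.println("old gen MB:     " + old);
        System.out.println("CMS trigger MB: " + cmsTriggerMb(old, 75));
    }
}
```

With NewRatio=1 about half of the 4479 MB heap goes to the young generation, leaving roughly 2.2 GB of old generation, and CMS begins a concurrent cycle once about 1.7 GB of that is occupied.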

There are three NodeClients using BulkRequestBuilder, each configured to send a bulk request after 10000 documents or 10 seconds, whichever comes first. Data is pushed as JSON (most of the documents have inner documents too).
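
The flush rule is roughly the following, as a minimal self-contained sketch. BatchBuffer is a hypothetical helper, not part of the Elasticsearch API, and the sink stands in for whatever actually executes the bulk request; the current time is passed in so the logic can be exercised without waiting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper (not an Elasticsearch class): buffers JSON documents and
// hands the batch to `sink` once it holds maxDocs documents or the oldest
// buffered document is maxAgeMillis old, whichever happens first.
final class BatchBuffer {
    private final int maxDocs;
    private final long maxAgeMillis;
    private final Consumer<List<String>> sink;
    private List<String> pending = new ArrayList<>();
    private long firstAddedAt = -1;

    BatchBuffer(int maxDocs, long maxAgeMillis, Consumer<List<String>> sink) {
        this.maxDocs = maxDocs;
        this.maxAgeMillis = maxAgeMillis;
        this.sink = sink;
    }

    // Add one document; flushes immediately when the size limit is reached.
    void add(String json, long nowMillis) {
        if (pending.isEmpty()) {
            firstAddedAt = nowMillis;
        }
        pending.add(json);
        if (pending.size() >= maxDocs) {
            flush();
        }
    }

    // Call periodically (for example from a timer) to enforce the time limit.
    void tick(long nowMillis) {
        if (!pending.isEmpty() && nowMillis - firstAddedAt >= maxAgeMillis) {
            flush();
        }
    }

    // Hand over whatever is buffered; does nothing when the buffer is empty.
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        List<String> batch = pending;
        pending = new ArrayList<>();
        sink.accept(batch);
    }
}
```

In the setup described here the limits would be 10000 documents and 10000 milliseconds. The Elasticsearch Java client also ships a BulkProcessor that provides count-based and interval-based flushing, which may be worth comparing against a hand-rolled version like this.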

The current tests mostly check the indexing rate, hence queries are very minimal (<10s per minute). But we do plan to have around 10 queries per second later on, mostly aggregations over 1 hour or 1 day.