Indexing speed: Config changes don't seem to have any effect


We have an elasticsearch cluster deployed on AWS. We are using 1.7.5. The cluster contains 6 data nodes and 3 master nodes.
The 6 data nodes are running on r3.xlarge instances (4 vCPU, 30.5G memory, 100G gp2 SSD).
The master nodes are running on t2.medium instances (2 vCPU, 4G memory)

The elasticsearch.yml file for the data nodes looks like this:

bootstrap.mlockall: true
cluster.routing.allocation.node_concurrent_recoveries: 4
cluster.routing.allocation.awareness.attributes: aws_availability_zone
node.master: false
discovery.type: ec2
discovery.ec2.minimum_master_nodes: 2
discovery.ec2.tag.elasticsearch_cluster: dev1
cloud.node.auto_attributes: true
action.write_consistency: one

http.cors.enabled: true
http.cors.allow-origin: /https?://localhost(:[0-9]+)?/
indices.breaker.fielddata.limit: 80%
indices.breaker.request.limit: 20%

indices.fielddata.cache.size: 70%

script.groovy.sandbox.enabled: true

The data is indexed into multiple indexes, all of which have the same mappings. Each index has 3 shards and 2 replicas.
The document type that gets indexed (elements) has nested documents (properties). The number of properties per element varies from tens to a few hundred. Each property is less than 1 KB; an element minus its properties ranges from a few KBs to tens of KBs. We use bulk requests to index the data. The bulk batch size is 10 MB, which on average contains 500 elements and 40k properties. These batches are indexed concurrently by 20 threads on different machines, each thread indexing 100 batches, with the documents going to different indexes. The bulk requests are sent to an Elastic Load Balancer, which forwards them to the data nodes. The overall rate is about 49K individual documents (elements + properties) per second. Given the resources available to the data nodes and the small document sizes, this rate feels slow. The refresh interval is set to the default of 1s.
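For reference, a 10 MB bulk batch like the one described is just an NDJSON body: one action line followed by one source line per element, with the properties nested inline. A minimal sketch of building such a body (index name, type name, and field names here are hypothetical, not taken from our actual mappings):

```python
import json

def build_bulk_body(index, elements):
    """Build an NDJSON bulk body for the _bulk API."""
    lines = []
    for element in elements:
        # Action line: tells Elasticsearch where to index the next source line.
        lines.append(json.dumps({"index": {"_index": index, "_type": "element"}}))
        # Source line: the element itself, nested properties included.
        lines.append(json.dumps(element))
    # A bulk body is newline-delimited and must end with a trailing newline.
    return "\n".join(lines) + "\n"

elements = [
    {"name": "element-1",
     "properties": [{"key": "color", "value": "red"},
                    {"key": "size", "value": "42"}]},
    {"name": "element-2",
     "properties": [{"key": "color", "value": "blue"}]},
]
body = build_bulk_body("dev1-elements", elements)
```

In practice we batch until the serialized body reaches ~10 MB, then POST it to `/_bulk` through the load balancer.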

I have tried a number of commonly recommended configuration changes, such as increasing the bulk indexing buffer size, testing different bulk batch sizes, and adding client nodes, but we get the same indexing throughput of 49k docs per second. I haven't been able to see any effect of configuration changes on the throughput. CPU usage on the data nodes is very high during indexing (almost 99-100%). The indices store throttle limit is set to the default 20MB/sec. The recommended limit for SSDs is 100MB/sec, but we don't seem to be hitting the default limit yet.
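For completeness, raising the store throttle on 1.7 is a one-line node setting (100mb is the commonly suggested value for SSDs, not something we have verified helps here):

```yaml
# elasticsearch.yml on the data nodes (1.7.x setting name)
indices.store.throttle.max_bytes_per_sec: 100mb
```

It can also be applied dynamically via the cluster settings API, but as noted above we don't appear to be disk-bound yet.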

Any pointers on how to improve the indexing throughput would be greatly appreciated.

Based on your description, it sounds like performance is limited by the amount of available CPU rather than by disk I/O. If that is the case, you should be able to increase throughput by reducing the number of replicas. It may also be worthwhile to experiment with an increased refresh interval.
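Both of these are dynamic index settings, so they can be changed without a restart; a sketch of the request body, sent as `PUT /<index>/_settings` (the replica count and 30s interval are example values to experiment with, not recommendations for your workload):

```json
{
  "index": {
    "number_of_replicas": 1,
    "refresh_interval": "30s"
  }
}
```

Replicas can be raised back up after the bulk load finishes, and the refresh interval restored to 1s once near-real-time search matters again.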