Glad to hear someone else is seeing the same problems we are. We're using slightly different HW with similar results.
See my old post here:
Once we got past the testing harness setup, we were able to reproduce the slow indexing performance internally.
Our HW is:
Virident PCIe SSD card (configured for performance), 1.8TB
64GB RAM
2x 12-core Xeon (HT on, equivalent to 48 logical cores) (5 physical bare-metal nodes x 2 sets, for faster testing of various parameter combinations)
ES v1.7.2
JDK 8u60 (also tested with JDK7u51, JDK8u40)
Tested various max heap sizes from 16G to 31G.
mlockall on
max fd is 64K
refresh interval is -1
index.store.throttle.type: none
index.store.throttle.max_bytes_per_sec: 700mb
index.translog.flush_threshold_size: 1gb
indices.memory.index_buffer_size: 512mb
5 shards so we get 1 per node
no replica
various doc sizes from 1K to 16K
same data set on a RAM disk, so we always read the same data via the Logstash file input
tested with 1 LS instance, 5 instances, 20 instances, etc.
Various bulk indexing sizes (100, 500, 1000, 5000, 10000, etc.).
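For completeness, this is roughly how the tuning above ends up in elasticsearch.yml on our nodes (the index.* lines can equally be applied per index via the settings API; the values are the same ones listed above):

```
# elasticsearch.yml (sketch of our tuning block)
bootstrap.mlockall: true
indices.memory.index_buffer_size: 512mb
index.refresh_interval: -1
index.translog.flush_threshold_size: 1gb
index.store.throttle.type: none
index.store.throttle.max_bytes_per_sec: 700mb
index.number_of_shards: 5
index.number_of_replicas: 0
```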
Our conclusion is that I/O, CPU and memory are not the bottleneck. We always hit a ceiling on how fast ES itself can index.
How are you ingesting data? Logstash, or your own client doing bulk inserts? You can try increasing the number of instances feeding ES. We noticed a slight increase in indexing speed, but it falls off after about 10 concurrent LS instances into the 5 ES nodes.
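If you want to rule Logstash out, a minimal bulk client is easy to throw together. Here's a sketch using elasticsearch-py; the host name, index name, doc type, and document contents are placeholders, and the chunk size is just one of the batch sizes we swept:

```python
# Minimal bulk-indexing sketch with elasticsearch-py.
# Hostname, index name, doc type, and doc contents are placeholders.
from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

es = Elasticsearch(["es-node1:9200"])  # point at one (or more) of the ES nodes

def actions(docs, index="test-index", doc_type="logs"):
    # Wrap raw documents in bulk action dicts.
    for doc in docs:
        yield {"_index": index, "_type": doc_type, "_source": doc}

# Generator of fake ~1K documents; replace with your real feed.
docs = ({"message": "sample event %d" % i, "payload": "x" * 1024}
        for i in range(100000))

ok, errors = bulk(es, actions(docs), chunk_size=5000, request_timeout=120)
print("indexed:", ok, "errors:", errors)
```

Running several of these in parallel against different nodes gives a rough sense of whether the feeder or ES is the limit, which is how we ended up concluding the ceiling is on the ES side.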