Hi All,
We are currently attempting to optimize our configuration for a static
index of roughly 120 million records. In time, this index will probably be
much larger, but for now this is the working set. We've been playing
around with Elasticsearch for several months now, and have made great
progress with performance tuning. However, we still run into issues that
leave us puzzled. One such issue is an unexpected drop in indexing speed
as the index grows.
We are working on an 11-node cluster. Each node has 8 CPUs and 16 GB of
memory, and each JVM heap is set to 8 GB (identical min and max). We have
set vm.swappiness to 0 on all of the systems, since they are used solely
for Elasticsearch. The Elasticsearch version is 0.90.7. We are focusing on
loading a single index, which was created with 48 shards and a refresh
interval of 120 seconds. We are using Elasticsearch HQ for real-time
monitoring of the cluster state, along with Linux utilities such as top,
iotop and iftop. Everything appears to be in order.
We frequently have to reindex the entire dataset, because we are working in a
development environment and are still determining how best to structure the
data. We index via a batch load script that sends curl requests of 10,000
records each to the _bulk endpoint. We partition the dataset across three
servers and run the batch load script on all three simultaneously.
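The batching step can be sketched roughly as follows. This is a hypothetical Python helper, not our actual script: it assumes the records arrive as Python dicts and uses a hypothetical index name ("records") and type ("record"), since bulk actions in 0.90.x carry a _type. It splits the input into newline-delimited _bulk bodies of 10,000 records each.

```python
import json
from itertools import islice
from typing import Dict, Iterable, Iterator, List

BATCH_SIZE = 10_000  # records per _bulk request, as described above


def bulk_payloads(
    records: Iterable[Dict],
    index: str = "records",    # hypothetical index name
    doc_type: str = "record",  # hypothetical type name
    batch_size: int = BATCH_SIZE,
) -> Iterator[str]:
    """Yield _bulk request bodies: an action line plus a source line per
    record, newline-delimited, with the trailing newline _bulk requires."""
    it = iter(records)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        lines: List[str] = []
        for rec in batch:
            lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
            lines.append(json.dumps(rec))
        yield "\n".join(lines) + "\n"
```

Each payload could then be written to a file and sent with curl's --data-binary option (plain -d strips the newlines the _bulk format depends on), e.g. `curl -XPOST localhost:9200/_bulk --data-binary @batch.json`, where the host and file name are placeholders.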
At first, this appears to work great. Initial indexing speeds are roughly
50 million/hour, which would load the entire dataset in a little over 2
hours. However, once the index approaches 20 million records, indexing
performance drops significantly (down to roughly 10 million/hour). As the
index continues to grow, performance keeps degrading, and I have seen it
fall below 1 million records per hour. All in all, it takes nearly a day
to index the entire dataset of 120 million records.
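As a quick sanity check on those rates, a trivial Python helper shows what a sustained rate would imply for the full 120 million records: 2.4 hours at the initial 50 million/hour (the "little over 2 hours" above) and 12 hours at a steady 10 million/hour. Treating "nearly a day" as roughly 24 hours (an approximation), the average works out to about 5 million/hour, so the rate must sit well below 10 million/hour for much of the load.

```python
def hours_to_index(total_records: int, records_per_hour: float) -> float:
    """Hours needed to index total_records at a constant records_per_hour."""
    return total_records / records_per_hour


def average_rate(total_records: int, hours: float) -> float:
    """Average indexing rate (records/hour) over a whole load."""
    return total_records / hours


if __name__ == "__main__":
    total = 120_000_000
    print(hours_to_index(total, 50_000_000))  # 2.4 (initial rate)
    print(hours_to_index(total, 10_000_000))  # 12.0 (rate after ~20M records)
    print(average_rate(total, 24))            # 5000000.0 (assuming ~24 h total)
```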
I was hoping the community might be able to point out what we are doing
wrong, or suggest other diagnostic approaches. We are trying to get this
system ready for production, and at this point we are out of ideas. Any
thoughts, opinions, or tips would be greatly appreciated.
Thanks!
--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/98958587-eaf9-4451-84ee-78c38e7eab42%40googlegroups.com.