Tuning indices.memory.max_index_buffer_size for indexing throughput


(Luke Nezda) #1

Hello All,

I am working on improving our indexing throughput. Every 6 hours we create a fresh index on the same cluster as our production index, so we want to minimize the harm to search performance on the production index while we build the next iteration. We have a 7-node ES cluster (1.7.5, Oracle Java 1.8.0_72) of dedicated 4-core machines with 26GB RAM (Xmx13G for Elasticsearch) and a 375G attached NVMe SSD; we use 12 worker processes to issue the indexing commands with pyelasticsearch (1.4) + elasticsearch-py (1.5). We're creating a fairly small index (20M docs, 125GB, 15 shards) with 1 replica. Here are some things we've done to try to optimize for our needs:

  1. We've experimented with adding the replica only after the next index has been created, but this seemed to hurt search performance, so we've been writing the new index with its replica from the start, knowing this incurs ~2x overhead vs. creating the primary and then replicating it. I've recently surmised that unnecessary default throttling (we have NVMe SSDs) is probably what made this perform poorly, and I'll experiment again with adding the replica at the end, with something like "indices.store.throttle.max_bytes_per_sec" : "100mb"

  2. Bulk inserts of 10MB

  3. Non-default alert: threadpool.bulk.queue_size: 500 to work around EsRejectedExecutionException; my recent re-reading suggests we should maybe just use fewer concurrent client-side indexing workers

  4. Currently experimenting with non-default "indices.store.throttle.max_bytes_per_sec" : "100mb" because I noticed significant time in "store.throttle_time_in_millis", which this seemed to all but eliminate. I ran several tests, and one in particular had drastically less indexing time and far fewer merges (I verified doc counts, etc., but haven't reproduced it since!?). That run is what prompts my key question, item 5.
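
For concreteness, here is a minimal sketch of how the throttle value from items 1 and 4 could be applied at runtime. It assumes the setting can be changed dynamically through the cluster settings endpoint on this version; the helper name throttle_settings_body is made up for illustration, and the script only builds and prints the request rather than sending it.

```python
import json

SETTING = "indices.store.throttle.max_bytes_per_sec"

def throttle_settings_body(max_bytes_per_sec, persistent=False):
    """Build the JSON body for a cluster settings update (hypothetical helper).

    'transient' settings are lost on a full cluster restart; 'persistent' ones survive it.
    """
    scope = "persistent" if persistent else "transient"
    return json.dumps({scope: {SETTING: max_bytes_per_sec}})

body = throttle_settings_body("100mb")
# Print the equivalent curl command instead of calling the cluster directly.
print("curl -XPUT localhost:9200/_cluster/settings -d '%s'" % body)
```

Using the transient scope here is a deliberate choice for experiments like mine: the value can be tried on a live cluster without editing config files.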

The question: 5. I suspect I can reduce our indexing time significantly by giving the index writers more memory via a larger indices.memory.max_index_buffer_size. Currently the default of 10% of Xmx13g, with at most 5 actively writing shards per node, gives roughly: (13g * 10%) / 5 = 260mb per shard
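
The arithmetic above can be sketched as a small script. It assumes the node-level buffer is split evenly across the actively writing shards, which is my understanding and not something I have confirmed; per_shard_buffer_mb is a made-up helper name, and the 20% and 30% rows are hypothetical values to show the scaling.

```python
def per_shard_buffer_mb(heap_gb, buffer_fraction, active_shards):
    """Estimate the indexing buffer available to each actively writing shard, in MB.

    Assumes an even split of (heap * fraction) across active shards.
    """
    heap_mb = heap_gb * 1024.0
    return heap_mb * buffer_fraction / active_shards

# 13 GB heap, 5 active shards per node; 0.10 is the current default share.
for fraction in (0.10, 0.20, 0.30):
    mb = per_shard_buffer_mb(13, fraction, 5)
    print("%d%% of heap -> about %d MB per shard" % (round(fraction * 100), round(mb)))
```

With binary units the 10% case comes out at about 266 MB, close to the 260mb figure above, which treats 13g as 13000mb.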

When I look at curl -s localhost:9200/_cluster/settings?pretty while indexing is in progress, the in-progress index has values like the following (size annotations in parentheses are mine):

"segments": {
  "count": 664,
  "memory_in_bytes": 89089968,  (= 85MB)
  "index_writer_memory_in_bytes": 4215537492,  (= 3.9GB)
  "index_writer_max_memory_in_bytes": 7440456595,  (= 6.9GB)
  "version_map_memory_in_bytes": 545058976,
  "fixed_bit_set_memory_in_bytes": 9759648
},
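
To make the raw byte counts easier to compare, here is a small sketch that converts the values above into binary units. The human helper is just an illustrative name, and the numbers are copied from the snippet above.

```python
# Values copied from the "segments" block above.
segments = {
    "memory_in_bytes": 89089968,
    "index_writer_memory_in_bytes": 4215537492,
    "index_writer_max_memory_in_bytes": 7440456595,
    "version_map_memory_in_bytes": 545058976,
    "fixed_bit_set_memory_in_bytes": 9759648,
}

def human(num_bytes):
    """Format a byte count using 1024-based units (hypothetical helper)."""
    value = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB"):
        if value < 1024:
            return "%.1f %s" % (value, unit)
        value /= 1024.0
    return "%.1f TB" % value

for name in sorted(segments):
    print("%-36s %s" % (name, human(segments[name])))

# Share of the reported maximum that the index writers are currently using.
used = segments["index_writer_memory_in_bytes"]
limit = segments["index_writer_max_memory_in_bytes"]
print("index writer usage: %.0f%% of reported max" % (100.0 * used / limit))
```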

How do these numbers relate to indices.memory.max_index_buffer_size? For example, is index_writer_memory_in_bytes the sum of this buffer's current size across all nodes? Could my fast indexing run, the one with very few merges, have started with large buffers, which then meant far fewer total merges? And should I expect that just increasing max_index_buffer_size (in combination with an increased store.throttle.max_bytes_per_sec) would make this happen every time?

Any feedback would be welcome, especially on items 4 and 5.

Kind regards,

  • Luke

p.s. This unanswered question is similar: How can I check indices.memory.index_buffer_size parameter is effectively working?

