I use this on 1.1.0 in my config/elasticsearch.yml:

index:
  merge:
    scheduler:
      type: concurrent
      max_thread_count: 4
    policy:
      type: tiered
      max_merged_segment: 1gb
      segments_per_tier: 4
      max_merge_at_once: 4
      max_merge_at_once_explicit: 4
threadpool:
  merge:
    type: fixed
    size: 4
    queue_size: 32

Explanation:
- use the concurrent scheduler and limit it to 4 threads. I find that 4 threads
can keep up with the highest bulk insertion rate I could generate
- use the tiered policy (the default; it is the most flexible in selecting
segments to merge)
- keep merged segments under 1gb (this limits the size of the segment files;
the smaller the files, the faster the merges, but the more files are created)
- allow 4 segments per tier (so the number of segments per tier does not get
too high)
- merge at most 4 segments in each merge step (this limits the total run time
and resource consumption of a single merge step)
- apply the same limit of 4 to merges triggered by an explicit _optimize API
call
- extend the thread pool to 4 merge threads with a maximum of 32 merge
operations in the queue (32 should be sufficient to handle outstanding
merges)
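
A minimal sketch for checking that a restarted node actually picked up the values above, assuming a node reachable at localhost:9200 and using the nodes info API. The helper name nodes_info and the ES_HOST variable are made up for illustration.

```shell
#!/bin/sh
# Assumption: a node listens on localhost:9200; override with ES_HOST.
ES_HOST="${ES_HOST:-localhost:9200}"

# Fetch one section ("metric") of the nodes info API,
# for example "settings" or "thread_pool".
nodes_info() {
    curl -s -XGET "http://${ES_HOST}/_nodes/$1?pretty"
}

# Usage, after restarting the node with the new elasticsearch.yml:
#   nodes_info settings      # look for the merge scheduler and policy values
#   nodes_info thread_pool   # look for the merge thread pool
```

If the merge values do not show up under settings, the node is probably reading a different config file than the one that was edited.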
From time to time, if the number of files gets very high (>500) and the index
is calm (no indexing, no heavy search), I run a manual _optimize.
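
A rough sketch of that routine, assuming the file count is taken from the index's data directory on disk. The directory path, the index name myindex, and the helper names are hypothetical; the 500 threshold is the one mentioned above. Whether the index is calm still has to be judged by hand.

```shell
#!/bin/sh
# Assumption: a node listens on localhost:9200; override with ES_HOST.
ES_HOST="${ES_HOST:-localhost:9200}"

# Count regular files below a directory, e.g. one index's data directory.
count_files() {
    find "$1" -type f | wc -l | tr -d ' '
}

# Call _optimize on an index only if its directory holds more files
# than the given limit.
#   $1 = directory to count, $2 = index name, $3 = file limit
optimize_if_needed() {
    dir="$1"; index="$2"; limit="$3"
    if [ "$(count_files "$dir")" -gt "$limit" ]; then
        curl -s -XPOST "http://${ES_HOST}/${index}/_optimize"
    fi
}

# Usage (the path is a placeholder for wherever the index data lives):
#   optimize_if_needed /path/to/data/myindex myindex 500
```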
Jörg
On Fri, Apr 18, 2014 at 9:01 PM, David Smith davidksmith2k@gmail.com wrote:
I see that ES switched back to ConcurrentMergeScheduler in 1.1.1 because the
default scheduler in 1.1.0 was hurting indexing performance.
Switch back to ConcurrentMergeScheduler as the default · Issue #5817 · elastic/elasticsearch · GitHub
We're on 1.1.0 and cannot upgrade to 1.1.1 for the time being. Is there a
way to switch it back using the API? I tried the following command, but it
does not seem to take effect.
curl -i -XPUT localhost:9200/_cluster/settings -d '{ "persistent": {
"index.merge.scheduler.type":
"org.elasticsearch.index.merge.scheduler.ConcurrentMergeSchedulerProvider"
} }'
HTTP/1.1 200 OK
Content-Type: application/json; charset=UTF-8
Content-Length: 52
{"acknowledged":true,"persistent":{},"transient":{}}
It does not seem to be set when I try to re-GET it (and no errors in logs
at DEBUG level or above).
curl -i -XGET localhost:9200/_cluster/settings
HTTP/1.1 200 OK
Content-Type: application/json; charset=UTF-8
Content-Length: 66
{"persistent":{"threadpool":{"bulk":{"size":"8"}}},"transient":{}}
Am I specifying the scheduler the wrong way? I also tried just
specifying ConcurrentMergeSchedulerProvider instead of the full class
name, but that didn't work either.
Any ideas?
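
The PUT-then-GET check above could be scripted roughly like this. It is only a crude substring match on the response body, not a JSON parser, and the helper name mentions is made up for illustration.

```shell
#!/bin/sh
# Succeed if the text on stdin contains the given substring.
# This is a plain text match and knows nothing about JSON structure.
mentions() {
    grep -qF -- "$1"
}

# Usage against the cluster settings endpoint from the commands above:
#   curl -s -XGET localhost:9200/_cluster/settings | mentions scheduler \
#       && echo "scheduler setting stored" \
#       || echo "scheduler setting not stored"
```

Run against the GET response shown above, a check for "threadpool" succeeds while a check for "scheduler" fails, which matches the observation that the setting was not stored.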
David
--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/601a831d-2c8e-4615-b816-435a6d4e4d9c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.