Hi Team,
I have set up a cluster that indexes ~160GB of data per day into Elasticsearch. I am currently facing a case where I need to update almost all the docs in all indices with a small amount of data (~16GB) per index, in the following format:
id1,data1
id1,data2
id2,data1
id2,data2
id2,data3
.
.
.
My update operations start at 16,000 lines per second, but within 5 minutes the rate drops to 1,000 lines per second and does not go back up after that. As it stands, this update takes longer than my entire indexing process for one day.
My conf file for the update operation currently looks as follows:
output {
  elasticsearch {
    action => "update"
    doc_as_upsert => true
    hosts => ["host1","host2","host3","host4"]
    index => "logstash-2017-08-1"
    document_id => "%{uniqueid}"
    document_type => "daily"
    retry_on_conflict => 2
    flush_size => 1000
  }
}
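For reference, the filter side is just splitting each input line into the id and the new data value, roughly like this (simplified; uniqueid is the field referenced in the output above, while the second column name here is only a placeholder):

filter {
  csv {
    separator => ","
    # first column becomes the document id used in the output, second is the new data
    columns => ["uniqueid", "newdata"]
  }
}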
The optimizations I have already made to speed up indexing in my cluster, based on the suggestions at https://www.elastic.co/guide/en/elasticsearch/guide/current/indexing-performance.html, are the following (applied as sketched just below):
Setting "indices.store.throttle.type" : "none"
Index "refresh_interval" : "-1"
I am running the cluster on 4 d2.8xlarge EC2 instances, with a 30GB heap allocated to each node. While the update is happening, node CPU is barely used and the load is low as well.
Is there something obvious that I am missing that is causing this issue? Looking at the thread pool data, I see that the number of threads working on bulk operations is constantly high.
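For context, I am watching the thread pools with the plain cat API, e.g.:

curl -XGET 'host1:9200/_cat/thread_pool?v'

The bulk columns in that output are what I am referring to above.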
Any help on this issue would be really appreciated! Please let me know if you need more info.
Thanks in advance
Vignesh