I want to use Logstash to pull existing Elasticsearch documents, run them through filters, and then update them in the same cluster. I am trying to figure out how to get the best performance, and I'm looking first at the elasticsearch input plugin, since I know that pipeline workers only apply to the filter and output stages.
I have 3 ES nodes, all on the same subnet. I created a new Ubuntu 14.04 VM in the same subnet and installed LS 5.0-alpha5 from the Debian package. The VM has 8 GB of RAM and 4 cores.
Taking a tip from a recent Elastic blog post, I decided to set up a Logstash config file like the following:
input {
  elasticsearch {
    hosts   => ["hostA", "hostB", "hostC"]
    index   => "specific-index"
    docinfo => true
    query   => '
      {
        "query": {
          "match": {
            "_type": {
              "query": "MyType"
            }
          }
        }
      }
    '
  }
}

output {
  stdout { codec => dots }
}
Then I run the following command:
sudo ./logstash -f /etc/logstash/conf.d/logstash.conf --path.settings=/etc/logstash | pv -Wart > /dev/null
When I run this I get around 3-4 kB/s at best. Since the dots codec emits one byte per event, that works out to only about 3,000-4,000 documents per second.
I'm wondering what I can do to improve the throughput of this Logstash config / elasticsearch input plugin. Are there any settings that could improve it? Is there anything that needs to change on the Elasticsearch nodes themselves?
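For instance, I haven't touched the scroll paging options of the input plugin yet. If the defaults are the bottleneck, I assume the first thing to try would be something like the sketch below (the numbers are guesses on my part, not measured recommendations):

input {
  elasticsearch {
    hosts   => ["hostA", "hostB", "hostC"]
    index   => "specific-index"
    docinfo => true
    size    => 5000    # documents fetched per scroll page; the plugin default is 1000
    scroll  => "5m"    # keep-alive for each scroll context between pages
    query   => '{ "query": { "match": { "_type": { "query": "MyType" } } } }'
  }
}

Would bumping size like this actually help, or is the limit more likely to be somewhere else (network, the single pipeline, the ES nodes)?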