While re-indexing data from an index using Logstash, it intermittently fails with the following error:
[2019-06-24T13:45:08,442][INFO ][logstash.pipeline ] Starting pipeline {"id"=>"main", "pipeline.workers"=>4, "pipeline.batch.size"=>500, "pipeline.batch.delay"=>5, "pipeline.max_inflight"=>2000}
[2019-06-24T13:45:08,514][INFO ][logstash.pipeline ] Pipeline main started
[2019-06-24T13:45:08,561][INFO ][logstash.agent ] Successfully started Logstash API endpoint {:port=>9600}
[2019-06-24T14:37:43,924][ERROR][logstash.pipeline ] A plugin had an unrecoverable error. Will restart this plugin.
Plugin: <LogStash::Inputs::Elasticsearch hosts=>["http://host00.ncc.symantec.com:9200", "http://host01.ncc.symantec.com:9200", "http://host02.ncc.symantec.com:9200", "http://host03.ncc.symantec.com:9200", "http://host04.ncc.symantec.com:9200", "http://host05.ncc.symantec.com:9200", "http://host06.ncc.symantec.com:9200", "http://host07.ncc.symantec.com:9200", "http://host08.ncc.symantec.com:9200", "http://host09.ncc.symantec.com:9200", "http://host10.ncc.symantec.com:9200", "http://host11.ncc.symantec.com:9200", "http://host12.ncc.symantec.com:9200", "http://host13.ncc.symantec.com:9200", "http://host14.ncc.symantec.com:9200", "http://host15.ncc.symantec.com:9200", "http://host16.ncc.symantec.com:9200", "http://host17.ncc.symantec.com:9200", "http://host18.ncc.symantec.com:9200", "http://host19.ncc.symantec.com:9200", "http://host20.ncc.symantec.com:9200"], user=>"elastic", password=>, index=>"indexA-2018-12-07", scroll=>"2m", size=>10000, docinfo_fields=>["_id"], query=>"{"query":\n {"bool":\n {"must_not":\n { "terms": { "Field1":[ \n\t\t "Value1",\n\t\t "Value2",\n\t\t "Value3",\n\t\t .... "ValueN"]}}}}}", id=>"xyz", enable_metric=>true, codec=><LogStash::Codecs::JSON id=>"json_bbacd857-6397-4c70-92a7-52a5c9ffbddf", enable_metric=>true, charset=>"UTF-8">, docinfo=>false, docinfo_target=>"@metadata", ssl=>false>
Error: Timeout::Error
The problem is that the plugin does recover and starts re-processing all over again, which leads to tens of thousands of deleted docs (duplicates).
I tried increasing the scroll time from "2m" to "10m", but no luck. The error isn't persistent; it occurs on and off, especially when the query returns around 25 million documents.
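For reference, the only change was the scroll keep-alive in the input block; the comment next to size just marks the plugin's other tuning knob (I still run size => 10000):

elasticsearch {
  ...
  scroll => "10m"   # raised from "2m"; keep-alive requested for each scroll page
  size => 10000     # docs fetched per scroll page; the plugin's other tuning option
  ...
}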
ELK stack version: 5.5.1
Logstash VM RAM: 6 GB
Logstash JVM heap: 2 GB (both Xms and Xmx)
Logstash VM nproc: 2 cores
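The heap is set via the standard JVM flags in Logstash's config/jvm.options, matching the 2 GB above:

-Xms2g
-Xmx2g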
Logstash.yml:
...
xpack.monitoring.elasticsearch.url:
- http://host00.ncc.symantec.com:9200
...
- http://host20.ncc.symantec.com:9200
xpack.monitoring.elasticsearch.username: logstash_system
xpack.monitoring.elasticsearch.password: foo
pipeline.batch.size: 500
pipeline.workers: 4
Logstash.conf:
input {
  elasticsearch {
    hosts => ["http://host00.ncc.symantec.com:9200", .. "http://host20.ncc.symantec.com:9200"]
    user => "elastic"
    password => "foo"
    index => "indexA-2018-10-27"
    scroll => "2m"
    size => 10000
    docinfo_fields => [ "_id" ]
    query => '{"query":
                 {"bool":
                   {"must_not":
                     { "terms": { "Field": [
                         "Value1",
                         "Value2",
                         ...
                         "ValueN" ]}}}}}'
  }
}
## TARGET ##
output {
  elasticsearch {
    hosts => ["http://host00.ncc.symantec.com:9200", .. "http://host20.ncc.symantec.com:9200"]
    document_type => "typeA"
    index => "indexA-%{+yyyy-MM-dd}"
    document_id => "%{uniq_id}"   # sprintf reference to the document's own uniq_id field
    user => "elastic"
    password => "foo"
    pipeline => "foo-pipeline"
  }
}
## PROCESS ##
filter {
  # Remove fields from FieldA
  if [FieldA] =~ /.+/ {
    mutate {
      gsub => [
        ....
      ]
    }
  }
  # Prune email fields using a blacklist
  prune {
    blacklist_names => [
      ...
    ]
  }
}
Any pointers on how to resolve this? The error also seems to manifest when I spin up additional Logstash VMs to speed up reindexing (each Logstash instance runs against a different index).