Logstash elasticsearch input plugin error

While re-indexing data from an index using Logstash, it intermittently fails with the following error:

[2019-06-24T13:45:08,442][INFO ][logstash.pipeline ] Starting pipeline {"id"=>"main", "pipeline.workers"=>4, "pipeline.batch.size"=>500, "pipeline.batch.delay"=>5, "pipeline.max_inflight"=>2000}
[2019-06-24T13:45:08,514][INFO ][logstash.pipeline ] Pipeline main started
[2019-06-24T13:45:08,561][INFO ][logstash.agent ] Successfully started Logstash API endpoint {:port=>9600}
[2019-06-24T14:37:43,924][ERROR][logstash.pipeline ] A plugin had an unrecoverable error. Will restart this plugin.
Plugin: <LogStash::Inputs::Elasticsearch hosts=>["http://host00.ncc.symantec.com:9200", "http://host01.ncc.symantec.com:9200", "http://host02.ncc.symantec.com:9200", "http://host03.ncc.symantec.com:9200", "http://host04.ncc.symantec.com:9200", "http://host05.ncc.symantec.com:9200", "http://host06.ncc.symantec.com:9200", "http://host07.ncc.symantec.com:9200", "http://host08.ncc.symantec.com:9200", "http://host09.ncc.symantec.com:9200", "http://host10.ncc.symantec.com:9200", "http://host11.ncc.symantec.com:9200", "http://host12.ncc.symantec.com:9200", "http://host13.ncc.symantec.com:9200", "http://host14.ncc.symantec.com:9200", "http://host15.ncc.symantec.com:9200", "http://host16.ncc.symantec.com:9200", "http://host17.ncc.symantec.com:9200", "http://host18.ncc.symantec.com:9200", "http://host19.ncc.symantec.com:9200", "http://host20.ncc.symantec.com:9200"], user=>"elastic", password=>, index=>"indexA-2018-12-07", scroll=>"2m", size=>10000, docinfo_fields=>["_id"], query=>"{"query":\n {"bool":\n {"must_not":\n { "terms": { "Field1":[ \n\t\t "Value1",\n\t\t "Value2",\n\t\t "Value3",\n\t\t .... "ValueN"]}}}}}", id=>"xyz", enable_metric=>true, codec=><LogStash::Codecs::JSON id=>"json_bbacd857-6397-4c70-92a7-52a5c9ffbddf", enable_metric=>true, charset=>"UTF-8">, docinfo=>false, docinfo_target=>"@metadata", ssl=>false>
Error: Timeout::Error

The problem is that the plugin then recovers and starts re-processing again, which leads to tens of thousands of deleted docs (duplicates).

I tried increasing the scroll time from "2m" to "10m", but no luck. The error isn't persistent; it occurs on and off, especially when the query returns around 25 million documents.
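A quick way to check whether the scroll context is actually expiring mid-run is the node stats API; its search section reports open_contexts, which should stay stable while the job runs:

GET _nodes/stats/indices/search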

ELK stack version: 5.5.1
Logstash VM RAM: 6 GB
Logstash JVM heap: 2g (both Xms and Xmx)
Logstash VM CPUs: 2 cores

logstash.yml:

    ...
    xpack.monitoring.elasticsearch.url:
    - http://host00.ncc.symantec.com:9200
    ...
    - http://host20.ncc.symantec.com:9200
    xpack.monitoring.elasticsearch.username: logstash_system
    xpack.monitoring.elasticsearch.password: foo
    pipeline.batch.size: 500
    pipeline.workers: 4

logstash.conf:

input {
  elasticsearch {
    hosts => ["http://host00.ncc.symantec.com:9200", .. "http://host20.ncc.symantec.com:9200"]
    user => "elastic"
    password => "foo"
    index => "indexA-2018-10-27"
    scroll => "2m"
    size => 10000
    docinfo_fields => [ "_id" ]
    query => '{
      "query": {
        "bool": {
          "must_not": {
            "terms": { "Field": [
              "Value1",
              "Value2",
              ...
              "ValueN"
            ] }
          }
        }
      }
    }'
  }
}

## TARGET ##
output {
  elasticsearch {
    hosts => ["http://host00.ncc.symantec.com:9200", .. "http://host20.ncc.symantec.com:9200"]
    document_type => "typeA"
    index => "indexA-%{+yyyy-MM-dd}"
    document_id => "%{uniq_id}"   # sprintf field reference; a bare "uniq_id" would give every event the same id
    user => "elastic"
    password => "foo"
    pipeline => "foo-pipeline"
  }
}

## PROCESS ##
filter {
  # Strip unwanted patterns from FieldA with gsub
  if [FieldA] =~ /.+/ {
    mutate {
      gsub => [ 
	  ....
      ]
    }
  }
      
  # Prune email fields using Blacklist
  prune {
    blacklist_names => [
	...
    ]
  }
}
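On the duplicate docs: note that the plugin dump in the error above shows docinfo=>false, so @metadata never receives the source document's _id and docinfo_fields => [ "_id" ] has no effect. A minimal sketch of wiring the source _id through to the output so that a recovered plugin overwrites the same documents instead of writing new ones (host, index, and credentials taken from the config above; this assumes the source _id is an acceptable target id):

input {
  elasticsearch {
    hosts          => ["http://host00.ncc.symantec.com:9200"]
    user           => "elastic"
    password       => "foo"
    index          => "indexA-2018-10-27"
    docinfo        => true                  # expose _index/_type/_id under @metadata
    docinfo_fields => [ "_id" ]
  }
}

output {
  elasticsearch {
    hosts       => ["http://host00.ncc.symantec.com:9200"]
    user        => "elastic"
    password    => "foo"
    index       => "indexA-%{+yyyy-MM-dd}"
    document_id => "%{[@metadata][_id]}"    # reuse the source _id, so replays overwrite rather than duplicate
  }
}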

Any pointers to resolve this? It also seems to manifest when I spin up additional Logstash VMs to speed up reindexing (the different Logstash instances run against different indices).

@Christian_Dahlqvist any thoughts on this?

I would use the reindex API with an ingest pipeline instead, but have no input on this issue as I have not used the Logstash Elasticsearch input in a long time.
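For reference, a 5.x reindex call that routes documents through an ingest pipeline looks roughly like this (the destination index name here is a placeholder; the query shape and foo-pipeline come from the thread above):

POST _reindex
{
  "source": {
    "index": "indexA-2018-10-27",
    "query": {
      "bool": {
        "must_not": {
          "terms": { "Field": [ "Value1", "Value2" ] }
        }
      }
    }
  },
  "dest": {
    "index": "indexA-2018-10-27-reindexed",
    "pipeline": "foo-pipeline"
  }
}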

We found Logstash to be a better performer than ingest pipelines when it came to heavy mutation.

We believe we have been able to solve this with the following (this is after monitoring for 30+ hours; we are still keeping our fingers crossed). This is what we did (sketches of the changes follow the list):

1. Increased the scroll time to "20m"
2. Decreased the scroll "size" to 5000
3. Added action => "create" in the Logstash es output
4. Added "transient" : { "indices.store.throttle.type" : "none" }
5. Added "transient" : { "indices.store.throttle.max_bytes_per_sec" : "1gb" }
6. Bumped the Logstash JVM heap to 3g (VM RAM is 6 GB)
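In Logstash terms, items 1-3 amount to the following (trimmed to the changed settings, host and credentials as above):

input {
  elasticsearch {
    hosts    => ["http://host00.ncc.symantec.com:9200"]
    user     => "elastic"
    password => "foo"
    index    => "indexA-2018-10-27"
    scroll   => "20m"   # 1. longer scroll keep-alive
    size     => 5000    # 2. smaller pages per scroll fetch
  }
}

output {
  elasticsearch {
    hosts       => ["http://host00.ncc.symantec.com:9200"]
    user        => "elastic"
    password    => "foo"
    index       => "indexA-%{+yyyy-MM-dd}"
    document_id => "%{uniq_id}"
    action      => "create"   # 3. version-conflict on an already-indexed id instead of writing a duplicate
  }
}

Items 4 and 5 were applied through the cluster settings API:

PUT _cluster/settings
{
  "transient": {
    "indices.store.throttle.type": "none",
    "indices.store.throttle.max_bytes_per_sec": "1gb"
  }
}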

This post helped us immensely - https://thoughts.t37.net/how-we-reindexed-36-billions-documents-in-5-days-within-the-same-elasticsearch-cluster-cd9c054d1db8
