Logstash Elasticsearch input "End of file" error -- cannot reindex

Hi,

I'm quite new to ELK and I'm trying to re-index logs with Logstash. I'm using the ES input plugin to read from an existing index, apply some transformations, and then write to a new index with the ES output plugin.

Versions in use are:

  • ES 2.3.1
  • Logstash 2.3.1
  • Logstash ES input plugin 2.0.5
  • Logstash ES output plugin 2.5.5

Basically, here is my pipeline (I also tried the default size and scroll):

input {
  elasticsearch {
    hosts => [ "localhost:9200" ]
    index => "logstash-*"
    docinfo => true
    size => 500
    scroll => "5m"
 
    query => '{
      "query" : {
        "range" : {
          "@timestamp" : {
            "gte" : "2016-05-01T00:00:00",
            "lt"  : "2016-05-01T00:00:00||+1d"
          }
        }
      }
    }'
  }
}
 
filter {  ... }

output {
  elasticsearch {
    hosts => [ "localhost:9200" ]
    index => "another-index-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][_type]}"
    document_id => "%{[@metadata][_id]}"
  }
}
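
As a sanity check, the same range query can be run through the _count API first, to know how many documents the pipeline should pick up (a sketch along these lines, assuming the same host and index pattern as in the input above):

# Count the documents matching the pipeline's range query,
# so we know the expected total before reindexing.
curl -s -XPOST 'localhost:9200/logstash-*/_count' -d '{
  "query" : {
    "range" : {
      "@timestamp" : {
        "gte" : "2016-05-01T00:00:00",
        "lt"  : "2016-05-01T00:00:00||+1d"
      }
    }
  }
}'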

I have, for example, 700k logs in my source index. The pipeline throws a lot of errors saying:

A plugin had an unrecoverable error. Will restart this plugin.
  Plugin: <LogStash::Inputs::Elasticsearch hosts=>["localhost:9200"], index=>"logstash-*", docinfo=>true,  size=>500, scroll=>"5m", query=>"{\n      \"query\" :  {\n        \"range\" : {\n          \"@timestamp\" : {\n \"gte\" : \"2016-05-01T00:00:00\",\n            \"lt\"  : \"2016-05-01T00:00:00||+1d\"\n }\n        }\n      }\n    }", codec=><LogStash::Codecs::JSON charset=>"UTF-8">, scan=>true, docinfo_target=>"@metadata", docinfo_fields=>["_index", "_type", "_id"], ssl=>false> Error: End of file reached {:level=>:error}

Debug mode gives me this (edited to fit under the 5000-character limit):

Exception: Faraday::ConnectionFailed
  Stack: org/jruby/RubyIO.java:2860:in `read_nonblock'
.../net/protocol.rb:141:in `rbuf_fill'
.../net/protocol.rb:122:in `readuntil'
.../net/protocol.rb:132:in `readline'
.../net/http.rb:2571:in `read_status_line'
.../net/http.rb:2560:in `read_new'
.../net/http.rb:1328:in `transport_request'
org/jruby/RubyKernel.java:1242:in `catch'
.../net/http.rb:1325:in `transport_request'
.../net/http.rb:1302:in `request'
.../net/http.rb:1295:in `request'
.../net/http.rb:746:in `start'
.../net/http.rb:1293:in `request'
.../faraday-0.9.2/lib/faraday/adapter/net_http.rb:82:in `perform_request'
.../faraday-0.9.2/lib/faraday/adapter/net_http.rb:40:in `call'
.../faraday-0.9.2/lib/faraday/adapter/net_http.rb:87:in `with_net_http_connection'
.../faraday-0.9.2/lib/faraday/adapter/net_http.rb:32:in `call'
.../faraday-0.9.2/lib/faraday/rack_builder.rb:139:in `build_response'
.../faraday-0.9.2/lib/faraday/connection.rb:377:in `run_request'
.../elasticsearch-transport-1.0.15/lib/elasticsearch/transport/transport/http/faraday.rb:24:in `perform_request'...", :level=>:error, :file=>"logstash/pipeline.rb", 
:line=>"353", :method=>"inputworker"}

Below you can see that there are a lot of deleted documents in the target index (one line every 15 seconds):

count  deleted  store.size
...
 80000   39705   49.5mb
130000   48045   76.4mb
130000   48045     81mb
130000   48045   82.6mb
130000   48045   84.3mb
130000  120741  107.7mb
130000  120741  109.5mb
130000  120741    112mb
130000  230295  114.2mb
130000  230295  140.1mb
130000  230295  142.4mb
130000  230295  145.1mb
130000  345295  147.5mb
130000  345295  172.4mb
130000  345295  174.6mb
130000  345295  176.3mb
130000  244491  149.7mb
130000  244491  152.1mb
130000  244491  153.9mb
130000  244491  155.5mb
130000  121721  104.7mb
130000  121721  106.8mb
130000  121721  109.4mb
130000  121721  113.4mb
130000  131250  102.8mb
130000  131250  105.3mb
130000  131250  109.5mb
130000  132581  111.3mb
130000  132541  107.9mb
130000  132541  111.8mb
130000  132541  114.9mb
...
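
(For reference, I'm sampling these figures with something like the command below; another-index-* is the output index pattern from the pipeline above, and the columns are docs.count, docs.deleted and store.size from the _cat API.)

# Print doc count, deleted-doc count and store size of the target
# index every 15 seconds.
watch -n 15 "curl -s 'localhost:9200/_cat/indices/another-index-*?h=docs.count,docs.deleted,store.size'"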

In the end, I'm not sure, but the pipeline seems to be stuck in a loop and no new logs get indexed, while the deleted-documents count keeps growing (my guess is that each restart of the input plugin replays the scroll from the beginning, so the same document IDs keep being overwritten). The CPU is still under quite heavy load, but the new index never gets past a certain number of logs, e.g. 250k (and that ceiling is not always the same for a given configuration).
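
While this is happening, one thing worth checking (just a guess on my part) is whether search/scroll contexts are piling up on the node as the plugin keeps restarting; each live scroll holds a search context until it expires:

# Show open search contexts on the node; every live scroll keeps one
# open until the scroll timeout elapses or it is cleared.
curl -s 'localhost:9200/_nodes/stats/indices?pretty' | grep open_contexts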

And if I don't set the document ID in the output, the index keeps growing indefinitely instead, since every replayed document is indexed again under a fresh auto-generated ID.
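
That behaviour is easy to reproduce by hand (a minimal sketch against a hypothetical scratch index test-dup, not my real data):

# With an explicit ID, rewriting the same document overwrites it:
# _version goes up and the old copy counts towards docs.deleted
# until segments merge.
curl -s -XPUT 'localhost:9200/test-dup/doc/1' -d '{"msg":"a"}'
curl -s -XPUT 'localhost:9200/test-dup/doc/1' -d '{"msg":"a"}'

# Without an explicit ID, Elasticsearch generates a new ID on every
# write, so replaying the same input keeps growing the index.
curl -s -XPOST 'localhost:9200/test-dup/doc' -d '{"msg":"a"}'
curl -s -XPOST 'localhost:9200/test-dup/doc' -d '{"msg":"a"}'

curl -s 'localhost:9200/_cat/indices/test-dup?v&h=docs.count,docs.deleted'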

Thanks for your help!

I'm actually receiving this exact same error while trying to do pretty much the same thing.

Did you ever figure this out?

Sorry, but I didn't. I ended up reindexing all my documents from the raw data...