Hi,
I'm quite new to ELK and I'm trying to re-index logs with Logstash. I'm using the ES input plugin to read from an existing index, do some transformations, then write to a new index with the ES output plugin.
Versions in use are:
- ES 2.3.1
- Logstash 2.3.1
- Logstash ES input plugin 2.0.5
- Logstash ES output plugin 2.5.5
Basically, here is my pipeline (I also tried the default size and scroll settings):
input {
  elasticsearch {
    hosts => [ "localhost:9200" ]
    index => "logstash-*"
    docinfo => true
    size => 500
    scroll => "5m"
    query => '{
      "query" : {
        "range" : {
          "@timestamp" : {
            "gte" : "2016-05-01T00:00:00",
            "lt" : "2016-05-01T00:00:00||+1d"
          }
        }
      }
    }'
  }
}
filter { ... }
output {
  elasticsearch {
    hosts => [ "localhost:9200" ]
    index => "another-index-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][_type]}"
    document_id => "%{[@metadata][_id]}"
  }
}
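For context, docinfo => true is what exposes the source document's _index, _type and _id under @metadata, which the output section then reuses. If it helps, those fields can be inspected with a temporary stdout output using the rubydebug codec (a minimal sketch; @metadata is only printed when metadata => true is set):
output {
  stdout {
    # print events including the normally hidden @metadata fields
    codec => rubydebug { metadata => true }
  }
}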
For example, I have around 700k logs in my source index. The pipeline throws a lot of errors saying:
A plugin had an unrecoverable error. Will restart this plugin.
Plugin: <LogStash::Inputs::Elasticsearch hosts=>["localhost:9200"], index=>"logstash-*", docinfo=>true, size=>500, scroll=>"5m", query=>"{\n \"query\" : {\n \"range\" : {\n \"@timestamp\" : {\n \"gte\" : \"2016-05-01T00:00:00\",\n \"lt\" : \"2016-05-01T00:00:00||+1d\"\n }\n }\n }\n }", codec=><LogStash::Codecs::JSON charset=>"UTF-8">, scan=>true, docinfo_target=>"@metadata", docinfo_fields=>["_index", "_type", "_id"], ssl=>false> Error: End of file reached {:level=>:error}
Debug mode gives me this (trimmed to fit under 5000 characters):
Exception: Faraday::ConnectionFailed
Stack: org/jruby/RubyIO.java:2860:in `read_nonblock'
.../net/protocol.rb:141:in `rbuf_fill'
.../net/protocol.rb:122:in `readuntil'
.../net/protocol.rb:132:in `readline'
.../net/http.rb:2571:in `read_status_line'
.../net/http.rb:2560:in `read_new'
.../net/http.rb:1328:in `transport_request'
org/jruby/RubyKernel.java:1242:in `catch'
.../net/http.rb:1325:in `transport_request'
.../net/http.rb:1302:in `request'
.../net/http.rb:1295:in `request'
.../net/http.rb:746:in `start'
.../net/http.rb:1293:in `request'
.../faraday-0.9.2/lib/faraday/adapter/net_http.rb:82:in `perform_request'
.../faraday-0.9.2/lib/faraday/adapter/net_http.rb:40:in `call'
.../faraday-0.9.2/lib/faraday/adapter/net_http.rb:87:in `with_net_http_connection'
.../faraday-0.9.2/lib/faraday/adapter/net_http.rb:32:in `call'
.../faraday-0.9.2/lib/faraday/rack_builder.rb:139:in `build_response'
.../faraday-0.9.2/lib/faraday/connection.rb:377:in `run_request'
.../elasticsearch-transport-1.0.15/lib/elasticsearch/transport/transport/http/faraday.rb:24:in `perform_request'...", :level=>:error, :file=>"logstash/pipeline.rb",
:line=>"353", :method=>"inputworker"}
The stats below show that there are a lot of deleted documents (one line every 15 seconds; a sketch of how such stats can be polled follows the table):
count deleted store.size
...
80000 39705 49.5mb
130000 48045 76.4mb
130000 48045 81mb
130000 48045 82.6mb
130000 48045 84.3mb
130000 120741 107.7mb
130000 120741 109.5mb
130000 120741 112mb
130000 230295 114.2mb
130000 230295 140.1mb
130000 230295 142.4mb
130000 230295 145.1mb
130000 345295 147.5mb
130000 345295 172.4mb
130000 345295 174.6mb
130000 345295 176.3mb
130000 244491 149.7mb
130000 244491 152.1mb
130000 244491 153.9mb
130000 244491 155.5mb
130000 121721 104.7mb
130000 121721 106.8mb
130000 121721 109.4mb
130000 121721 113.4mb
130000 131250 102.8mb
130000 131250 105.3mb
130000 131250 109.5mb
130000 132581 111.3mb
130000 132541 107.9mb
130000 132541 111.8mb
130000 132541 114.9mb
...
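For reference, stats like these can be collected by polling the cat indices API every 15 seconds, along the lines of the sketch below (the index pattern is just an example):
watch -n 15 "curl -s 'localhost:9200/_cat/indices/another-index-*?h=docs.count,docs.deleted,store.size'"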
In the end, I'm not sure, but the pipeline seems to be stuck in a loop: no more logs get indexed while the deleted-document count keeps growing. CPU usage stays quite high, but the new index never gets past a certain number of documents, e.g. 250k (and that limit isn't even consistent for a given configuration).
And if I don't set the document ID in the output, the index just keeps growing indefinitely.
Thanks for your help!