File input randomly skipping files

I'm using the file input to ingest Cloudflare logs into Elasticsearch. A cron job runs every 10 minutes and drops a gzip file, ranging from a few hundred KB to a couple of MB in size, into the read path. Logstash randomly skips files for reasons I haven't been able to determine, and none of the skipped filenames appear in the debug logs. I do see an occasional unrecoverable plugin error that restarts the pipeline. My Logstash config is as follows:

input {
    file {
        path => "/var/spool/logstash/cloudflare*.gz"
        sincedb_path => "/var/spool/logstash/cloudflare_logs.sdb"
        sincedb_clean_after => "30m"
        mode => "read"
        file_completed_action => "log_and_delete"
        file_completed_log_path => "/var/log/logstash/cloudflare_processed.log"
        #file_sort_by => "path"
        tags => ["cloudflare"]
    }
}

filter {
    if "cloudflare" in [tags] {
        json {
            source => "message"
        }

        date {
            match => ["EdgeStartTimestamp", "ISO8601"]
            target => "@timestamp"
        }

        geoip {
          source => "ClientIP"
        }
    }
}

output {
    if "cloudflare" in [tags] {
        elasticsearch {
          hosts => ["obc-es1:9200", "obc-es2:9200", "obc-es3:9200"]
          index => "cloudflare-%{+YYYY.MM.dd}"
        }
    }
}

I don't know if this is directly related, but here is the error log from the plugin restart:

[2019-02-27T12:00:08,240][ERROR][logstash.pipeline        ] A plugin had an unrecoverable error. Will restart this plugin.
  Pipeline_id:cloudflare
  Plugin: <LogStash::Inputs::File mode=>"read", path=>["/var/spool/logstash/cloudflare*.gz"], sincedb_clean_after=>1800.0, id=>"8a5183f2744bc4f0a6a0cbf26a752d34dbb713ac46aaa6d632a8d13412feaad4", sincedb_path=>"/var/spool/logstash/cloudflare_logs.sdb", file_completed_action=>"log_and_delete", file_completed_log_path=>"/var/log/logstash/cloudflare_processed.log", tags=>["cloudflare"], enable_metric=>true, codec=><LogStash::Codecs::Plain id=>"plain_d0768b8e-7302-4610-9c23-2523583bdc9c", enable_metric=>true, charset=>"UTF-8">, stat_interval=>1.0, discover_interval=>15, sincedb_write_interval=>15.0, start_position=>"end", delimiter=>"\n", close_older=>3600.0, file_chunk_size=>32768, file_chunk_count=>140737488355327, file_sort_by=>"last_modified", file_sort_direction=>"asc">
  Error:
  Exception: Java::JavaIo::EOFException
  Stack: java.util.zip.GZIPInputStream.readUByte(java/util/zip/GZIPInputStream.java:268)
java.util.zip.GZIPInputStream.readUShort(java/util/zip/GZIPInputStream.java:258)
java.util.zip.GZIPInputStream.readHeader(java/util/zip/GZIPInputStream.java:164)
java.util.zip.GZIPInputStream.<init>(java/util/zip/GZIPInputStream.java:79)
java.util.zip.GZIPInputStream.<init>(java/util/zip/GZIPInputStream.java:91)
java.lang.reflect.Constructor.newInstance(java/lang/reflect/Constructor.java:423)
org.jruby.javasupport.JavaConstructor.newInstanceDirect(org/jruby/javasupport/JavaConstructor.java:278)
org.jruby.RubyClass.newInstance(org/jruby/RubyClass.java:1001)
org.jruby.RubyClass$INVOKER$i$newInstance.call(org/jruby/RubyClass$INVOKER$i$newInstance.gen)
usr.share.logstash.vendor.bundle.jruby.$2_dot_3_dot_0.gems.logstash_minus_input_minus_file_minus_4_dot_1_dot_9.lib.filewatch.read_mode.handlers.read_zip_file.handle_specifically(/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-input-file-4.1.9/lib/filewatch/read_mode/handlers/read_zip_file.rb:23)
usr.share.logstash.vendor.bundle.jruby.$2_dot_3_dot_0.gems.logstash_minus_input_minus_file_minus_4_dot_1_dot_9.lib.filewatch.read_mode.handlers.base.handle(/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-input-file-4.1.9/lib/filewatch/read_mode/handlers/base.rb:26)
usr.share.logstash.vendor.bundle.jruby.$2_dot_3_dot_0.gems.logstash_minus_input_minus_file_minus_4_dot_1_dot_9.lib.filewatch.read_mode.processor.read_zip_file(/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-input-file-4.1.9/lib/filewatch/read_mode/processor.rb:39)
usr.share.logstash.vendor.bundle.jruby.$2_dot_3_dot_0.gems.logstash_minus_input_minus_file_minus_4_dot_1_dot_9.lib.filewatch.read_mode.processor.block in process_active(/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-input-file-4.1.9/lib/filewatch/read_mode/processor.rb:102)
org.jruby.RubyArray.each(org/jruby/RubyArray.java:1734)
usr.share.logstash.vendor.bundle.jruby.$2_dot_3_dot_0.gems.logstash_minus_input_minus_file_minus_4_dot_1_dot_9.lib.filewatch.read_mode.processor.process_active(/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-input-file-4.1.9/lib/filewatch/read_mode/processor.rb:88)
usr.share.logstash.vendor.bundle.jruby.$2_dot_3_dot_0.gems.logstash_minus_input_minus_file_minus_4_dot_1_dot_9.lib.filewatch.read_mode.processor.process_all_states(/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-input-file-4.1.9/lib/filewatch/read_mode/processor.rb:45)
usr.share.logstash.vendor.bundle.jruby.$2_dot_3_dot_0.gems.logstash_minus_input_minus_file_minus_4_dot_1_dot_9.lib.filewatch.watch.iterate_on_state(/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-input-file-4.1.9/lib/filewatch/watch.rb:67)
usr.share.logstash.vendor.bundle.jruby.$2_dot_3_dot_0.gems.logstash_minus_input_minus_file_minus_4_dot_1_dot_9.lib.filewatch.watch.subscribe(/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-input-file-4.1.9/lib/filewatch/watch.rb:45)
usr.share.logstash.vendor.bundle.jruby.$2_dot_3_dot_0.gems.logstash_minus_input_minus_file_minus_4_dot_1_dot_9.lib.filewatch.observing_read.subscribe(/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-input-file-4.1.9/lib/filewatch/observing_read.rb:12)
usr.share.logstash.vendor.bundle.jruby.$2_dot_3_dot_0.gems.logstash_minus_input_minus_file_minus_4_dot_1_dot_9.lib.logstash.inputs.file.run(/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-input-file-4.1.9/lib/logstash/inputs/file.rb:339)
usr.share.logstash.logstash_minus_core.lib.logstash.pipeline.inputworker(/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:426)
usr.share.logstash.logstash_minus_core.lib.logstash.pipeline.RUBY$method$inputworker$0$__VARARGS__(usr/share/logstash/logstash_minus_core/lib/logstash//usr/share/logstash/logstash-core/lib/logstash/pipeline.rb)
usr.share.logstash.logstash_minus_core.lib.logstash.pipeline.block in start_input(/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:420)
org.jruby.RubyProc.call(org/jruby/RubyProc.java:289)
org.jruby.RubyProc.call(org/jruby/RubyProc.java:246)
java.lang.Thread.run(java/lang/Thread.java:748)
[2019-02-27T12:00:09,241][INFO ][filewatch.observingread  ] QUIT - closing all files and shutting down.
[2019-02-27T12:00:09,242][INFO ][filewatch.observingread  ] START, creating Discoverer, Watch with file and sincedb collections

You need trace-level logging for the file input to log all the details of the files it is watching and when it thinks they change.

I enabled TRACE logging on logstash.inputs.file and let it run overnight. There were several skipped files left in the read directory this morning, and nothing at TRACE level in the logs mentioned the skipped filenames. To process the skipped files, I have to stop Logstash, clear the sincedb file, and then restart Logstash. I'm starting to suspect the issue is with the sincedb, as I have a skipped file in the read folder and a line in the sincedb for it, e.g.

262298 0 64514 2049026 1551370742.518811 /var/spool/logstash/cloudflare.log-2019-02-28_10-00-01.gz
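For reference, that record can be pulled apart with a quick shell sketch. The field layout (as written by logstash-input-file 4.x) is: inode, major device number, minor device number, byte position read so far, last-active timestamp, and an optional path:

```shell
# Decode one sincedb record; whitespace splitting gives the six fields.
rec='262298 0 64514 2049026 1551370742.518811 /var/spool/logstash/cloudflare.log-2019-02-28_10-00-01.gz'
set -- $rec
echo "inode=$1 device=$2:$3 position=$4 path=$6"
# To compare against a skipped file still on disk, check its inode with:
#   stat -c '%i' /var/spool/logstash/cloudflare.log-2019-02-28_10-00-01.gz
```

The position field (2049026 here) is what the input compares against the current file size when deciding whether anything is left to read.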

I previously tried setting sincedb_path to /dev/null, but files were still being skipped.

This makes no sense to me and I'm at a complete loss. I cannot induce a failure; I just have to wait for it to happen randomly. I'm also seeing the same behaviour on a Bitnami appliance that we fielded for evaluation purposes.

The file input assumes that the combination of filename and inode is unique. Sometimes that is not a good assumption. If I set up a file input monitoring /tmp/foo.txt and then

rm /tmp/foo.txt ; echo bar > /tmp/foo.txt

then logstash will process "bar". However, if I then

rm /tmp/foo.txt ; echo baz > /tmp/foo.txt

then the file input will not notice, because the new file has the same name, inode, and size.
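The identity the input tracks can be observed directly by stat-ing the file before and after recreating it (a throwaway sketch in a temp directory; whether the inode is actually reused depends on the filesystem):

```shell
# Watch the (inode, size) identity across a delete/recreate cycle.
d=$(mktemp -d)
echo bar > "$d/foo.txt"
stat -c 'inode=%i size=%s' "$d/foo.txt"
rm "$d/foo.txt"
echo baz > "$d/foo.txt"
stat -c 'inode=%i size=%s' "$d/foo.txt"
# If both lines match, the new file is indistinguishable from the old
# one as far as the sincedb is concerned.
rm -r "$d"
```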

Setting sincedb_path to /dev/null only affects where the in-memory sincedb is persisted across runs.

I prefer filesystems that use inode generation numbers, so that this does not happen, but I do not get to choose the filesystem I run on.

I'm using read mode and log_and_delete for file_completed_action. Each new file is unique, as it contains a datetime in the filename, e.g. cloudflare.log-2019-02-28_10-00-01.gz. There will never be a duplicate filename.

I finally got trace logging for the issue, which appears to be in filewatch:

[2019-02-28T14:10:01,352][TRACE][filewatch.discoverer     ] discover_files {"count"=>1}
[2019-02-28T14:10:01,355][TRACE][filewatch.discoverer     ] discover_files handling: {"new discovery"=>true, "watched_file details"=>"<FileWatch::WatchedFile: @filename='cloudflare.log-2019-02-28_14-00-01.gz', @state='watched', @recent_states='[:watched]', @bytes_read='0', @bytes_unread='0', current_size='0', last_stat_size='0', file_open?='false', @initial=true, @sincedb_key='262258 0 64514'>"}
[2019-02-28T14:10:01,355][TRACE][filewatch.sincedbcollection] associate: finding {"inode"=>"262258", "path"=>"/var/spool/logstash/cloudflare.log-2019-02-28_14-00-01.gz"}
[2019-02-28T14:10:01,356][TRACE][filewatch.sincedbcollection] associate: found sincedb record {"filename"=>"cloudflare.log-2019-02-28_14-00-01.gz", "sincedb key"=>#<struct FileWatch::InodeStruct inode="262258", maj=0, min=64514>, "sincedb_value"=>#<FileWatch::SincedbValue:0x15484e4d @last_changed_at=1551384046.892876, @watched_file=nil, @position=2019677>}
[2019-02-28T14:10:01,381][TRACE][filewatch.sincedbcollection] handle_association fully read, ignoring..... {"watched file"=>"<FileWatch::WatchedFile: @filename='cloudflare.log-2019-02-28_14-00-01.gz', @state='ignored', @recent_states='[:watched, :watched]', @bytes_read='2019677', @bytes_unread='0', current_size='0', last_stat_size='0', file_open?='false', @initial=false, @sincedb_key='262258 0 64514'>", "sincedb value"=>#<FileWatch::SincedbValue:0x15484e4d @last_changed_at=1551384601.357625, @watched_file="<FileWatch::WatchedFile: @filename='cloudflare.log-2019-02-28_14-00-01.gz', @state='ignored', @sincedb_key='262258 0 64514, size=0>", @position=2019677>}
[2019-02-28T14:10:01,396][TRACE][filewatch.sincedbcollection] associate: inode matched but no path in sincedb
[2019-02-28T14:10:16,421][TRACE][filewatch.discoverer     ] discover_files {"count"=>1}
[2019-02-28T14:10:16,429][TRACE][filewatch.discoverer     ] discover_files handling: {"new discovery"=>false, "watched_file details"=>"<FileWatch::WatchedFile: @filename='cloudflare.log-2019-02-28_14-00-01.gz', @state='ignored', @recent_states='[:watched, :watched]', @bytes_read='2019677', @bytes_unread='0', current_size='0', last_stat_size='0', file_open?='false', @initial=false, @sincedb_key='262258 0 64514'>"}

So it finds an entry in the sincedb for the inode, indicating the file input has already read 2019677 bytes from that inode. That's larger than the current file size, so it reads nothing.

The sincedb entry has no path associated with it. I do not understand how that can happen. Perhaps @guyboertje can shed some light on that.

Assuming file sizes are random, about half the time the new file on a reused inode will be shorter than the previous one, and it will be skipped. The other half of the time the new file will be larger, and the GZIPInputStream class will start reading the gzipped file close to its end; I am not surprised that raises an exception. So I would expect pipeline crashes to be about as common as skipped files.
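The crash half can be reproduced outside Logstash: feeding gunzip only the tail of a gzip file, as if resuming from a stale sincedb offset, fails the header check much like the GZIPInputStream EOFException above (a rough sketch with throwaway files):

```shell
# Simulate resuming a gzip read from a stale offset near the end of the file.
d=$(mktemp -d)
seq 1000 | gzip > "$d/sample.gz"
size=$(stat -c %s "$d/sample.gz")
# Only the last 2 bytes remain "unread"; they are not a valid gzip header.
tail -c 2 "$d/sample.gz" | gunzip > /dev/null 2>&1 \
    || echo "decode failed at offset $((size - 2))"
rm -r "$d"
```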

That makes sense. I'm creating files every 10 minutes, which are deleted after they are processed, and occasionally a new file gets the same inode as a previously processed file. My issue could probably be solved by changing file_completed_action to "log" and cleaning up the processed files with another process, such as a daily cron that removes files older than 24 or 48 hours.
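A minimal sketch of that cleanup, assuming file_completed_action => "log" so Logstash no longer deletes the files itself (the SPOOL_DIR override is just for illustration; the default mirrors the ingest path in the config above):

```shell
# Purge processed Cloudflare dumps older than 48 hours (2880 minutes).
# Intended to run from a daily cron once logstash stops deleting them.
spool="${SPOOL_DIR:-/var/spool/logstash}"
if [ -d "$spool" ]; then
    find "$spool" -name 'cloudflare*.gz' -mmin +2880 -delete
fi
```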

I thought that sincedb_clean_after => "30m" would have handled my situation, but I guess not.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.