Logstash skipped half of log file

Hello

I have a daily process that downloads some compressed log files and then extracts them into a directory that Logstash is watching.

The weblogs pipeline:

input {
    file {
        id => "weblogs_input_file"
        path => "/data/logs/web/logs/*.log"
        codec => plain {
            charset => "ISO-8859-1"
        }
    }
}
filter { ... }
output {
    if "_grokparsefailure" in [tags] {
        elasticsearch {
            id => "weblogs_output_skipped_lines"
            index => "skipped-logs-2"
        }
    } else {
        elasticsearch {
            id => "weblogs_output_elastic"
            # default index
        }
    }
}

Today, Logstash decided to only ingest the latter half of one of the files (the other 11 processed fine).
It doesn't appear to be a grok failure, as the missing lines aren't indexed in the skipped-logs-2 index either.

The only other thing I can think of that might be the cause is that the new log file somehow matched something already in the file input's sincedb? Although I'm not sure how to prove or disprove that...?

The file /var/log/logstash/logstash-plain.log just contains CIDR filter warnings, e.g. something like:

[2018-09-05T04:01:44,805][WARN ][logstash.filters.cidr    ] Invalid IP address, skipping {:address=>"%{clientip}".....

There are only 61 of those warnings, whereas I'm missing thousands of log lines.
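Side note: those warnings look like the cidr filter being handed the literal, unresolved "%{clientip}" reference on lines where grok didn't extract a client IP. I could presumably silence them by wrapping the filter in a conditional. A rough sketch, assuming the field really is called clientip in our filter block:

filter {
    # Only run the cidr filter when grok actually extracted a client IP,
    # so it never sees the literal string "%{clientip}".
    if [clientip] {
        cidr {
            # ... existing cidr options (address, network, add_tag) ...
        }
    }
}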

I also made a copy of the file (e.g. cp file.log newfile.log), and Logstash happily re-ingested the entire copy, so I don't think it was anything in this specific file's contents that caused the issue.

Anyone have any suggestions on what might have caused this, and how I can stop it in the future?

Thanks :slight_smile:

The only other thing I can think of that might be the cause is that the new log file somehow matched something already in the file input's sincedb? Although I'm not sure how to prove or disprove that...?

This is the likely culprit. I don't think the file input ever removes entries from the sincedb file, so after a while there are going to be a lot of stale entries in there, and a new file might match an old entry: the sincedb keys files by inode and device numbers, so a freshly extracted file that reuses a deleted file's inode looks like a file that has already been partially read. That would also explain why your copy was re-read in full; cp writes a new file with a new inode.
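One way to prove or disprove it is to set an explicit sincedb_path on the input so the file is easy to find, then compare its entries against ls -i on the newly extracted logs; each sincedb line records (roughly) the inode, the major and minor device numbers, and the byte offset read so far. A minimal sketch, with the sincedb location just a path of your choosing:

input {
    file {
        id => "weblogs_input_file"
        path => "/data/logs/web/logs/*.log"
        codec => plain {
            charset => "ISO-8859-1"
        }
        # Keep the sincedb somewhere predictable so it can be inspected.
        sincedb_path => "/var/lib/logstash/weblogs.sincedb"
    }
}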

Filebeat can clean up its equivalent entries, so this becomes, if not impossible, at least far less likely.

Thanks, good to know. When you say 'can clean up', is that some extra config, or does it do that automagically?
I'll have to look at switching over to filebeat -> logstash -> elasticsearch :smiley:

When you say 'can clean up', is that some extra config, or does it do that automagically?

I don't remember. I believe it's configurable so check the Filebeat documentation.

Ah yes, found it.
https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-log.html#filebeat-input-log-clean-options

There's even an FAQ!
https://www.elastic.co/guide/en/beats/filebeat/current/faq.html#inode-reuse-issue
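For anyone else who lands here, this is roughly what I'm planning to try. It's an untested sketch: the paths come from our setup above, but the durations and the Logstash host are just guesses, and clean_inactive needs ignore_older to be set and must be longer than ignore_older plus scan_frequency:

filebeat.inputs:
  - type: log
    paths:
      - /data/logs/web/logs/*.log
    encoding: iso8859-1
    # Drop registry state for files that have been deleted from disk
    # (this is the default, but worth being explicit about).
    clean_removed: true
    # Drop registry state for files not updated in 72h, so a recycled
    # inode can't be mistaken for an already-read file.
    ignore_older: 48h
    clean_inactive: 72h

output.logstash:
  hosts: ["localhost:5044"]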

Thanks for your help!
