Logstash skipped half of log file

Hello

I have a daily process that downloads some compressed log files and then extracts them into a directory that Logstash is watching.

The weblogs pipeline:

input {
    file {
        id => "weblogs_input_file"
        path => "/data/logs/web/logs/*.log"
        codec => plain {
            charset => "ISO-8859-1"
        }
    }
}
filter { ... }
output {
    if "_grokparsefailure" in [tags] {
        elasticsearch {
            id => "weblogs_output_skipped_lines"
            index => "skipped-logs-2"
        }
    } else {
        elasticsearch {
            id => "weblogs_output_elastic"
            # default index
        }
    }
}

Today, Logstash decided to only ingest the latter half of one of the files (the other 11 processed fine).
It doesn't appear to be a grok failure, as the missing lines aren't indexed in the skipped-logs-2 index either.

The only other thing I can think of that might be the cause is that the new log file somehow matched something already in the file input's sincedb? Although I'm not sure how to prove or disprove that...?

The file /var/log/logstash/logstash-plain.log just contains CIDR filter warnings, e.g. something like:

[2018-09-05T04:01:44,805][WARN ][logstash.filters.cidr    ] Invalid IP address, skipping {:address=>"%{clientip}".....

There are only 61 of those warnings, whereas I'm missing thousands of log lines.
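Side note: those warnings look like the cidr filter being handed the literal, unresolved "%{clientip}" reference on lines where grok didn't extract a client IP. I could presumably silence them by wrapping the filter in a conditional. A rough sketch, assuming the field really is called clientip in our filter block:

filter {
    # Only run the cidr filter when grok actually extracted a client IP,
    # so it never sees the literal string "%{clientip}".
    if [clientip] {
        cidr {
            # ... existing cidr options (address, network, add_tag) ...
        }
    }
}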

I also made a copy of the file (e.g. cp file.log newfile.log), and Logstash happily re-ingested the entire copy, so I don't think it was anything in this specific file's contents that caused the issue.

Anyone have any suggestions on what might have caused this, and how I can stop it in the future?

Thanks :slight_smile:

The only other thing I can think of that might be the cause is that the new log file somehow matched something already in the file input's sincedb? Although I'm not sure how to prove or disprove that...?

This is the likely culprit. I don't think the file input ever removes entries from the sincedb file, so after a while there are going to be a lot of stale entries in there, and a new file might match an old entry: the sincedb keys files by inode and device numbers, so a freshly extracted file that reuses a deleted file's inode looks like a file that has already been partially read. That would also explain why your copy was re-read in full; cp writes a new file with a new inode.
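One way to prove or disprove it is to set an explicit sincedb_path on the input so the file is easy to find, then compare its entries against ls -i on the newly extracted logs; each sincedb line records (roughly) the inode, the major and minor device numbers, and the byte offset read so far. A minimal sketch, with the sincedb location just a path of your choosing:

input {
    file {
        id => "weblogs_input_file"
        path => "/data/logs/web/logs/*.log"
        codec => plain {
            charset => "ISO-8859-1"
        }
        # Keep the sincedb somewhere predictable so it can be inspected.
        sincedb_path => "/var/lib/logstash/weblogs.sincedb"
    }
}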

Filebeat can clean up its equivalent entries, so this becomes, if not impossible, at least far less likely.

Thanks, good to know. When you say 'can clean up', is that some extra config, or does it do that automagically?
I'll have to look at switching over to filebeat -> logstash -> elasticsearch :smiley:

When you say 'can clean up', is that some extra config, or does it do that automagically?

I don't remember. I believe it's configurable so check the Filebeat documentation.

Ah yes, found it.
https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-log.html#filebeat-input-log-clean-options

There's even an FAQ!
https://www.elastic.co/guide/en/beats/filebeat/current/faq.html#inode-reuse-issue
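For anyone else who lands here, this is roughly what I'm planning to try. It's an untested sketch: the paths come from our setup above, but the durations and the Logstash host are just guesses, and clean_inactive needs ignore_older to be set and must be longer than ignore_older plus scan_frequency:

filebeat.inputs:
  - type: log
    paths:
      - /data/logs/web/logs/*.log
    encoding: iso8859-1
    # Drop registry state for files that have been deleted from disk
    # (this is the default, but worth being explicit about).
    clean_removed: true
    # Drop registry state for files not updated in 72h, so a recycled
    # inode can't be mistaken for an already-read file.
    ignore_older: 48h
    clean_inactive: 72h

output.logstash:
  hosts: ["localhost:5044"]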

Thanks for your help!
