Logstash crashes, restarts and then it just duplicates the data over and over

I have set up Logstash to index data from AWS ELB logs. I download the logs to the machine every minute and then index them with Logstash.
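Since the download step already runs every minute, one workaround I could bolt onto it (a sketch only; the function name and quarantine path are my own placeholders, not part of my setup) is to test each archive with `gzip -t` and move anything corrupt out of the tree before Logstash sees it:

```shell
# Hypothetical pre-flight check: gzip -t decompresses each archive to nowhere
# and exits non-zero on CRC/format errors, so corrupt files can be moved to a
# quarantine directory outside the path glob that Logstash watches.
quarantine_corrupt_gz() {
    log_dir="$1"
    quarantine="$2"
    mkdir -p "$quarantine"
    find "$log_dir" -name '*.gz' -type f | while read -r f; do
        if ! gzip -t "$f" 2>/dev/null; then
            mv "$f" "$quarantine/"
        fi
    done
}
```

This could run right after the download, e.g. `quarantine_corrupt_gz /home/deadlock/aws /home/deadlock/quarantine`, so only archives that pass the integrity check remain for Logstash to read.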

Now, since the log files are gzipped, I believe there is a corrupt .gz archive somewhere in the file hierarchy, and that when Logstash reaches this archive it crashes. On restart it begins again from the beginning, so I get a duplicate of every event, then three of every event, and so on until the disk is full. Furthermore, it seems Logstash never continues past the point where it crashed.

The question is how to fix this. I would have no problem simply ignoring a corrupt .gz file, but if hitting it restarts Logstash, duplicates every event, and prevents it from ever getting past that file, then I have a serious problem I need to fix.
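One idea I have seen for making re-ingestion after a crash idempotent (a sketch only, not something I have tested against this setup) is to derive a deterministic document ID from each event with the fingerprint filter, so a replayed event overwrites its earlier copy in Elasticsearch instead of being indexed again:

```
filter {
    fingerprint {
        # Hash the raw line; note that two genuinely identical log lines
        # would collide and be stored as one document.
        source => ["message"]
        target => "[@metadata][fingerprint]"
        method => "SHA256"
    }
}
output {
    elasticsearch {
        hosts => ["localhost:9200"]
        document_id => "%{[@metadata][fingerprint]}"
    }
}
```

This would not stop the crash-and-restart loop itself, but it should stop the duplicates from piling up each time Logstash starts over.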

The latest thing I tried was adding check_archive_validity => true, which I hoped would stop Logstash from crashing when it reaches the corrupt file.

Any ideas or suggestions are appreciated.

Here is my config from /etc/logstash/conf.d/elb.conf:

input {
    file {
        path => "/home/deadlock/aws/**/*.gz"
        type => "elb"
        mode => "read"
        file_completed_action => "log"
        file_completed_log_path => "/home/deadlock/deletelog"
        sincedb_path => "/home/deadlock/sincedb/aws"
        close_older => "1 hour"
        max_open_files => 40000
        file_sort_direction => "desc"
        check_archive_validity => true
    }
}

filter {
    if [type] == "elb" {
        grok {
            match => [ "message", "%{TIMESTAMP_ISO8601:timestamp} **<I removed these details>** ]
        }
        grok {
            match => [ "request", "\"%{NOTSPACE:http_verb} %{URIPROTO:http_proto}://(?:%{USER:http_user}(?::[^@]*)?@)?(?:%{URIHOST:http_host})?(?:%{URIPATHPARAM:http_path})? %{NOTSPACE:http_version}\"" ]
        }
        date {
            match => [ "timestamp", "ISO8601" ]
        }
        grok {
            add_tag => [ "haslocation" ]
            match => [ "message", "long=%{NUMBER:longitude:float}&lat=%{NUMBER:latitude:float}" ]
        }
        if "haslocation" in [tags] {
            mutate {
                add_field => { "[geoip][location][lat]" => "%{latitude}" }
                add_field => { "[geoip][location][lon]" => "%{longitude}" }
            }
            mutate {
                convert => {"[geoip][location][lat]" => "float"}
                convert => {"[geoip][location][lon]" => "float"}
            }
        }
    }
}
output {
  elasticsearch { hosts => ["localhost:9200"] }
}

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.