Logstash data loss


#1

Hi!
I have noticed that I'm losing data in Logstash: when I send 500 packets from my application, Logstash only forwards about 300.
I added a counter to the packets I send, and I can see that data is lost after about 300 UDP packets have been received. What should I do to solve it? I have thought about changing the SizedQueue in Logstash, or about running 2 Logstash instances with a Redis buffer between them.
What is the best solution to avoid losing data?

I have also added a file output to the Logstash configuration, next to the Elasticsearch output, in order to count how many packets Logstash processes. I'm receiving more data in Elasticsearch than in the output file. How is that possible? Maybe the file output is written with a delay.
What do you think?

Thank you very much


(Christian Dahlqvist) #2

What does your Logstash configuration look like?


#3
input {
    udp {
        type => "udp"
        port => 30000
    }

    gelf {
        type => "log4j"
        port => 4560
        host => "0.0.0.0"
    }
}

filter {
    if [type] == "udp" {
        mutate {
            rename => ["@host", "host"]
        }

        dns {
            reverse => ["host"]
            action => "replace"
            nameserver => "IPSERVERDNS"
        }

        grok {
            patterns_dir => "/etc/logstash/patterns"
            match => [
                "message", "%{MESSAGE_1}",
                "message", "%{MESSAGE_2}",
                "message", "%{MESSAGE_3}",
                "message", "%{MESSAGE_4}"
            ]
        }

        if [app_name] == "old_V3_logs.vi" {
            date {
                match => [ "dater", "YYYY/MM/dd HH:mm:ss.SSS" ]
                target => "@timestamp"
            }
        }

        grok {
            patterns_dir => "/etc/logstash/patterns"
            match => [
                "Comment", "%{COMMENT_MESSAGE2}",
                "Comment", "%{COMMENT_MESSAGE}",
                "Comment", "%{GREEDYDATA}"
            ]
        }
    }
}

output {
    file {
        path => "/etc/elk/logout/testLogs-%{+YYYY.MM.dd.HH.mm}.log"
    }

    elasticsearch {
        protocol => "http"
        cluster => "logstash"
        host => "NAMEOFDESTINYHOST"
        index => "logstash-tttttttt-%{+YYYY.MM.dd-HH.mm}"
    }
}

(Christian Dahlqvist) #4

If Logstash is not able to process requests fast enough, it will apply back pressure, which with the inputs you have specified (TCP and UDP based) can lead to data loss. To avoid this, a buffering mechanism like Redis is often introduced: one Logstash instance is responsible only for capturing data and enqueueing it in Redis as quickly as possible. This instance should do minimal processing so that it is as fast as possible. You would then have one or more separate Logstash instances that read off the Redis queue, do all the processing and send the data to the outputs.
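
As a rough sketch of that split, assuming a Redis server reachable under the placeholder name REDISHOST and a list key called "logstash" (both are made-up values for illustration, not taken from your setup), the capturing instance could look something like this. It has no filter section at all, so it only receives events and pushes them to Redis:

```
# shipper.conf (hypothetical file name): capture and enqueue only
input {
    udp {
        type => "udp"
        port => 30000
    }

    gelf {
        type => "log4j"
        port => 4560
        host => "0.0.0.0"
    }
}

output {
    redis {
        host => ["REDISHOST"]    # placeholder, replace with your Redis server
        data_type => "list"      # events are pushed onto a Redis list
        key => "logstash"        # assumed key name, must match the indexer
    }
}
```

The processing instance would then read from the same list and carry the filters and outputs from your current configuration:

```
# indexer.conf (hypothetical file name): read from Redis, process, output
input {
    redis {
        host => "REDISHOST"      # same placeholder Redis server as above
        data_type => "list"
        key => "logstash"        # same assumed key as in the shipper
    }
}

filter {
    # the existing mutate, dns, grok and date filters go here unchanged
}

output {
    elasticsearch {
        protocol => "http"
        cluster => "logstash"
        host => "NAMEOFDESTINYHOST"
        index => "logstash-tttttttt-%{+YYYY.MM.dd-HH.mm}"
    }
}
```

Because the type field is set on the shipper and travels with the event through Redis, the existing `if [type] == "udp"` condition should keep working on the indexer side. If one indexer cannot keep up, more instances with the same configuration can read from the same list.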

There is a webinar recording available that discusses this in greater detail.

