Logstash data Loss

billy6 · November 19, 2015, 10:11am

Hi!
I have noticed that I'm losing data in Logstash: when I send 500 packets from my application, Logstash only forwards about 300.
I added a counter to the packets I send, and I can see that data is lost after about 300 UDP packets have been received. What should I do to solve this? I have thought about changing the SizedQueue in Logstash, or about running 2 Logstash instances with a Redis buffer between them.
What is the optimal solution for not losing data?

I have also added a file output to the Logstash configuration, alongside the Elasticsearch output, in order to count how many packets Logstash processes. I'm receiving more data in Elasticsearch than in the output file. How is that possible? Maybe the file output is written with a delay.
What do you think?

Thank you very much

Christian_Dahlqvist · November 19, 2015, 10:13am

What does your Logstash configuration look like?

billy6 · November 19, 2015, 10:18am

input {
    udp {
        type => "udp"
        port => 30000
    }

    gelf {
        type => "log4j"
        port => 4560
        host => "0.0.0.0"
    }
}

filter {
    if [type] == "udp" {
        mutate {
            rename => ["@host", "host"]
        }

        dns {
            reverse => ["host"]
            action => "replace"
            nameserver => "IPSERVERDNS"
        }

        grok {
            patterns_dir => "/etc/logstash/patterns"
            match => [
                "message", "%{MESSAGE_1}",
                "message", "%{MESSAGE_2}",
                "message", "%{MESSAGE_3}",
                "message", "%{MESSAGE_4}"
            ]
        }

        if [app_name] == "old_V3_logs.vi" {
            date {
                match => [ "dater", "YYYY/MM/dd HH:mm:ss.SSS" ]
                target => "@timestamp"
            }
        }

        grok {
            patterns_dir => "/etc/logstash/patterns"
            match => [
                "Comment", "%{COMMENT_MESSAGE2}",
                "Comment", "%{COMMENT_MESSAGE}",
                "Comment", "%{GREEDYDATA}"
            ]
        }
    }
}

output {
    file {
        path => "/etc/elk/logout/testLogs-%{+YYYY.MM.dd.HH.mm}.log"
    }

    elasticsearch {
        protocol => "http"
        cluster => "logstash"
        host => "NAMEOFDESTINYHOST"
        index => "logstash-tttttttt-%{+YYYY.MM.dd-HH.mm}"
    }
}

Christian_Dahlqvist · November 19, 2015, 11:28am

If Logstash is not able to process events fast enough, it will apply back pressure, which with the UDP-based inputs you have specified (udp and gelf) can lead to data loss, because the sender does not wait or retry. In order to avoid this, a buffering mechanism like Redis is often introduced: one Logstash instance is responsible for capturing data and enqueueing it in Redis as quickly as possible. This instance should do minimal processing in order to be as fast as possible. You would then have one or more separate Logstash instances that read off the Redis queue, do all the processing and send the data to the outputs.
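As a rough sketch of that split, and not a tested setup, the two roles could look something like the following. The Redis address 127.0.0.1, the list key logstash-buffer and the two file names are assumptions made up for illustration; the inputs and the Elasticsearch output are copied from the configuration posted above.

```
# shipper.conf (hypothetical file name)
# Captures events and enqueues them in Redis. No filters, so it stays fast.
input {
    udp {
        type => "udp"
        port => 30000
    }

    gelf {
        type => "log4j"
        port => 4560
        host => "0.0.0.0"
    }
}

output {
    redis {
        host => ["127.0.0.1"]       # assumed Redis address
        data_type => "list"         # use a Redis list as the queue
        key => "logstash-buffer"    # assumed list name
    }
}

# indexer.conf (hypothetical file name)
# Reads from the same Redis list, does the heavy processing, writes to Elasticsearch.
input {
    redis {
        host => "127.0.0.1"         # must point at the same Redis server
        data_type => "list"
        key => "logstash-buffer"    # must match the shipper's key
    }
}

# The existing filter block (mutate, dns, grok, date) would move here unchanged.

output {
    elasticsearch {
        protocol => "http"
        cluster => "logstash"
        host => "NAMEOFDESTINYHOST"
        index => "logstash-tttttttt-%{+YYYY.MM.dd-HH.mm}"
    }
}
```

Each file would be run by its own Logstash process. Since UDP itself gives no delivery guarantee, this reduces the loss caused by slow filters but cannot rule out packets being dropped before they reach the shipper.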

There is a webinar recording available that discusses this in greater detail.
