Need to understand log loss

Hello,

I am creating this topic because I don't know what to do about a behaviour I just noticed.
I have an Elastic stack that is working, and one application server sends it logs via rsyslog.
Everything works fine, but I noticed a loss of logs between 4am and 6am.
I checked on the application server and I can see logs in that time slot.
Our Elastic stack is monitored by Centreon, and I did not notice any error on Logstash or Elasticsearch.

Have you ever seen this behaviour?
Could you help me find its cause?

Thanks a lot for your help
Best regards
Antoine

Hello

I have more information to share:
I noticed some WARN entries in logstash-plain.log during the log-loss time slot:

[2019-01-25T13:19:19,998][WARN ][logstash.inputs.udp ] UDP listener died {:exception=>#<SocketError: recvfrom: name or service not known>, :backtrace=>["org/jruby/ext/socket/RubyUDPSocket.java:217:in `recvfrom_nonblock'", "/usr/share/logstash/vendor/bundle/jruby/1.9/gems/logstash-input-udp-3.1.0/lib/logstash/inputs/udp.rb:97:in `udp_listener'", "org/jruby/RubyFixnum.java:275:in `times'", "/usr/share/logstash/vendor/bundle/jruby/1.9/gems/logstash-input-udp-3.1.0/lib/logstash/inputs/udp.rb:95:in `udp_listener'", "/usr/share/logstash/vendor/bundle/jruby/1.9/gems/logstash-input-udp-3.1.0/lib/logstash/inputs/udp.rb:56:in `run'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:425:in `inputworker'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:419:in `start_input'"]}
[2019-01-25T13:19:27,535][INFO ][logstash.inputs.udp ] Starting UDP listener {:address=>"0.0.0.0:5514"}
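
A side note on that warning, as a hedged sketch rather than a confirmed fix: while the UDP listener is dead and restarting, incoming datagrams are silently dropped, and an undersized receive queue can also drop packets under load. The udp input exposes a few tuning options; the exact set of available options depends on the logstash-input-udp version, and the values below are purely illustrative:

```conf
input {
  udp {
    port => 5514
    # Illustrative values - tune for your own traffic volume.
    workers    => 4       # threads that process received datagrams
    queue_size => 20000   # in-memory queue between the receiver and the workers
    buffer_size => 65536  # maximum datagram size read per packet
  }
}
```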

Also, I am collecting some elapsed times, and those times increase just before the log loss.

Thanks a lot for your help
Antoine

Hello,

I am sharing some news about this behaviour.
In my Logstash configuration I use the Elapsed filter plugin several times, because I want to calculate several elapsed times.
One of those elapsed times can last more than a day; for it I set the timeout to 172800 seconds.
Just before I lose logs, I notice all the elapsed times calculated in Logstash increasing.
I deactivated the longest elapsed-time measurement, and the behaviour did not reproduce last night.

Here is my elapsed time configuration:
if [rcu_logcode] == "RCU_INUR_001" {
  mutate { add_tag => ["assignID"] }
}
if [rcu_logcode] == "RCU_INUR_002" {
  mutate { add_tag => ["successIDend"] }
  mutate { add_tag => ["successIDstart"] }
}
if [rcu_logcode] == "RCU_INUR_003" {
  mutate { add_tag => ["rejectID"] }
}
if [rcu_logcode] == "RCU_INTG_002" or [rcu_logcode] == "RCU_INTG_001" {
  mutate { add_tag => ["intgIDend"] }
  mutate { add_tag => ["intgIDstart"] }
}
elapsed {
  start_tag => "assignID"
  end_tag => "successIDstart"
  timeout => 1200
  unique_id_field => "rcu_logid"
}
if [elapsed_time] {
  mutate { add_tag => ["timeParseUR"] }
  ruby {
    code => "event.set('timeParseUR', event.get('elapsed_time'))"
  }
  mutate { remove_field => ["elapsed_time"] }
}
elapsed {
  start_tag => "successIDend"
  end_tag => "rejectID"
  timeout => 1200
  unique_id_field => "rcu_logid"
}
if [elapsed_time] {
  mutate { add_tag => ["timeParseToReject"] }
  ruby {
    code => "event.set('timeParseToReject', event.get('elapsed_time'))"
  }
  mutate { remove_field => ["elapsed_time"] }
}
elapsed {
  start_tag => "successIDend"
  end_tag => "intgIDstart"
  timeout => 172800
  unique_id_field => "rcu_logid"
}
if [elapsed_time] {
  mutate { add_tag => ["timeParseToINTG"] }
  ruby {
    code => "event.set('timeParseToINTG', event.get('elapsed_time'))"
  }
  mutate { remove_field => ["elapsed_time"] }
}
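
One way to gauge how much state that long-timeout filter accumulates, as a sketch under stated assumptions: the Elapsed filter keeps every unmatched start event in memory until its end event arrives or the timeout expires, and a 172800 s timeout means unmatched starts can sit there for two days. Per the plugin's documentation, when a start event expires the filter generates a new event tagged elapsed and elapsed_expired_error, so routing those events to their own index lets you count unmatched starts over time. The output below is illustrative only (the hosts and index name are assumptions, not from this thread):

```conf
output {
  # Count how many elapsed "start" events expire without a matching "end"
  # by indexing the expiry events the plugin generates on timeout.
  if "elapsed_expired_error" in [tags] {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "elapsed-expired-%{+YYYY.MM.dd}"
    }
  }
}
```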

Thanks for your help
Antoine

Hi

Can anyone help me with this, please?
Here is a screenshot of what is happening right now: the top-left graph shows an increase in response time, and the others show fewer and fewer messages generated.
If I check directly on the server that generates the logs, I still see the same amount of logs being generated.

Thanks for your help.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.