Hi,
We have a problem with our Logstash installation and I'm a little stuck.
Initially, our Logstash install was a four-node v2.4 setup. We recently spent a few days migrating to 5.6.4 in the hope that the problem would disappear, but it persists.
We see this problem with a file input, but I can't be sure whether it lies in this plugin or in the Logstash core.
We have several inputs like this one:
file {
  path => ["/mnt/logs_webcare_op1/webcare.log*"]
  exclude => "*.gz"
  type => "webcare_op"
  add_field => { "@host" => "op_webcare_as1" }
  start_position => "beginning"
  sincedb_path => "/etc/logstash/states/webcare_op1.sincedb"
  ignore_older => 172800
}
/mnt/logs_webcare_op1 is an NFS mount.
The logs rotate daily. No file rotation takes place at the times we see the anomalies.
There is only one pipeline worker.
Sincedb files are not shared between the inputs.
A typical sample of the file:
2017-11-15 11:55:25,312 INFO [EnablerLogger:144] 743f669b-05d4-4dd2-adb1-37271cb79292|cae3c822-d765-477b-b4ce-4f3f6dcc441c|743f669b-05d4-4dd2-adb1-37271cb79292|API_REST|login|15/11/2017 11:55:25.0312|||PENDING|
...
2017-11-15 11:55:26,360 INFO [EnablerLogger:144] 743f669b-05d4-4dd2-adb1-37271cb79292|cae3c822-d765-477b-b4ce-4f3f6dcc441c|743f669b-05d4-4dd2-adb1-37271cb79292|API_REST|login|15/11/2017 11:55:25.0312|15/11/2017 11:55:26.0360|1048|SUCCESS|login=null
The problem is that we are missing some events.
It's a very small percentage (something like 20 out of 1,000,000 events).
In the data sample, for example, we received the first line but not the second.
If I reinject the sample into Logstash, it is processed correctly, so I don't think the problem is in the filters.
The events are filtered and sent to a RabbitMQ queue:
if [@connector] == "webcare" and [@context] == "" {
  rabbitmq {
    key => "default"
    exchange => "*******"
    exchange_type => "direct"
    user => ""
    password => ""
    host => "****"
    port => 5672
    durable => true
    persistent => true
    vhost => "/"
  }
}
I've added a second output to track the bug and to make sure the problem is not in RabbitMQ or further downstream:
output {
  if [@connector] == "webcare" {
    file {
      path => "/var/log/logstash/debug-webcare-%{+YYYY-MM-dd}.log"
      codec => line {
        format => "%{@timestamp}, id %{@id}, reference %{uuid}, path %{path}"
      }
    }
  }
}
In this debug-webcare file I have the first event but not the second:
2017-11-15T15:55:25.312Z, id 86c8cb11-51f3-4797-b358-38d57b0ec07a, reference cae3c822-d765-477b-b4ce-4f3f6dcc441c, path /mnt/logs_webcare_op2/webcare.log
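To measure which events are lost, the source log can be compared against the debug file by counting lines per reference. The following is only a minimal sketch, not part of our setup: it assumes the reference is the second pipe-delimited UUID in the source line (as in the sample above) and that the debug line carries it after the literal text "reference ". The function names count_refs and missing_events are hypothetical.

```python
import re
from collections import Counter

# Pattern for a UUID such as cae3c822-d765-477b-b4ce-4f3f6dcc441c
UUID = r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"

# Assumption: in the source log, the reference is the second
# pipe-delimited field after the "[EnablerLogger:NNN] " prefix.
SOURCE_REF = re.compile(r"\] " + UUID + r"\|(" + UUID + r")\|")

# Assumption: in the debug file, the reference follows "reference ".
DEBUG_REF = re.compile(r"reference (" + UUID + r")")


def count_refs(lines, pattern):
    """Count how many lines mention each reference."""
    counts = Counter()
    for line in lines:
        match = pattern.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def missing_events(source_lines, debug_lines):
    """Return {reference: number of source lines absent from the debug output}."""
    src = count_refs(source_lines, SOURCE_REF)
    dbg = count_refs(debug_lines, DEBUG_REF)
    # A Counter returns 0 for references it has never seen.
    return {ref: n - dbg[ref] for ref, n in src.items() if dbg[ref] < n}
```

Both arguments can be any iterable of lines, so open file objects for webcare.log and the debug-webcare file can be passed directly. With the sample above, the reference cae3c822-d765-477b-b4ce-4f3f6dcc441c would be reported with one missing line.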
So I have to dig deeper, but does anyone have a direction or some advice? (I'm not a Ruby guy, but I will learn if necessary.)
How would you deal with this kind of problem?
Franck