Filebeat missing log lines


(Paritosh Walvekar) #1

I have filebeat up and running.

FileBeat version : 1.2.0
Operating System : Ubuntu

My configuration looks something like

filebeat:
  prospectors:
    -
      paths:
        - /home/abc/access.log.2016*
      input_type: log
      ignore_older: 24h
      document_type: access
    -
      paths:
        - /home/abc/console.log.2016*
      input_type: log
      document_type: console
      ignore_older: 24h
  registry_file: /var/lib/filebeat/registry
output:
  logstash:
    hosts: ["localhost:5044"]
logging:
  files:
    path: /var/log/mybeat
    name: mybeat
    rotateeverybytes: 10485760 # = 10MB
  level: debug

Filebeat misses some of the log lines at random. I haven't been able to find any pattern in which lines go missing.

I know this is abnormal and any help is appreciated.


(ruflin) #2

Can you give some examples on which logs are missing? How do you rotate log files?


(Paritosh Walvekar) #3

Example Logs -

order.log.20160403:1342:12:02:29 AM |  INFO | Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. | Lorem ipsum dolor sit amet, | XYZ

order.log.20160403:2817:12:19:25 AM |  INFO | Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. | Lorem ipsum dolor sit amet, | XYZ

I have re-verified the grok pattern in Logstash and it is fine, since other logs are parsed correctly. Some logs matching the pattern above are skipped entirely during forwarding.

Log rotation policy :
The current logs are written to a file named order.log. The file is renamed to order.log.<yesterday's date> at midnight.
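
For reference, the rename-at-midnight scheme described above could be expressed as a logrotate rule roughly like this (hypothetical; the rotation may well be done by the application itself, and the path and retention count are only assumptions):

```
/home/abc/order.log {
    daily
    rotate 30           # keep 30 rotated files; retention is an assumption
    dateext
    dateformat .%Y%m%d  # produces order.log.20160403-style names
    nocompress
    nocreate
}
```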


(ruflin) #4

So some log entries are missing, not entire log files.

I see you set ignore_older to 24h. Is there a specific reason for that? In filebeat 1.2.0 ignore_older is set to infinity by default but close_older was introduced with 1h as default to close files.
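
Dropping ignore_older and relying on close_older would look roughly like this in the 1.2.x config format (a sketch based on the prospector from the first post; the 1h value just makes the default explicit):

```yaml
filebeat:
  prospectors:
    -
      paths:
        - /home/abc/access.log.2016*
      input_type: log
      document_type: access
      # No ignore_older: files are never skipped by age (default is infinity).
      # close_older releases the file handle once a file has been
      # unchanged for this long; 1h is already the default in 1.2.x.
      close_older: 1h
```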

Do your log files always have an empty line between two events?


(Mathias) #5

Hi,

I see the same problem. It is not just one event here and there; quite a few are missing all the time.
I have an application with a debug log which I ship to Logstash using Filebeat (tailing the file).
The app produces 383.10 KB/s, about 8941 trace lines/s.
A certain event I looked at shows up 230 times per minute in the original log.
In Kibana I only see this event 10 times per minute.

Br Mathias


(ruflin) #6

@mathias Can you provide some more details like:

  • Filebeat version
  • Logstash version
  • Beat-Input-Plugin version
  • File rotation you use
  • Filebeat config

(Mathias) #7

Hi,

I see now that I did not read the headline properly.
I use logstash 2.2.4 and beats 1.2.1 (32 bit)
logstash-input-beats 2.2.7

filebeat config:

      paths:
        - /var/log/myapp/*.log
      type: log
      document_type: myapplog
      ignore_older: 5m
      tail_files: true

  ### Logstash as output
  logstash:
    # The Logstash hosts
    hosts: ["192.168.0.158:5045"]
    bulk_max_size: 1024

I just updated to logstash-input-beats 2.2.8 and will try with bulk size 10000.


(Mathias) #8

Hello again,

I tried with logstash-input-beats 2.2.8 and bulk_max_size = 10000, but there is no difference.

Parsing my application log looking for a certain event:
NEW minute: 2016:04:25 10-00-00.163 227 hits/min
NEW minute: 2016:04:25 10-01-00.013 219 hits/min
NEW minute: 2016:04:25 10-02-00.003 225 hits/min
NEW minute: 2016:04:25 10-03-00.010 217 hits/min
NEW minute: 2016:04:25 10-04-00.008 229 hits/min

Here is what Kibana looks like using the same filter:

Br Mathias


(ruflin) #9

Why do you have tail_files enabled? Can you provide an example of the kind of event that is missing? What does your LS config look like?


(Mathias) #10

Hi ruflin,

The events are just test strings indicating that an event has occurred in my application.
The string has a date and some information.

My log file can grow very big (gigabytes after a couple of hours), so I thought it was a good idea to use tail.
Is there a reason why I should not use tail?

[2016-04-25 15:08:50.933] Ft_UEH_UEHANDLER(ue_cell_loop) TRACE9 ue_handler_impl.cc:55: [4]Route to UE(RI{1:172118:263309}, INITIAL_CONN_SETUP{cellId 1, crnti 19771, raUeInfo {timingOffset 0, taSfn 993, taSubframe 5, rxPuschSector 1, rxPucchSector 1, freqOffEstPusch -49, msg3Timestamp 4263638582, rxPower {prbListStart 46, prbListEnd 47, rxPowerReport -913, sinr 86}, lcid 0}, pdu[6]{ 40 A4 8B 19 75 88}})
[2016-04-25 15:08:51.004] Ft_UEH_UEHANDLER(ue_cell_loop) TRACE9 ue_handler_impl.cc:55: [4]Route to UE(RI{1:172119:263310}, INITIAL_CONN_SETUP{cellId 1, crnti 19773, raUeInfo {timingOffset 0, taSfn 1000, taSubframe 5, rxPuschSector 1, rxPucchSector 1, freqOffEstPusch -37, msg3Timestamp 4270978731, rxPower {prbListStart 46, prbListEnd 47, rxPowerReport -943, sinr 49}, lcid 0}, pdu[6]{ 40 A4 8B 19 75 B8}})
[2016-04-25 15:08:51.103] Ft_UEH_UEHANDLER(ue_cell_loop) TRACE9 ue_handler_impl.cc:55: [4]Route to UE(RI{1:172120:263311}, INITIAL_CONN_SETUP{cellId 1, crnti 19775, raUeInfo {timingOffset 0, taSfn 1010, taSubframe 5, rxPuschSector 1, rxPucchSector 1, freqOffEstPusch -35, msg3Timestamp 4281464883, rxPower {prbListStart 46, prbListEnd 47, rxPowerReport -918, sinr 69}, lcid 0}, pdu[6]{ 40 A4 8B 19 74 68}})

LS config:

input {
  beats {
    port => 5045
    ssl => false
    ssl_certificate => "/etc/pki/tls/certs/logstash-forwarder.crt"
    ssl_key => "/etc/pki/tls/private/logstash-forwarder.key"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    sniffing => true
    manage_template => false
    index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
  }
  file {
    codec => rubydebug
    path => "/tmp/elasticsearch_output"
  }
}

output {
  stdout { codec => rubydebug }
#  stdout { codec => json }

  if [type] == "syslog" and "_grokparsefailure" in [tags] {
    file { path => "/tmp/%{host}/%{type}-%{+YYYY-MM-dd}/failed_syslog_events" }
  }

  file {
#    message_format => "%{[time]} | %{[severity]} : /%{[thread]}/ %{[loggername]} (%{[sourcemethodname]}) - %{[message]}"
    message_format => "%{[message]}"
    path => "/tmp/%{host}/%{type}-%{+YYYY-MM-dd}.log"
  }
}


(Mathias) #11

I tried with tail_files = false but that did not change the outcome.

Kind Regards
Mathias


(ruflin) #12

The reason I asked about tail is that it can lead to loss of data during file rotation.

Your filebeat configuration above does not seem to be complete. Can you share the full config file?

It would be helpful if you could run filebeat with the flags -e -d "*" and check if you see something suspicious in the output there.
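
For reference, running it in the foreground with all debug selectors enabled would look roughly like this (the config path is an assumption; adjust it to your install):

```shell
# -e logs to stderr instead of the log files, -d "*" enables all
# debug selectors, -c points at the config file to use.
filebeat -e -d "*" -c /etc/filebeat/filebeat.yml
```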

Just to recap:

  • Filebeat reads your log files
  • You have about ~200 events per second
  • You send your data through LS and do no processing
  • In Kibana you see some of the events, but not all

You mentioned above that you use a "filter" in Kibana. Which filter is that?


(Mathias) #13

Hi ruflin,

Your recap is correct:
The Filter was: type:vuehlog AND Ft_UEH_UEHANDLER* AND INITIAL_CONN_SETUP*

I do not think tail_files is working, since I see filebeat processing old data from before filebeat was started.

One more thing I have noticed is that filebeat is lagging. I compared the source file with the file filebeat writes.
Here we can see the difference:
First I parse the source file:
ubuntu@vueh-0:~$ cat /var/log/vueh/stdout.log | ~/tracethroughput.pl | grep hits
NEW minute: 2016-04-26 14:56:23.945 0 hits/min
NEW minute: 2016-04-26 14:58:14.713 0 hits/min
NEW minute: 2016-04-26 15:11:41.980 0 hits/min
NEW minute: 2016-04-26 15:12:00.007 18 hits/min
NEW minute: 2016-04-26 15:13:00.034 107 hits/min
NEW minute: 2016-04-26 15:14:00.139 225 hits/min
NEW minute: 2016-04-26 15:15:00.167 232 hits/min
NEW minute: 2016-04-26 15:16:00.057 226 hits/min
NEW minute: 2016-04-26 15:17:00.087 231 hits/min
NEW minute: 2016-04-26 15:18:00.167 230 hits/min
NEW minute: 2016-04-26 15:19:00.005 229 hits/min
NEW minute: 2016-04-26 15:20:00.001 231 hits/min
NEW minute: 2016-04-26 15:21:00.010 227 hits/min
NEW minute: 2016-04-26 15:22:00.000 224 hits/min
NEW minute: 2016-04-26 15:23:00.014 232 hits/min
NEW minute: 2016-04-26 15:24:00.175 227 hits/min
NEW minute: 2016-04-26 15:25:00.030 233 hits/min
NEW minute: 2016-04-26 15:26:00.005 230 hits/min
NEW minute: 2016-04-26 15:27:00.006 223 hits/min
NEW minute: 2016-04-26 15:28:00.001 229 hits/min
NEW minute: 2016-04-26 15:29:00.016 235 hits/min
NEW minute: 2016-04-26 15:30:00.016 224 hits/min
NEW minute: 2016-04-26 15:31:00.000 227 hits/min
NEW minute: 2016-04-26 15:32:00.000 231 hits/min
NEW minute: 2016-04-26 15:33:00.098 226 hits/min
NEW minute: 2016-04-26 15:34:00.004 231 hits/min
NEW minute: 2016-04-26 15:35:00.003 229 hits/min
NEW minute: 2016-04-26 15:36:00.012 225 hits/min
NEW minute: 2016-04-26 15:37:00.005 231 hits/min
NEW minute: 2016-04-26 15:38:00.003 229 hits/min
NEW minute: 2016-04-26 15:39:00.008 229 hits/min
NEW minute: 2016-04-26 15:40:00.057 226 hits/min
NEW minute: 2016-04-26 15:41:00.023 225 hits/min
NEW minute: 2016-04-26 15:42:00.023 234 hits/min
NEW minute: 2016-04-26 15:43:00.016 230 hits/min

And then I parse the filebeat local output:
ubuntu@vueh-0:~$ cat /tmp/filebeat/filebeat | ~/tracethroughput.pl | grep hits
NEW minute: 2016-04-26 15:11:41.980 0 hits/min
NEW minute: 2016-04-26 15:12:00.007 18 hits/min
NEW minute: 2016-04-26 15:13:00.034 107 hits/min
NEW minute: 2016-04-26 15:14:00.139 225 hits/min
NEW minute: 2016-04-26 15:15:00.167 232 hits/min

At least the first 4 minutes seem to be correct.

The log file written by Logstash also looks correct for the first 4 minutes:
NEW minute: 2016:04:26 15-11-41.980 10 hits/min
NEW minute: 2016:04:26 15-12-00.007 18 hits/min
NEW minute: 2016:04:26 15-13-00.034 107 hits/min
NEW minute: 2016:04:26 15-14-00.139 225 hits/min
NEW minute: 2016:04:26 15-15-00.167 232 hits/min

I have moved elasticsearch and kibana to another server to reduce load on logstash server.

I just noticed that the latest test was running with grok and date filters in Logstash.
That might explain why Kibana only has one minute with events, and why that minute's count (82) does not match any minute of the source.

I will repeat the test without the grok and date filters.

BRB
Mathias
PS. Time is now 15:52 and filebeat output has reached 15:16 :frowning:

NEW minute: 2016-04-26 15:11:41.980 0 hits/min
NEW minute: 2016-04-26 15:12:00.007 18 hits/min
NEW minute: 2016-04-26 15:13:00.034 107 hits/min
NEW minute: 2016-04-26 15:14:00.139 225 hits/min
NEW minute: 2016-04-26 15:15:00.167 232 hits/min
NEW minute: 2016-04-26 15:16:00.057 226 hits/min


(Mathias) #14

Hello again,

Removing all filters did not change anything.

Here is the result:
Source file:
ubuntu@vueh-0:~$ cat /var/log/vueh/stdout.log | ~/tracethroughput.pl | grep hits
NEW minute: 2016-04-26 16:09:27.940 0 hits/min
NEW minute: 2016-04-26 16:11:14.959 0 hits/min
NEW minute: 2016-04-26 16:14:41.689 0 hits/min
NEW minute: 2016-04-26 16:15:00.740 20 hits/min
NEW minute: 2016-04-26 16:16:00.009 99 hits/min
NEW minute: 2016-04-26 16:17:00.113 196 hits/min
NEW minute: 2016-04-26 16:18:00.031 228 hits/min
NEW minute: 2016-04-26 16:19:00.229 229 hits/min
NEW minute: 2016-04-26 16:20:00.000 226 hits/min
NEW minute: 2016-04-26 16:21:00.110 226 hits/min
NEW minute: 2016-04-26 16:22:00.055 229 hits/min
NEW minute: 2016-04-26 16:23:00.013 229 hits/min
NEW minute: 2016-04-26 16:24:00.021 227 hits/min
NEW minute: 2016-04-26 16:25:00.082 230 hits/min
NEW minute: 2016-04-26 16:26:00.012 229 hits/min
NEW minute: 2016-04-26 16:27:00.098 227 hits/min
NEW minute: 2016-04-26 16:28:00.080 228 hits/min
NEW minute: 2016-04-26 16:29:00.113 228 hits/min
NEW minute: 2016-04-26 16:30:00.001 230 hits/min
NEW minute: 2016-04-26 16:31:00.003 231 hits/min
NEW minute: 2016-04-26 16:32:00.002 234 hits/min
NEW minute: 2016-04-26 16:33:00.045 223 hits/min
NEW minute: 2016-04-26 16:34:00.005 229 hits/min
NEW minute: 2016-04-26 16:35:00.000 223 hits/min
NEW minute: 2016-04-26 16:36:00.018 229 hits/min
NEW minute: 2016-04-26 16:37:00.027 235 hits/min
NEW minute: 2016-04-26 16:38:00.130 224 hits/min
NEW minute: 2016-04-26 16:39:00.010 225 hits/min
NEW minute: 2016-04-26 16:40:00.253 232 hits/min
NEW minute: 2016-04-26 16:41:00.109 230 hits/min
NEW minute: 2016-04-26 16:42:00.160 226 hits/min
NEW minute: 2016-04-26 16:43:00.080 230 hits/min
NEW minute: 2016-04-26 16:44:00.009 224 hits/min
NEW minute: 2016-04-26 16:45:00.046 234 hits/min
NEW minute: 2016-04-26 16:46:00.001 232 hits/min
NEW minute: 2016-04-26 16:47:00.111 223 hits/min
NEW minute: 2016-04-26 16:48:00.017 232 hits/min
NEW minute: 2016-04-26 16:49:00.010 221 hits/min
NEW minute: 2016-04-26 16:50:00.100 232 hits/min
NEW minute: 2016-04-26 16:51:00.014 233 hits/min
NEW minute: 2016-04-26 16:52:00.010 223 hits/min
NEW minute: 2016-04-26 16:53:00.000 230 hits/min
NEW minute: 2016-04-26 16:54:00.020 227 hits/min
NEW minute: 2016-04-26 16:55:00.008 232 hits/min
NEW minute: 2016-04-26 16:56:00.010 227 hits/min
NEW minute: 2016-04-26 16:57:00.040 231 hits/min
NEW minute: 2016-04-26 16:58:00.052 222 hits/min

filebeat output:
ubuntu@vueh-0:~$ cat /tmp/filebeat/filebeat | ~/tracethroughput.pl | grep hits
NEW minute: 2016-04-26 16:14:41.689 0 hits/min
NEW minute: 2016-04-26 16:15:00.740 20 hits/min
NEW minute: 2016-04-26 16:16:00.009 99 hits/min
NEW minute: 2016-04-26 16:17:00.113 196 hits/min
NEW minute: 2016-04-26 16:18:00.031 228 hits/min
NEW minute: 2016-04-26 16:19:00.229 229 hits/min

logstash output:
ubuntu@elk:/tmp/vueh-0$ cat vuehlog-2016-04-26.log | ~/tracethroughput.pl | grep hits
NEW minute: 2016:04:26 16-14-41.689 0 hits/min
NEW minute: 2016:04:26 16-15-00.740 20 hits/min
NEW minute: 2016:04:26 16-16-00.009 99 hits/min
NEW minute: 2016:04:26 16-17-00.113 196 hits/min
NEW minute: 2016:04:26 16-18-00.031 228 hits/min
NEW minute: 2016:04:26 16-19-00.229 229 hits/min

I am really scratching my head now.
Why do I see results in Kibana for all minutes but I do not see them in filebeat output nor in logstash output?
Unfortunately, the number of hits per minute is not correct in Kibana.

Br Mathias


(Mathias) #15

Hi,

I reinstalled Logstash on a machine with 2 CPUs instead of 1 and with 4 GB of memory instead of 2 GB.
It looked better: less lagging.
I realized I needed to use the grok and date filters to be able to match counts per minute.
When the grok filter is enabled, things become slow and the lag increases.

Anyway, Kibana is now showing the same counts as when I parse the source file. Finally! :smiley:
I will try again when I have a much better machine to test on. Hopefully there will be no delays then.

Regards
Mathias


(ruflin) #16

Thanks for all the updates. So, as you found out, it seems LS cannot process the log events fast enough. Filebeat slows down in this case, but it will still send the events eventually once it catches up. Glad you found a solution.

About tail_files: this only works if you don't have a registry file with a state yet. I assume in your case you started filebeat without tail_files first and it stored a state. When you enable the option later and filebeat finds a state, it uses the state instead of tailing. To prevent this, you have to remove the registry file.
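
Clearing that state could look like this (a sketch: the service name and registry path are assumptions; the path comes from the registry_file setting in filebeat.yml):

```shell
# Stop filebeat, delete the stored state, start again; with
# tail_files: true filebeat will now start at the end of each file
# instead of resuming from the stored offset.
sudo service filebeat stop
sudo rm /var/lib/filebeat/registry   # path from the registry_file setting
sudo service filebeat start
```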


(system) #17