Filebeat missing log lines


(Paritosh Walvekar) #1

I have filebeat up and running.

FileBeat version : 1.2.0
Operating System : Ubuntu

My configuration looks something like

filebeat:
  prospectors:
    -
      paths:
        - /home/abc/access.log.2016*
      input_type: log
      ignore_older: 24h
      document_type: access
    -
      paths:
        - /home/abc/console.log.2016*
      input_type: log
      document_type: console
      ignore_older: 24h
  registry_file: /var/lib/filebeat/registry
output:
  logstash:
    hosts: ["localhost:5044"]
logging:
  files:
    path: /var/log/mybeat
    name: mybeat
    rotateeverybytes: 10485760 # = 10MB
  level: debug

Filebeat misses some of the log lines at random. I haven't been able to find any pattern in which lines go missing.

I know this is abnormal and any help is appreciated.


(ruflin) #2

Can you give some examples on which logs are missing? How do you rotate log files?


(Paritosh Walvekar) #3

Example Logs -

order.log.20160403:1342:12:02:29 AM |  INFO | Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. | Lorem ipsum dolor sit amet, | XYZ

order.log.20160403:2817:12:19:25 AM |  INFO | Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. | Lorem ipsum dolor sit amet, | XYZ

I have re-verified the grok pattern in Logstash and it is fine, since other logs are parsed correctly. Some logs matching the pattern above are skipped entirely during forwarding.

Log rotation policy :
The current logs are written to a file named order.log. The file is renamed to order.log.<yesterday's date> at midnight.
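
For reference, the rename-at-midnight scheme described above could be expressed as a logrotate rule roughly like this (hypothetical; the rotation may well be done by the application itself, and the path and retention count are only assumptions):

```
/home/abc/order.log {
    daily
    rotate 30           # keep 30 rotated files; retention is an assumption
    dateext
    dateformat .%Y%m%d  # produces order.log.20160403-style names
    nocompress
    nocreate
}
```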


(ruflin) #4

So some log entries are missing, not entire log files.

I see you set ignore_older to 24h. Is there a specific reason for that? In filebeat 1.2.0 ignore_older is set to infinity by default but close_older was introduced with 1h as default to close files.
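
Dropping ignore_older and relying on close_older would look roughly like this in the 1.2.x config format (a sketch based on the prospector from the first post; the 1h value just makes the default explicit):

```yaml
filebeat:
  prospectors:
    -
      paths:
        - /home/abc/access.log.2016*
      input_type: log
      document_type: access
      # No ignore_older: files are never skipped by age (default is infinity).
      # close_older releases the file handle once a file has been
      # unchanged for this long; 1h is already the default in 1.2.x.
      close_older: 1h
```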

Do your log files always have an empty line between two events?


(Mathias) #5

Hi,

I see the same problem. It is not just one event here and there; quite a few are missing all the time.
I have an application with a debug log which I ship to Logstash using Filebeat (tailing the file).
The app produces 383.10 KB/s, about 8941 trace lines/s.
A certain event I looked at shows up 230 times per minute in the original log.
In Kibana I only see this event 10 times per minute.

Br Mathias


(ruflin) #6

@mathias Can you provide some more details like:

  • Filebeat version
  • Logstash version
  • Beat-Input-Plugin version
  • File rotation you use
  • Filebeat config

(Mathias) #7

Hi,

I see now that I did not read the headline properly.
I use logstash 2.2.4 and beats 1.2.1 (32 bit)
logstash-input-beats 2.2.7

filebeat config:

      paths:
        - /var/log/myapp/*.log
      type: log
      document_type: myapplog
      ignore_older: 5m
      tail_files: true

  ### Logstash as output
  logstash:
    # The Logstash hosts
    hosts: ["192.168.0.158:5045"]
    bulk_max_size: 1024

I just updated to logstash-input-beats 2.2.8 and will try with bulk size 10000.


(Mathias) #8

Hello again,

I tried with logstash-input-beats 2.2.8 and bulk_max_size = 10000, but there is no difference.

Parsing my application log looking for a certain event:
NEW minute: 2016:04:25 10-00-00.163 227 hits/min
NEW minute: 2016:04:25 10-01-00.013 219 hits/min
NEW minute: 2016:04:25 10-02-00.003 225 hits/min
NEW minute: 2016:04:25 10-03-00.010 217 hits/min
NEW minute: 2016:04:25 10-04-00.008 229 hits/min

Here is what Kibana looks like using the same filter:

Br Mathias


(ruflin) #9

Why do you have tail_files enabled? Can you provide an example of the kind of event that is missing? What does your LS config look like?


(Mathias) #10

Hi ruflin,

The events are just test strings indicating that an event has occurred in my application.
The string has a date and some information.

My log file can grow very big (gigabytes after a couple of hours), so I thought it was a good idea to use tail.
Is there a reason why I should not use tail?

[2016-04-25 15:08:50.933] Ft_UEH_UEHANDLER(ue_cell_loop) TRACE9 ue_handler_impl.cc:55: [4]Route to UE(RI{1:172118:263309}, INITIAL_CONN_SETUP{cellId 1, crnti 19771, raUeInfo {timingOffset 0, taSfn 993, taSubframe 5, rxPuschSector 1, rxPucchSector 1, freqOffEstPusch -49, msg3Timestamp 4263638582, rxPower {prbListStart 46, prbListEnd 47, rxPowerReport -913, sinr 86}, lcid 0}, pdu[6]{ 40 A4 8B 19 75 88}})
[2016-04-25 15:08:51.004] Ft_UEH_UEHANDLER(ue_cell_loop) TRACE9 ue_handler_impl.cc:55: [4]Route to UE(RI{1:172119:263310}, INITIAL_CONN_SETUP{cellId 1, crnti 19773, raUeInfo {timingOffset 0, taSfn 1000, taSubframe 5, rxPuschSector 1, rxPucchSector 1, freqOffEstPusch -37, msg3Timestamp 4270978731, rxPower {prbListStart 46, prbListEnd 47, rxPowerReport -943, sinr 49}, lcid 0}, pdu[6]{ 40 A4 8B 19 75 B8}})
[2016-04-25 15:08:51.103] Ft_UEH_UEHANDLER(ue_cell_loop) TRACE9 ue_handler_impl.cc:55: [4]Route to UE(RI{1:172120:263311}, INITIAL_CONN_SETUP{cellId 1, crnti 19775, raUeInfo {timingOffset 0, taSfn 1010, taSubframe 5, rxPuschSector 1, rxPucchSector 1, freqOffEstPusch -35, msg3Timestamp 4281464883, rxPower {prbListStart 46, prbListEnd 47, rxPowerReport -918, sinr 69}, lcid 0}, pdu[6]{ 40 A4 8B 19 74 68}})

LS config:

input {
  beats {
    port => 5045
    ssl => false
    ssl_certificate => "/etc/pki/tls/certs/logstash-forwarder.crt"
    ssl_key => "/etc/pki/tls/private/logstash-forwarder.key"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    sniffing => true
    manage_template => false
    index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
  }
  file {
    codec => rubydebug
    path => "/tmp/elasticsearch_output"
  }
}

output {
  stdout { codec => rubydebug }
#  stdout { codec => json }

  if [type] == "syslog" and "_grokparsefailure" in [tags] {
    file { path => "/tmp/%{host}/%{type}-%{+YYYY-MM-dd}/failed_syslog_events" }
  }

  file {
#    message_format => "%{[time]} | %{[severity]} : /%{[thread]}/ %{[loggername]} (%{[sourcemethodname]}) - %{[message]}"
    message_format => "%{[message]}"
    path => "/tmp/%{host}/%{type}-%{+YYYY-MM-dd}.log"
  }
}


(Mathias) #11

I tried with tail_files = false but that did not change the outcome.

Kind Regards
Mathias


(ruflin) #12

The reason I asked about tail is that it can lead to loss of data during file rotation.

Your filebeat configuration above does not seem to be complete. Can you share the full config file?

It would be helpful if you could run filebeat with the flags -e -d "*" and check if you see something suspicious in the output there.
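
For reference, running it in the foreground with all debug selectors enabled would look roughly like this (the config path is an assumption; adjust it to your install):

```shell
# -e logs to stderr instead of the log files, -d "*" enables all
# debug selectors, -c points at the config file to use.
filebeat -e -d "*" -c /etc/filebeat/filebeat.yml
```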

Just to recap:

  • Filebeat reads your log files
  • You have about ~200 events per second
  • You send your data through LS and do no processing
  • In Kibana you see some of the events, but not all

You mentioned above that you use a "filter" in Kibana. Which filter is that?


(Mathias) #13

Hi ruflin,

Your recap is correct:
The Filter was: type:vuehlog AND Ft_UEH_UEHANDLER* AND INITIAL_CONN_SETUP*

I do not think tail_files is working, since I see filebeat processing old data from before filebeat was started.

One more thing I have noticed is that filebeat is lagging. I compared the source file with the file filebeat writes.
Here we can see the difference:
First I parse the source file:
ubuntu@vueh-0:~$ cat /var/log/vueh/stdout.log | ~/tracethroughput.pl | grep hits
NEW minute: 2016-04-26 14:56:23.945 0 hits/min
NEW minute: 2016-04-26 14:58:14.713 0 hits/min
NEW minute: 2016-04-26 15:11:41.980 0 hits/min
NEW minute: 2016-04-26 15:12:00.007 18 hits/min
NEW minute: 2016-04-26 15:13:00.034 107 hits/min
NEW minute: 2016-04-26 15:14:00.139 225 hits/min
NEW minute: 2016-04-26 15:15:00.167 232 hits/min
NEW minute: 2016-04-26 15:16:00.057 226 hits/min
NEW minute: 2016-04-26 15:17:00.087 231 hits/min
NEW minute: 2016-04-26 15:18:00.167 230 hits/min
NEW minute: 2016-04-26 15:19:00.005 229 hits/min
NEW minute: 2016-04-26 15:20:00.001 231 hits/min
NEW minute: 2016-04-26 15:21:00.010 227 hits/min
NEW minute: 2016-04-26 15:22:00.000 224 hits/min
NEW minute: 2016-04-26 15:23:00.014 232 hits/min
NEW minute: 2016-04-26 15:24:00.175 227 hits/min
NEW minute: 2016-04-26 15:25:00.030 233 hits/min
NEW minute: 2016-04-26 15:26:00.005 230 hits/min
NEW minute: 2016-04-26 15:27:00.006 223 hits/min
NEW minute: 2016-04-26 15:28:00.001 229 hits/min
NEW minute: 2016-04-26 15:29:00.016 235 hits/min
NEW minute: 2016-04-26 15:30:00.016 224 hits/min
NEW minute: 2016-04-26 15:31:00.000 227 hits/min
NEW minute: 2016-04-26 15:32:00.000 231 hits/min
NEW minute: 2016-04-26 15:33:00.098 226 hits/min
NEW minute: 2016-04-26 15:34:00.004 231 hits/min
NEW minute: 2016-04-26 15:35:00.003 229 hits/min
NEW minute: 2016-04-26 15:36:00.012 225 hits/min
NEW minute: 2016-04-26 15:37:00.005 231 hits/min
NEW minute: 2016-04-26 15:38:00.003 229 hits/min
NEW minute: 2016-04-26 15:39:00.008 229 hits/min
NEW minute: 2016-04-26 15:40:00.057 226 hits/min
NEW minute: 2016-04-26 15:41:00.023 225 hits/min
NEW minute: 2016-04-26 15:42:00.023 234 hits/min
NEW minute: 2016-04-26 15:43:00.016 230 hits/min

And then I parse the filebeat local output:
ubuntu@vueh-0:~$ cat /tmp/filebeat/filebeat | ~/tracethroughput.pl | grep hits
NEW minute: 2016-04-26 15:11:41.980 0 hits/min
NEW minute: 2016-04-26 15:12:00.007 18 hits/min
NEW minute: 2016-04-26 15:13:00.034 107 hits/min
NEW minute: 2016-04-26 15:14:00.139 225 hits/min
NEW minute: 2016-04-26 15:15:00.167 232 hits/min

At least the first 4 minutes seem to be correct.

The log file written by Logstash also looks correct for the first 4 minutes:
NEW minute: 2016:04:26 15-11-41.980 10 hits/min
NEW minute: 2016:04:26 15-12-00.007 18 hits/min
NEW minute: 2016:04:26 15-13-00.034 107 hits/min
NEW minute: 2016:04:26 15-14-00.139 225 hits/min
NEW minute: 2016:04:26 15-15-00.167 232 hits/min

I have moved elasticsearch and kibana to another server to reduce load on logstash server.

I just noticed that the latest test was running with grok and date filters in Logstash.
That might explain why Kibana only has one minute with events, and why that minute's count (82) does not match any minute of the source.

I will repeat the test without the grok and date filters.

BRB
Mathias
PS. Time is now 15:52 and filebeat output has reached 15:16 :frowning:

NEW minute: 2016-04-26 15:11:41.980 0 hits/min
NEW minute: 2016-04-26 15:12:00.007 18 hits/min
NEW minute: 2016-04-26 15:13:00.034 107 hits/min
NEW minute: 2016-04-26 15:14:00.139 225 hits/min
NEW minute: 2016-04-26 15:15:00.167 232 hits/min
NEW minute: 2016-04-26 15:16:00.057 226 hits/min


(Mathias) #14

Hello again,

Removing all filters did not change anything.

Here is the result:
Source file:
ubuntu@vueh-0:~$ cat /var/log/vueh/stdout.log | ~/tracethroughput.pl | grep hits
NEW minute: 2016-04-26 16:09:27.940 0 hits/min
NEW minute: 2016-04-26 16:11:14.959 0 hits/min
NEW minute: 2016-04-26 16:14:41.689 0 hits/min
NEW minute: 2016-04-26 16:15:00.740 20 hits/min
NEW minute: 2016-04-26 16:16:00.009 99 hits/min
NEW minute: 2016-04-26 16:17:00.113 196 hits/min
NEW minute: 2016-04-26 16:18:00.031 228 hits/min
NEW minute: 2016-04-26 16:19:00.229 229 hits/min
NEW minute: 2016-04-26 16:20:00.000 226 hits/min
NEW minute: 2016-04-26 16:21:00.110 226 hits/min
NEW minute: 2016-04-26 16:22:00.055 229 hits/min
NEW minute: 2016-04-26 16:23:00.013 229 hits/min
NEW minute: 2016-04-26 16:24:00.021 227 hits/min
NEW minute: 2016-04-26 16:25:00.082 230 hits/min
NEW minute: 2016-04-26 16:26:00.012 229 hits/min
NEW minute: 2016-04-26 16:27:00.098 227 hits/min
NEW minute: 2016-04-26 16:28:00.080 228 hits/min
NEW minute: 2016-04-26 16:29:00.113 228 hits/min
NEW minute: 2016-04-26 16:30:00.001 230 hits/min
NEW minute: 2016-04-26 16:31:00.003 231 hits/min
NEW minute: 2016-04-26 16:32:00.002 234 hits/min
NEW minute: 2016-04-26 16:33:00.045 223 hits/min
NEW minute: 2016-04-26 16:34:00.005 229 hits/min
NEW minute: 2016-04-26 16:35:00.000 223 hits/min
NEW minute: 2016-04-26 16:36:00.018 229 hits/min
NEW minute: 2016-04-26 16:37:00.027 235 hits/min
NEW minute: 2016-04-26 16:38:00.130 224 hits/min
NEW minute: 2016-04-26 16:39:00.010 225 hits/min
NEW minute: 2016-04-26 16:40:00.253 232 hits/min
NEW minute: 2016-04-26 16:41:00.109 230 hits/min
NEW minute: 2016-04-26 16:42:00.160 226 hits/min
NEW minute: 2016-04-26 16:43:00.080 230 hits/min
NEW minute: 2016-04-26 16:44:00.009 224 hits/min
NEW minute: 2016-04-26 16:45:00.046 234 hits/min
NEW minute: 2016-04-26 16:46:00.001 232 hits/min
NEW minute: 2016-04-26 16:47:00.111 223 hits/min
NEW minute: 2016-04-26 16:48:00.017 232 hits/min
NEW minute: 2016-04-26 16:49:00.010 221 hits/min
NEW minute: 2016-04-26 16:50:00.100 232 hits/min
NEW minute: 2016-04-26 16:51:00.014 233 hits/min
NEW minute: 2016-04-26 16:52:00.010 223 hits/min
NEW minute: 2016-04-26 16:53:00.000 230 hits/min
NEW minute: 2016-04-26 16:54:00.020 227 hits/min
NEW minute: 2016-04-26 16:55:00.008 232 hits/min
NEW minute: 2016-04-26 16:56:00.010 227 hits/min
NEW minute: 2016-04-26 16:57:00.040 231 hits/min
NEW minute: 2016-04-26 16:58:00.052 222 hits/min

filebeat output:
ubuntu@vueh-0:~$ cat /tmp/filebeat/filebeat | ~/tracethroughput.pl | grep hits
NEW minute: 2016-04-26 16:14:41.689 0 hits/min
NEW minute: 2016-04-26 16:15:00.740 20 hits/min
NEW minute: 2016-04-26 16:16:00.009 99 hits/min
NEW minute: 2016-04-26 16:17:00.113 196 hits/min
NEW minute: 2016-04-26 16:18:00.031 228 hits/min
NEW minute: 2016-04-26 16:19:00.229 229 hits/min

logstash output:
ubuntu@elk:/tmp/vueh-0$ cat vuehlog-2016-04-26.log | ~/tracethroughput.pl | grep hits
NEW minute: 2016:04:26 16-14-41.689 0 hits/min
NEW minute: 2016:04:26 16-15-00.740 20 hits/min
NEW minute: 2016:04:26 16-16-00.009 99 hits/min
NEW minute: 2016:04:26 16-17-00.113 196 hits/min
NEW minute: 2016:04:26 16-18-00.031 228 hits/min
NEW minute: 2016:04:26 16-19-00.229 229 hits/min

I am really scratching my head now.
Why do I see results in Kibana for all minutes but I do not see them in filebeat output nor in logstash output?
Unfortunately, the number of hits per minute is not correct in Kibana.

Br Mathias


(Mathias) #15

Hi,

I reinstalled Logstash on a machine with 2 CPUs instead of 1 and with 4 GB of memory instead of 2 GB.
It looked better: less lagging.
I realized I needed to use the grok and date filters to be able to match counts per minute.
When the grok filter is enabled, things become slow and the lag increases.

Anyway, Kibana is now showing the same counts as when I parse the source file. Finally! :smiley:
I will try again when I have a much better machine to test on. Hopefully there will be no delays then.

Regards
Mathias


(ruflin) #16

Thanks for all the updates. So, as you found out, it seems LS cannot process the log events fast enough. Filebeat slows down in this case, but it will still send the events eventually once it catches up. Glad you found a solution.

About tail_files: this only works if you don't have a registry file with a state yet. I assume in your case you started filebeat without tail_files first and it stored a state. When you enable the option later and filebeat finds a state, it uses the state instead of tailing. To prevent this, you have to remove the registry file.
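
Clearing that state could look like this (a sketch: the service name and registry path are assumptions; the path comes from the registry_file setting in filebeat.yml):

```shell
# Stop filebeat, delete the stored state, start again; with
# tail_files: true filebeat will now start at the end of each file
# instead of resuming from the stored offset.
sudo service filebeat stop
sudo rm /var/lib/filebeat/registry   # path from the registry_file setting
sudo service filebeat start
```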


(system) #17