Filebeat cannot keep up


(Pjanzen) #1

HI All,

I am using filebeat to send events to logstash. These event are produced at a rate between 300 ~ 800 events p/s. If I look at my log and look at the events I see a 4 ~ 6 hours difference where the events pushed by filebeat are behind.

Are there any setting to apply to get rid of this lag?

filebeat 6.2.4
logstash 6.2.4
logstash-filebeat-plugin 5.0.16

Here is my filebeat conf.

###################### Filebeat Configuration Example #########################
filebeat.prospectors:
- type: log
  paths:
    - /var/log/bizimp/messages.log
    - /var/log/bizimp/filters.log
  symlinks: True
  encoding: utf-8
  # start_position: end
  exclude_files: ['.gz$', '.log.$']

  #multiline.pattern: ^\[
  #multiline.negate: false
  #multiline.match: after

#================================ General =====================================
# The name of the shipper that publishes the network data. It can be used to group
# all the transactions sent by a single shipper in the web interface.
name: mx7.tb.ukmail.iss.as9143.net

# The tags of the shipper are included in their own field with each
# transaction published.
#tags: ["service-X", "web-tier"]
tags: ["CM"]

# queue.mem:
#   events: 2048
#   # flush.min_events: 1024
#   flush.timeout: 2s
#================================ Outputs =====================================

# Configure what outputs to use when sending the data collected by the beat.
# Multiple outputs may be used.

#-------------------------- Elasticsearch output ------------------------------
#output.elasticsearch:
  # Array of hosts to connect to.
  # hosts: ["localhost:9200"]

  # Optional protocol and basic auth credentials.
  #protocol: "https"
  #username: "elastic"
  #password: "changeme"

#----------------------------- Logstash output --------------------------------
output.logstash:
  # The Logstash hosts
  hosts: ["logstash.output.net:12201"]
  worker: 12
  # compression_level: 4
  # pipelining: 80
  max_retries: 10

#----------------------------- Xpack monitoring output --------------------------------
xpack.monitoring:
  enabled: true
  elasticsearch:
    hosts: ["172.25.160.24:31337"]


#================================ Logging =====================================

# Sets log level. The default log level is info.
# Available log levels are: critical, error, warning, info, debug
logging.level: info

# At debug level, you can selectively enable logging only for some components.
# To enable all selectors use ["*"]. Examples of other selectors are "beat",
# "publish", "service".
#logging.selectors: ["*"]

(ruflin) #2

How does your setup look like? How many Filebeat instances to do you have and how many Logstash in total?

Can you share your log on the INFO level from Filebeat?


(Pjanzen) #3

Hi,

We have 115 filebeat instances running towards 6 logstash servers (filebeat -> logstash -> redis <- logstash -> elasticsearch).

Those 6 server have each 4 vCpu and 16G mem (6G mem for logstash) and are vmware instances.

The info log after restart is looking like this (including the first 30sec stats):

2018-06-13T21:05:35.506+0200	INFO	instance/beat.go:468	Home path: [/usr/share/filebeat] Config path: [/etc/filebeat] Data path: [/var/lib/filebeat] Logs path: [/var/log/filebeat]
2018-06-13T21:05:35.507+0200	INFO	instance/beat.go:475	Beat UUID: 660d6b1f-38d9-4583-acf7-df3ef214eec0
2018-06-13T21:05:35.507+0200	INFO	instance/beat.go:213	Setup Beat: filebeat; Version: 6.2.4
2018-06-13T21:05:35.513+0200	INFO	pipeline/module.go:76	Beat name: mx3.xxxxx.net
2018-06-13T21:05:35.521+0200	INFO	elasticsearch/client.go:145	Elasticsearch url: http://172.25.160.24:31337
2018-06-13T21:05:35.522+0200	INFO	instance/beat.go:301	filebeat start running.
2018-06-13T21:05:35.522+0200	INFO	elasticsearch/elasticsearch.go:154	Start monitoring endpoint init loop.
2018-06-13T21:05:35.522+0200	INFO	registrar/registrar.go:110	Loading registrar data from /var/lib/filebeat/registry
2018-06-13T21:05:35.522+0200	INFO	[monitoring]	log/log.go:97	Starting metrics logging every 30s
2018-06-13T21:05:35.524+0200	INFO	registrar/registrar.go:121	States Loaded from registrar: 36
2018-06-13T21:05:35.524+0200	WARN	beater/filebeat.go:261	Filebeat is unable to load the Ingest Node pipelines for the configured modules because the Elasticsearch output is not configured/enabled. If you have already loaded the Ingest Node pipelines or are using Logstash pipelines, you can ignore this warning.
2018-06-13T21:05:35.524+0200	INFO	crawler/crawler.go:48	Loading Prospectors: 1
2018-06-13T21:05:35.528+0200	INFO	elasticsearch/elasticsearch.go:177	Stop monitoring endpoint init loop.
2018-06-13T21:05:35.528+0200	INFO	elasticsearch/elasticsearch.go:183	Start monitoring metrics snapshot loop.
2018-06-13T21:05:35.551+0200	INFO	log/prospector.go:111	Configured paths: [/var/log/bizimp/messages.log /var/log/bizimp/filters.log]
2018-06-13T21:05:35.551+0200	INFO	crawler/crawler.go:82	Loading and starting Prospectors completed. Enabled prospectors: 1
2018-06-13T21:05:35.552+0200	INFO	log/harvester.go:216	Harvester started for file: /var/log/bizimp/messages.log
2018-06-13T21:05:35.553+0200	INFO	log/harvester.go:216	Harvester started for file: /var/log/bizimp/filters.log
2018-06-13T21:06:05.526+0200	INFO	[monitoring]	log/log.go:124	Non-zero metrics in the last 30s	{"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":1990,"time":1999},"total":{"ticks":19660,"time":19673,"value":19660},"user":{"ticks":17670,"time":17674}},"info":{"ephemeral_id":"9c782cc1-e4ce-46f6-94b1-d2931867911a","uptime":{"ms":30037}},"memstats":{"gc_next":14660992,"memory_alloc":7482352,"memory_total":791897136,"rss":48476160}},"filebeat":{"events":{"active":4118,"added":180282,"done":176164},"harvester":{"open_files":2,"running":2,"started":2}},"libbeat":{"config":{"module":{"running":0}},"output":{"events":{"acked":176128,"active":4096,"batches":88,"total":180224},"read":{"bytes":516},"type":"logstash","write":{"bytes":15084793}},"pipeline":{"clients":1,"events":{"active":4117,"filtered":36,"published":180244,"retry":2048,"total":180281},"queue":{"acked":176128}}},"registrar":{"states":{"cleanup":16,"current":20,"update":176164},"writes":105},"system":{"cpu":{"cores":40},"load":{"1":1.17,"15":1.1,"5":1.32,"norm":{"1":0.0293,"15":0.0275,"5":0.033}}},"xpack":{"monitoring":{"pipeline":{"clients":1,"events":{"published":2,"retry":1,"total":2},"queue":{"acked":2}}}}}}}

(ruflin) #4

It would be good to see the logs from a view minutes of running. Reasons is that every 30s some stats are printed which could contain interesting information.

My current assumption is that probably LS is the bottleneck and not Filebeat. Based on your numbers, each Logstash instance must handle around 15'000 events per second, sometime probably even more as the events will not be evenly distributed.

It would be interesting to see what throughput you get if filebeat writes directly to file or you have 1 filebeat instance writing to 1 logstash instance (without the others).

Can you also share your Logstash config and logs? Are you doing some heavy processing on the LS side?


(Pjanzen) #5

I do no heavy lifting on those servers, logstash accepts the events and push them on to a redis queue. All event processing is done on the next logstash servers layer.

I can create a customer support case if you have access to that, then I don't have to anonymize the information. That would save me alot of time..

Here is some stats info.

   |2018-06-14T14:17:57.628+0200|INFO|[monitoring]|log/log.go:124|Non-zero metrics in the last 30s|{"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":272570,"time":272575},"total":{"ticks":1787320,"time":1787326,"value":1787320},"user":{"ticks":1514750,"time":1514751}},"info":{"ephemeral_id":"69e0fd67-710f-47e4-b979-eae06ebc1aa0","uptime":{"ms":58950016}},"memstats":{"gc_next":16061408,"memory_alloc":11146496,"memory_total":95227912896}},"filebeat":{"events":{"added":4473,"done":4473},"harvester":{"open_files":3,"running":3}},"libbeat":{"config":{"module":{"running":0}},"output":{"events":{"acked":4473,"batches":4,"total":4473},"read":{"bytes":48},"write":{"bytes":349954}},"pipeline":{"clients":1,"events":{"active":4117,"published":4473,"total":4473},"queue":{"acked":4473}}},"registrar":{"states":{"current":21,"update":4473},"writes":4},"system":{"load":{"1":1.1,"15":1.42,"5":1.34,"norm":{"1":0.0275,"15":0.0355,"5":0.0335}}},"xpack":{"monitoring":{"pipeline":{"events":{"published":3,"total":3},"queue":{"acked":3}}}}}}}|
    |---|---|---|---|---|---|
    |2018-06-14T14:18:27.630+0200|INFO|[monitoring]|log/log.go:124|Non-zero metrics in the last 30s|{"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":272660,"time":272666},"total":{"ticks":1788100,"time":1788112,"value":1788100},"user":{"ticks":1515440,"time":1515446}},"info":{"ephemeral_id":"69e0fd67-710f-47e4-b979-eae06ebc1aa0","uptime":{"ms":58980016}},"memstats":{"gc_next":16107424,"memory_alloc":8312560,"memory_total":95256276824}},"filebeat":{"events":{"added":6017,"done":6017},"harvester":{"open_files":3,"running":3}},"libbeat":{"config":{"module":{"running":0}},"output":{"events":{"acked":6017,"batches":4,"total":6017},"read":{"bytes":48},"write":{"bytes":474444}},"pipeline":{"clients":1,"events":{"active":4117,"published":6017,"total":6017},"queue":{"acked":6017}}},"registrar":{"states":{"current":21,"update":6017},"writes":4},"system":{"load":{"1":1.12,"15":1.41,"5":1.33,"norm":{"1":0.028,"15":0.0352,"5":0.0333}}},"xpack":{"monitoring":{"pipeline":{"events":{"published":3,"total":3},"queue":{"acked":3}}}}}}}|
    |2018-06-14T14:18:57.630+0200|INFO|[monitoring]|log/log.go:124|Non-zero metrics in the last 30s|{"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":272750,"time":272750},"total":{"ticks":1788770,"time":1788777,"value":1788770},"user":{"ticks":1516020,"time":1516027}},"info":{"ephemeral_id":"69e0fd67-710f-47e4-b979-eae06ebc1aa0","uptime":{"ms":59010016}},"memstats":{"gc_next":12348208,"memory_alloc":9472784,"memory_total":95284228320}},"filebeat":{"events":{"added":5894,"done":5894},"harvester":{"open_files":3,"running":3}},"libbeat":{"config":{"module":{"running":0}},"output":{"events":{"acked":5894,"batches":4,"total":5894},"read":{"bytes":36},"write":{"bytes":470292}},"pipeline":{"clients":1,"events":{"active":4117,"published":5894,"total":5894},"queue":{"acked":5894}}},"registrar":{"states":{"current":21,"update":5894},"writes":4},"system":{"load":{"1":1.32,"15":1.42,"5":1.34,"norm":{"1":0.033,"15":0.0355,"5":0.0335}}},"xpack":{"monitoring":{"pipeline":{"events":{"published":3,"total":3},"queue":{"acked":3}}}}}}}|
    |2018-06-14T14:19:27.630+0200|INFO|[monitoring]|log/log.go:124|Non-zero metrics in the last 30s|{"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":272790,"time":272795},"total":{"ticks":1789130,"time":1789142,"value":1789130},"user":{"ticks":1516340,"time":1516347}},"info":{"ephemeral_id":"69e0fd67-710f-47e4-b979-eae06ebc1aa0","uptime":{"ms":59040016}},"memstats":{"gc_next":13066000,"memory_alloc":8336624,"memory_total":95296478816}},"filebeat":{"events":{"added":2298,"done":2298},"harvester":{"open_files":3,"running":3}},"libbeat":{"config":{"module":{"running":0}},"output":{"events":{"acked":2298,"batches":2,"total":2298},"read":{"bytes":48},"write":{"bytes":183924}},"pipeline":{"clients":1,"events":{"active":4117,"published":2298,"total":2298},"queue":{"acked":2298}}},"registrar":{"states":{"current":21,"update":2298},"writes":2},"system":{"load":{"1":1.44,"15":1.42,"5":1.37,"norm":{"1":0.036,"15":0.0355,"5":0.0343}}},"xpack":{"monitoring":{"pipeline":{"events":{"published":3,"total":3},"queue":{"acked":3}}}}}}}|
    |2018-06-14T14:19:57.630+0200|INFO|[monitoring]|log/log.go:124|Non-zero metrics in the last 30s|{"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":272820,"time":272825},"total":{"ticks":1789470,"time":1789476,"value":1789470},"user":{"ticks":1516650,"time":1516651}},"info":{"ephemeral_id":"69e0fd67-710f-47e4-b979-eae06ebc1aa0","uptime":{"ms":59070017}},"memstats":{"gc_next":16091168,"memory_alloc":8623376,"memory_total":95305294656}},"filebeat":{"events":{"added":1798,"done":1798},"harvester":{"open_files":3,"running":3}},"libbeat":{"config":{"module":{"running":0}},"output":{"events":{"acked":1798,"batches":1,"total":1798},"read":{"bytes":30},"write":{"bytes":133817}},"pipeline":{"clients":1,"events":{"active":4117,"published":1798,"total":1798},"queue":{"acked":1798}}},"registrar":{"states":{"current":21,"update":1798},"writes":1},"system":{"load":{"1":1.25,"15":1.41,"5":1.33,"norm":{"1":0.0313,"15":0.0352,"5":0.0333}}},"xpack":{"monitoring":{"pipeline":{"events":{"published":3,"total":3},"queue":{"acked":3}}}}}}}|
    |2018-06-14T14:20:27.630+0200|INFO|[monitoring]|log/log.go:124|Non-zero metrics in the last 30s|{"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":272830,"time":272837},"total":{"ticks":1789490,"time":1789499,"value":1789490},"user":{"ticks":1516660,"time":1516662}},"info":{"ephemeral_id":"69e0fd67-710f-47e4-b979-eae06ebc1aa0","uptime":{"ms":59100016}},"memstats":{"gc_next":16091168,"memory_alloc":9463200,"memory_total":95306134480}},"filebeat":{"harvester":{"open_files":3,"running":3}},"libbeat":{"config":{"module":{"running":0}},"output":{"read":{"bytes":36}},"pipeline":{"clients":1,"events":{"active":4117}}},"registrar":{"states":{"current":21}},"system":{"load":{"1":1.02,"15":1.38,"5":1.27,"norm":{"1":0.0255,"15":0.0345,"5":0.0318}}},"xpack":{"monitoring":{"pipeline":{"events":{"published":3,"total":3},"queue":{"acked":3}}}}}}}|

(ruflin) #6

@pjanzen If you are a customer, please do.


(Pjanzen) #7

I am and I will create a support ticket.


(system) #8

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.