Filebeat delay to harvest after restart

Hi, i go to tell about my problem.

I have one server with filebeat installed, an this use logstash.output to send logs to graylog. This work, but my problem is that i have a 15/20min delay after apply config changes and reboot service.
And the other problem is that filebeat (or graylog) send (or receive) logs on any order.

I describe mi stack here.
AppServer with Filebeat (7.7.1) -> Graylog (3.1.4) -> Elasticsearch (6.8)
*I don't use logstash.

I have a doubts.

  1. If i want send logs to graylog without logstash, is correct use logstash.output?

  2. After edit filebeat.yml config, and restart the service with "sudo service filebeat restart" that delayed 15/20min before harvest again.
    The daemon is running correctly and i can see the daemon using high CPU, but logs not are sent yo graylog.

Here my config.

###################### Filebeat Configuration Example #########################

# This file is an example configuration file highlighting only the most common
# options. The filebeat.reference.yml file from the same directory contains all the
# supported options with more comments. You can use it as a reference.
#
# You can find the full configuration reference here:
# https://www.elastic.co/guide/en/beats/filebeat/index.html

# For more available modules and options, please see the filebeat.reference.yml sample
# configuration file.

#=========================== Filebeat inputs =============================

filebeat.inputs:

- type: log
  paths:
    - /var/www/server/var/report/*
  ignore_older: 10m
  close_inactive: 10s
  
- type: log
  paths:
    - /var/www/server/var/log/system.log
    - /var/www/server/var/log/backup_van.log
    - /var/www/server/var/log/exception.log
    - /var/www/server/var/log/stripe.log
    - /var/www/server/api2/storage/logs/lumen.log
    - /var/log/httpd/error_log
    - /var/log/httpd/access_log

- type: log
  paths:
    - /var/www/server/var/log/order.log
    - /var/www/server/var/log/onfleetOrder.log
  multiline.pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2}'
  multiline.negate: true
  multiline.match: after

  
  # close_eof: true

  # Exclude lines. A list of regular expressions to match. It drops the lines that are
  # matching any regular expression from the list.
  #exclude_lines: ['^DBG']

  # Include lines. A list of regular expressions to match. It exports the lines that are
  # matching any regular expression from the list.
  #include_lines: ['^ERR', '^WARN']

  # Exclude files. A list of regular expressions to match. Filebeat drops the files that
  # are matching any regular expression from the list. By default, no files are dropped.
  #exclude_files: ['.gz$']

  # Optional additional fields. These fields can be freely picked
  # to add additional information to the crawled log files for filtering
  #fields:
  #  level: debug
  #  review: 1

  ### Multiline options

  # Multiline can be used for log messages spanning multiple lines. This is common
  # for Java Stack Traces or C-Line Continuation

  # The regexp Pattern that has to be matched. The example pattern matches all lines starting with [
  #multiline.pattern: ^\[

  # Defines if the pattern set under pattern should be negated or not. Default is false.
  #multiline.negate: false

  # Match can be set to "after" or "before". It is used to define if lines should be append to a pattern
  # that was (not) matched before or after or as long as a pattern is not matched based on negate.
  # Note: After is the equivalent to previous and before is the equivalent to to next in Logstash
  #multiline.match: after

#============================= Filebeat modules ===============================

filebeat.config.modules:
  # Glob pattern for configuration loading
  path: ${path.config}/modules.d/*.yml

  # Set to true to enable config reloading
  reload.enabled: false

  # Period on which files under path should be checked for changes
  #reload.period: 10s

#==================== Elasticsearch template setting ==========================

setup.template.settings:
  index.number_of_shards: 3
  #index.codec: best_compression
  #_source.enabled: false

#================================ General =====================================

# The name of the shipper that publishes the network data. It can be used to group
# all the transactions sent by a single shipper in the web interface.
#name:

# The tags of the shipper are included in their own field with each
# transaction published.
#tags: ["service-X", "web-tier"]

# Optional fields that you can specify to add additional information to the
# output.
#fields:
#  env: staging


#============================== Dashboards =====================================
# These settings control loading the sample dashboards to the Kibana index. Loading
# the dashboards is disabled by default and can be enabled either by setting the
# options here, or by using the `-setup` CLI flag or the `setup` command.
#setup.dashboards.enabled: false

# The URL from where to download the dashboards archive. By default this URL
# has a value which is computed based on the Beat name and version. For released
# versions, this URL points to the dashboard archive on the artifacts.elastic.co
# website.
#setup.dashboards.url:

#============================== Kibana =====================================

# Starting with Beats version 6.0.0, the dashboards are loaded via the Kibana API.
# This requires a Kibana endpoint configuration.
setup.kibana:

  # Kibana Host
  # Scheme and port can be left out and will be set to the default (http and 5601)
  # In case you specify and additional path, the scheme is required: http://localhost:5601/path
  # IPv6 addresses should always be defined as: https://[2001:db8::1]:5601
  #host: "localhost:5601"

  # Kibana Space ID
  # ID of the Kibana Space into which the dashboards should be loaded. By default,
  # the Default Space will be used.
  #space.id:

#============================= Elastic Cloud ==================================

# These settings simplify using filebeat with the Elastic Cloud (https://cloud.elastic.co/).

# The cloud.id setting overwrites the `output.elasticsearch.hosts` and
# `setup.kibana.host` options.
# You can find the `cloud.id` in the Elastic Cloud web UI.
#cloud.id:

# The cloud.auth setting overwrites the `output.elasticsearch.username` and
# `output.elasticsearch.password` settings. The format is `<user>:<pass>`.
#cloud.auth:

#================================ Outputs =====================================

# Configure what output to use when sending the data collected by the beat.

#-------------------------- Elasticsearch output ------------------------------
#output.elasticsearch:
  # Array of hosts to connect to.
  #hosts: ["localhost:9200"]

  # Enabled ilm (beta) to use index lifecycle management instead daily indices.
  #ilm.enabled: false

  # Optional protocol and basic auth credentials.
  #protocol: "https"
  #username: "elastic"
  #password: "changeme"

#----------------------------- Logstash output --------------------------------
output.logstash:
  # The Logstash hosts
  hosts: ["graylog-server-ip:5044"]
  # slow_start: true
  bulk_max_size: 1
  worker: 1

  # Optional SSL. By default is off.
  # List of root certificates for HTTPS server verifications
  #ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]

  # Certificate for SSL client authentication
  #ssl.certificate: "/etc/pki/client/cert.pem"

  # Client Certificate Key
  #ssl.key: "/etc/pki/client/cert.key"

#================================ Processors =====================================

# Configure processors to enhance or manipulate events generated by the beat.

processors:
  - add_host_metadata: ~
  - add_cloud_metadata: ~

#================================ Logging =====================================

# Sets log level. The default log level is info.
# Available log levels are: error, warning, info, debug
#logging.level: debug

# At debug level, you can selectively enable logging only for some components.
# To enable all selectors use ["*"]. Examples of other selectors are "beat",
# "publish", "service".
#logging.selectors: ["*"]

#============================== Xpack Monitoring ===============================
# filebeat can export internal metrics to a central Elasticsearch monitoring
# cluster.  This requires xpack monitoring to be enabled in Elasticsearch.  The
# reporting is disabled by default.

# Set to true to enable the monitoring reporter.
#xpack.monitoring.enabled: false

# Uncomment to send the metrics to Elasticsearch. Most settings from the
# Elasticsearch output are accepted here as well. Any setting that is not set is
# automatically inherited from the Elasticsearch output configuration, so if you
# have the Elasticsearch output configured, you can simply uncomment the
# following line.
#xpack.monitoring.elasticsearch:
  1. What could be the cause by filebeat or graylog send or receive logs in incorrect order?

Thanks for your help.

Hi @b0nete,

I am not very familiar with graylog, but it seems to be compatible with logstash, so yes, it seems to be correct to use the logstash output, there is an example in graylog docs here: https://docs.graylog.org/en/3.3/pages/secure/sec_graylog_beats.html

This looks like some problem in filebeat startup. Looking at your configuration I would be concerned about the files in /var/www/server/var/report/*. Are there many files in this directory?
Have been the ignore_older and close_inactive files always there, or you added them after starting to find these performance problems.

There are some lines of investigation I see related to that:

  • Filebeat is taking a lot of time to decide what files to monitor if there are many files there.
  • Filebeat registry has grown too much. Could you check the size of filebeat registry on this machine? You can find it in the Filebeat data directory.

It'd be also interesting to see Filebeat startup logs. when does it start to open harvesters?

This looks like some problem in filebeat startup. Looking at your configuration I would be concerned about the files in  `/var/www/server/var/report/*` . Are there many files in this directory?

Yes, there are 17k of files. For this reason i added the ignore_older and close_inactive options.
I don't remember exactly, but i think that i added this options after look these problems.

I disabled /var/www/server/var/report/* input and works fine!, logs are harvester inmediatly.
In this directory files are created, written only one time and never more are modified; what's configuration your recommend apply here?

This are logs when i active this input:

2020-06-15T19:23:35.439Z	INFO	instance/beat.go:297	Setup Beat: filebeat; Version: 7.7.1
2020-06-15T19:23:35.441Z	INFO	[publisher]	pipeline/module.go:110	Beat name: ip-10-0-0-144
2020-06-15T19:23:35.441Z	WARN	beater/filebeat.go:152	Filebeat is unable to load the Ingest Node pipelines for the configured modules because the Elasticsearch output is not configured/enabled. If you have already loaded the Ingest Node pipelines or are using Logstash pipelines, you can ignore this warning.
2020-06-15T19:23:35.441Z	INFO	instance/beat.go:438	filebeat start running.
2020-06-15T19:23:35.441Z	WARN	beater/filebeat.go:335	Filebeat is unable to load the Ingest Node pipelines for the configured modules because the Elasticsearch output is not configured/enabled. If you have already loaded the Ingest Node pipelines or are using Logstash pipelines, you can ignore this warning.
2020-06-15T19:23:35.441Z	INFO	registrar/registrar.go:145	Loading registrar data from /var/lib/filebeat/registry/filebeat/data.json
2020-06-15T19:23:35.457Z	INFO	[monitoring]	log/log.go:118	Starting metrics logging every 30s
2020-06-15T19:23:35.539Z	INFO	[add_cloud_metadata]	add_cloud_metadata/add_cloud_metadata.go:93	add_cloud_metadata: hosting provider type detected as aws, metadata={"account":{"id":"***"},"availability_zone":"us-east-1a","image":{"id":"ami-ba7971c5"},"instance":{"id":"i-***"},"machine":{"type":"t2.small"},"provider":"aws","region":"us-east-1"}
2020-06-15T19:23:35.568Z	INFO	registrar/registrar.go:152	States Loaded from registrar: 17421
2020-06-15T19:23:35.568Z	INFO	beater/crawler.go:73	Loading Inputs: 3
2020-06-15T19:24:05.470Z	INFO	[monitoring]	log/log.go:145	Non-zero metrics in the last 30s	{"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":1560,"time":{"ms":1562}},"total":{"ticks":19030,"time":{"ms":19032},"value":19030},"user":{"ticks":17470,"time":{"ms":17470}}},"handles":{"limit":{"hard":4096,"soft":1024},"open":6},"info":{"ephemeral_id":"270dc807-ad75-4ee6-9735-9cbe42897d09","uptime":{"ms":30060}},"memstats":{"gc_next":40432176,"memory_alloc":25110480,"memory_total":2539994424,"rss":87826432},"runtime":{"goroutines":18}},"filebeat":{"events":{"active":5,"added":514,"done":509},"harvester":{"open_files":0,"running":0}},"libbeat":{"config":{"module":{"running":0}},"output":{"type":"logstash"},"pipeline":{"clients":1,"events":{"active":1,"filtered":513,"total":514}}},"registrar":{"states":{"current":17421,"update":510},"writes":{"success":509,"total":510}},"system":{"cpu":{"cores":1},"load":{"1":0.63,"15":0.9,"5":0.63,"norm":{"1":0.63,"15":0.9,"5":0.63}}}}}}
2020-06-15T19:24:35.470Z	INFO	[monitoring]	log/log.go:145	Non-zero metrics in the last 30s	{"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":3040,"time":{"ms":1480}},"total":{"ticks":37480,"time":{"ms":18458},"value":37480},"user":{"ticks":34440,"time":{"ms":16978}}},"handles":{"limit":{"hard":4096,"soft":1024},"open":6},"info":{"ephemeral_id":"270dc807-ad75-4ee6-9735-9cbe42897d09","uptime":{"ms":60060}},"memstats":{"gc_next":40025168,"memory_alloc":35408536,"memory_total":5002312616,"rss":11829248},"runtime":{"goroutines":18}},"filebeat":{"events":{"added":502,"done":502},"harvester":{"open_files":0,"running":0}},"libbeat":{"config":{"module":{"running":0}},"pipeline":{"clients":1,"events":{"active":1,"filtered":502,"total":502}}},"registrar":{"states":{"current":17421,"update":502},"writes":{"success":502,"total":502}},"system":{"load":{"1":1.46,"15":0.96,"5":0.83,"norm":{"1":1.46,"15":0.96,"5":0.83}}}}}}
2020-06-15T19:25:05.474Z	INFO	[monitoring]	log/log.go:145	Non-zero metrics in the last 30s	{"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":4630,"time":{"ms":1597}},"total":{"ticks":57300,"time":{"ms":19819},"value":57300},"user":{"ticks":52670,"time":{"ms":18222}}},"handles":{"limit":{"hard":4096,"soft":1024},"open":6},"info":{"ephemeral_id":"270dc807-ad75-4ee6-9735-9cbe42897d09","uptime":{"ms":90064}},"memstats":{"gc_next":41562448,"memory_alloc":26614256,"memory_total":7612323272,"rss":262144},"runtime":{"goroutines":18}},"filebeat":{"events":{"added":532,"done":532},"harvester":{"open_files":0,"running":0}},"libbeat":{"config":{"module":{"running":0}},"pipeline":{"clients":1,"events":{"active":1,"filtered":532,"total":532}}},"registrar":{"states":{"current":17421,"update":532},"writes":{"success":532,"total":532}},"system":{"load":{"1":1.41,"15":0.97,"5":0.88,"norm":{"1":1.41,"15":0.97,"5":0.88}}}}}}
2020-06-15T19:25:35.468Z	INFO	[monitoring]	log/log.go:145	Non-zero metrics in the last 30s	{"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":6090,"time":{"ms":1452}},"total":{"ticks":75980,"time":{"ms":18673},"value":75980},"user":{"ticks":69890,"time":{"ms":17221}}},"handles":{"limit":{"hard":4096,"soft":1024},"open":6},"info":{"ephemeral_id":"270dc807-ad75-4ee6-9735-9cbe42897d09","uptime":{"ms":120058}},"memstats":{"gc_next":40606640,"memory_alloc":39530920,"memory_total":10094865232},"runtime":{"goroutines":18}},"filebeat":{"events":{"added":506,"done":506},"harvester":{"open_files":0,"running":0}},"libbeat":{"config":{"module":{"running":0}},"pipeline":{"clients":1,"events":{"active":1,"filtered":506,"total":506}}},"registrar":{"states":{"current":17421,"update":506},"writes":{"success":506,"total":506}},"system":{"load":{"1":1.47,"15":0.99,"5":0.94,"norm":{"1":1.47,"15":0.99,"5":0.94}}}}}}
2020-06-15T19:26:05.458Z	INFO	[monitoring]	log/log.go:145	Non-zero metrics in the last 30s	{"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":7640,"time":{"ms":1558}},"total":{"ticks":95440,"time":{"ms":19470},"value":95440},"user":{"ticks":87800,"time":{"ms":17912}}},"handles":{"limit":{"hard":4096,"soft":1024},"open":6},"info":{"ephemeral_id":"270dc807-ad75-4ee6-9735-9cbe42897d09","uptime":{"ms":150049}},"memstats":{"gc_next":40957264,"memory_alloc":29177664,"memory_total":12629133112,"rss":270336},"runtime":{"goroutines":18}},"filebeat":{"events":{"added":517,"done":517},"harvester":{"open_files":0,"running":0}},"libbeat":{"config":{"module":{"running":0}},"pipeline":{"clients":1,"events":{"active":1,"filtered":517,"total":517}}},"registrar":{"states":{"current":17421,"update":517},"writes":{"success":517,"total":517}},"system":{"load":{"1":1.42,"15":1,"5":0.98,"norm":{"1":1.42,"15":1,"5":0.98}}}}}}

Thanks for your help @jsoriano!

In principle ignore_older should help on this case.

What is the size of your registry file? It can be found in registry/filebeat/data.json under the data directory.

I have remembered another option that will probably help here if the registry file is quite big.
filebeat.registry.flush controls the time between flushes of the registry to disk. In this case, with many files and probably a big registry, writing the registry too frequently can be a heavy task. This is probably what is happening on startup.

You can try to add this line to your config, so the registry is written once per second as much:

filebeat.registry.flush: 1s
1 Like

Perfect, with both options seems work.
I set
filebeat.registry.flush: 1s and ignore_older: 10m.

Just in case put my registry files.

[root@ip-11-0-0-213 filebeat]# ls -latrh
total 3.4M
-rw------- 1 root root   16 Jun 10 21:41 meta.json
-rw------- 1 root root **3.4M** Jun 16 19:32 data.json

I think this size is correct for registry, or not?

You have any suggestion for the other problem?
I mean about my problem with order of logs in graylog, I really don't know if is problem from filebeat or graylog.

Thanks again!

3.4MB for a registry file is quite big. It probably contains references to thousands of files that were not properly removed by filebeat. This can happen if filebeat has been running during a time without the ignore_older and close_inactive options. Files that were being harvested by filebeat before adding these options will remain there.

A workaround for this would be to manually clean up the registry. The data.json file is a json file that contains an entry for each file being harvested, you could prepare a script that removes the entries related to files that you don't need to collect anymore (in your case old files or files that don't exist anymore).

There is an ongoing effort to revamp the internal inputs API (https://github.com/elastic/beats/issues/15324), this includes the migration of the registry to a new implementation, this will probably address this kind of issues in future versions of Filebeat.

Regarding this problem, this is going to depend on the format of the log lines and how they are parsed and ingested. If they include a timestamp, and it is being parsed, they should appear in order when visualized in Kibana. If they don't included a timestamp, or it is not being parsed, then its timestamp will be created when the events are ingested, this is going to depend on when the files are read, and on any delay introduced by your data pipeline. In this latter case, order can be hardly granted.

Could you provide an example of log lines that you are seeing out of order? Do they include timestamps?

It's weird.

I have a development server, so i tried this.

  • Stop filebeat
  • Delete files in folder "/var/lib/filebeat/registry/filebeat"
  • Start filebeat.

And i started service again, which create meta.json and data.json. And data.json file size is 3.4 again.
I only set ignore_older: 10m, not set close_inactive because 5m is default al value and is good.

Regarding the order logs problem we have multiple types of files, with timestamp and without timestamp.
I was reading that the timestamp help to order logs, but that doesn't happens here.

These are some lines greped of original file log access_log in order.

10.0.0.54 - - [17/Jun/2020:19:08:24 +0000] "GET /api/health.php HTTP/1.1" 200 8 "-" "ELB-HealthChecker/1.0"
10.0.0.170 - - [17/Jun/2020:19:08:27 +0000] "GET /api/health.php HTTP/1.1" 200 8 "-" "ELB-HealthChecker/1.0"
10.0.0.222 - - [17/Jun/2020:19:08:28 +0000] "GET /api/health.php HTTP/1.1" 200 28 "-" "ELB-HealthChecker/2.0"
10.0.1.66 - - [17/Jun/2020:19:08:29 +0000] "GET /api/health.php HTTP/1.1" 200 28 "-" "ELB-HealthChecker/2.0"
10.0.0.54 - - [17/Jun/2020:19:08:29 +0000] "GET /api/health.php HTTP/1.1" 200 8 "-" "ELB-HealthChecker/1.0"
10.0.0.170 - - [17/Jun/2020:19:08:32 +0000] "GET /api/health.php HTTP/1.1" 200 8 "-" "ELB-HealthChecker/1.0"
10.0.0.222 - - [17/Jun/2020:19:08:33 +0000] "GET /api/health.php HTTP/1.1" 200 28 "-" "ELB-HealthChecker/2.0"

And in graylog i see this, in bad order.

And does it also take so long to start?

From the screenshot you are sharing it seems that timestamps are not being parsed. For example the first entry seems to have 19:08:36 in the timestamp and 19:08:33 in the logs.

You can use filebeat processors to parse your logs, or an elasticsearch ingest pipeline. And then extract the timestamp from there.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.