Slow performance Logstash/Filebeat

Hi everyone,
This is the first time I've asked for help here, so feel free to tell me if you need more information from me.
Also, English is not my native language, so sorry in advance for any mistakes.

Here is my data flow:

Source Filebeat (~15 servers) -> 1 LS node (with grok parsing) -> ES Version 7.16.2
ELK cluster -> 2 Data Warm Master eligible | 1 Voting-only master eligible Data Hot | 1 Data Hot
LS node -> 20 CPUs, CPU usage at ~7%
Warm nodes -> 20 CPUs
Hot nodes -> 16 CPUs

We get ~6k logs/s across all the applications.

The applications we're having an issue with produce at most 130 logs/s. That's why I think Filebeat isn't the problem.

I'll first explain how we ran into this issue.
On my application servers, logs are rotated when the current log file reaches a certain size (this will be useful later). Every application server already has an nxlog agent sending the logs to ES; we can't turn it off until Filebeat is working perfectly, of course.

First, we pointed Filebeat at the current log files on every server and everything worked just fine. After that, we saw application errors linked to nxlog.

We then decided to point Filebeat at the backup files instead, and now we see a delay greater than the log rotation time, usually 1.5x to 3x the log rotation time.

Sadly, I've been asked not to post the conf files. Are you able to give any tips without the configuration?

I can tell you that Logstash doesn't have any specific configuration except the grok parsing.

We might have to configure the pipelines, but we don't know how, or why we would need to, since it was working fine before.

Thanks in advance for any help

This is not clear, can you give more context? What are those backup files? Are they older files? It is not clear what the delay is here.

It is pretty hard to understand your issue without looking at the configuration. You can redact any confidential information and share it.

Thanks for your help

The backup files, as I call them, are older files that are stored in another directory under a different name.
I'll try to explain.
Let's say the current log must reach 30 MB before being moved to the other directory.
For one service, the current log file takes around 30 minutes to reach this size; then it's moved to another directory and a new current log file is created.
Right now, with Filebeat pointed at these "backup files", the logs take around 2 to 3 hours to become available in Kibana.
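
To put numbers on that, here is a rough back-of-the-envelope sketch in Python. It only uses the figures above (about 30 minutes to fill a file, 2 to 3 hours until the logs show up in Kibana); the function name is made up for illustration. Since Filebeat only sees a file after it has been rotated, the oldest line in it is already about one rotation period old even if shipping were instant, so the sketch subtracts that expected wait to show how much delay is left unexplained.

```python
def unexplained_delay_minutes(observed_minutes: float, rotation_minutes: float) -> float:
    """Delay left over once the expected wait for rotation is subtracted.

    The oldest line in a rotated file was written about `rotation_minutes`
    before Filebeat could first see it, so that part of the delay is expected.
    """
    return observed_minutes - rotation_minutes


ROTATION_MINUTES = 30          # time for the current log to reach 30 MB
OBSERVED_MINUTES = (120, 180)  # 2 to 3 hours until visible in Kibana

for observed in OBSERVED_MINUTES:
    extra = unexplained_delay_minutes(observed, ROTATION_MINUTES)
    print(f"observed {observed} min -> {extra:.0f} min beyond the rotation wait")
```

So even for the oldest line in a file, at least 1.5 hours of the delay comes from somewhere other than waiting for the rotation.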

Here is the logstash conf:

# Settings file in YAML
#
# Settings can be specified either in hierarchical form, e.g.:
#
#   pipeline:
#     batch:
#       size: 125
#       delay: 5
#
# Or as flat keys:
#
#   pipeline.batch.size: 125
#   pipeline.batch.delay: 5
#
# ------------  Node identity ------------
#
# Use a descriptive name for the node:
#
# node.name: test
#
# If omitted the node name will default to the machine's host name
#
# ------------ Data path ------------------
#
# Which directory should be used by logstash and its plugins
# for any persistent needs. Defaults to LOGSTASH_HOME/data
#
path.data: /var/lib/logstash
#
# ------------ Pipeline Settings --------------
#
# The ID of the pipeline.
#
# pipeline.id: main
#
# Set the number of workers that will, in parallel, execute the filters+outputs
# stage of the pipeline.
#
# This defaults to the number of the host's CPU cores.
#
#pipeline.workers: 20
#
# How many events to retrieve from inputs before sending to filters+workers
#
#pipeline.batch.size: 4096
#
# How long to wait in milliseconds while polling for the next event
# before dispatching an undersized batch to filters+outputs
#
# pipeline.batch.delay: 50
#
# Force Logstash to exit during shutdown even if there are still inflight
# events in memory. By default, logstash will refuse to quit until all
# received events have been pushed to the outputs.
#
# WARNING: enabling this can lead to data loss during shutdown
#
# pipeline.unsafe_shutdown: false
#
# ------------ Pipeline Configuration Settings --------------
#
# Where to fetch the pipeline configuration for the main pipeline
#
# path.config:
#
# Pipeline configuration string for the main pipeline
#
# config.string:
#
# At startup, test if the configuration is valid and exit (dry run)
#
# config.test_and_exit: false
#
# Periodically check if the configuration has changed and reload the pipeline
# This can also be triggered manually through the SIGHUP signal
#
# config.reload.automatic: true
#
# How often to check if the pipeline configuration has changed (in seconds)
#
# config.reload.interval: 60s
#
# Show fully compiled configuration as debug log message
# NOTE: --log.level must be 'debug'
#
# config.debug: false
#
# When enabled, process escaped characters such as \n and \" in strings in the
# pipeline configuration files.
#
# config.support_escapes: false
#
monitoring.cluster_uuid: "****************"

# ------------ Module Settings ---------------
# Define modules here.  Modules definitions must be defined as an array.
# The simple way to see this is to prepend each `name` with a `-`, and keep
# all associated variables under the `name` they are associated with, and
# above the next, like this:
#
# modules:
#   - name: MODULE_NAME
#     var.PLUGINTYPE1.PLUGINNAME1.KEY1: VALUE
#     var.PLUGINTYPE1.PLUGINNAME1.KEY2: VALUE
#     var.PLUGINTYPE2.PLUGINNAME1.KEY1: VALUE
#     var.PLUGINTYPE3.PLUGINNAME3.KEY1: VALUE
#
# Module variable names must be in the format of
#
# var.PLUGIN_TYPE.PLUGIN_NAME.KEY
#
# modules:
#
# ------------ Cloud Settings ---------------
# Define Elastic Cloud settings here.
# Format of cloud.id is a base64 value e.g. dXMtZWFzdC0xLmF3cy5mb3VuZC5pbyRub3RhcmVhbCRpZGVudGlmaWVy
# and it may have an label prefix e.g. staging:dXMtZ...
# This will overwrite 'var.elasticsearch.hosts' and 'var.kibana.host'
# cloud.id: <identifier>
#
# Format of cloud.auth is: <user>:<pass>
# This is optional
# If supplied this will overwrite 'var.elasticsearch.username' and 'var.elasticsearch.password'
# If supplied this will overwrite 'var.kibana.username' and 'var.kibana.password'
# cloud.auth: elastic:<password>
#
# ------------ Queuing Settings --------------
#
# Internal queuing model, "memory" for legacy in-memory based queuing and
# "persisted" for disk-based acked queueing. Defaults is memory
#
# queue.type: memory
#
# If using queue.type: persisted, the directory path where the data files will be stored.
# Default is path.data/queue
#
# path.queue:
#
# If using queue.type: persisted, the page data files size. The queue data consists of
# append-only data files separated into pages. Default is 64mb
#
# queue.page_capacity: 64mb
#
# If using queue.type: persisted, the maximum number of unread events in the queue.
# Default is 0 (unlimited)
#
# queue.max_events: 0
#
# If using queue.type: persisted, the total capacity of the queue in number of bytes.
# If you would like more unacked events to be buffered in Logstash, you can increase the
# capacity using this setting. Please make sure your disk drive has capacity greater than
# the size specified here. If both max_bytes and max_events are specified, Logstash will pick
# whichever criteria is reached first
# Default is 1024mb or 1gb
#
# queue.max_bytes: 1024mb
#
# If using queue.type: persisted, the maximum number of acked events before forcing a checkpoint
# Default is 1024, 0 for unlimited
#
# queue.checkpoint.acks: 1024
#
# If using queue.type: persisted, the maximum number of written events before forcing a checkpoint
# Default is 1024, 0 for unlimited
#
# queue.checkpoint.writes: 1024
#
# If using queue.type: persisted, the interval in milliseconds when a checkpoint is forced on the head page
# Default is 1000, 0 for no periodic checkpoint.
#
# queue.checkpoint.interval: 1000
#
# ------------ Dead-Letter Queue Settings --------------
# Flag to turn on dead-letter queue.
#
# dead_letter_queue.enable: false

# If using dead_letter_queue.enable: true, the maximum size of each dead letter queue. Entries
# will be dropped if they would increase the size of the dead letter queue beyond this setting.
# Default is 1024mb
# dead_letter_queue.max_bytes: 1024mb

# If using dead_letter_queue.enable: true, the directory path where the data files will be stored.
# Default is path.data/dead_letter_queue
#
# path.dead_letter_queue:
#
# ------------ Metrics Settings --------------
#
# Bind address for the metrics REST endpoint
#
# http.host: "127.0.0.1"
#
# Bind port for the metrics REST endpoint, this option also accept a range
# (9600-9700) and logstash will pick up the first available ports.
#
# http.port: 9600-9700
#
# ------------ Debugging Settings --------------
#
# Options for log.level:
#   * fatal
#   * error
#   * warn
#   * info (default)
#   * debug
#   * trace
#
# log.level: info
path.logs: /var/log/logstash
#
# ------------ Other Settings --------------
#
# Where to find custom plugins
# path.plugins: []
#
# ------------ X-Pack Settings (not applicable for OSS build)--------------
#
# X-Pack Monitoring
# https://www.elastic.co/guide/en/logstash/current/monitoring-logstash.html
#xpack.monitoring.enabled: true
#xpack.monitoring.elasticsearch.username: logstash_system
#xpack.monitoring.elasticsearch.password: ****
#xpack.monitoring.elasticsearch.hosts: ["http://*.*.*.*.5:9200", "http://*.*.*.*.6:9200", "http://*.*.*.*.7:9200"]
#xpack.monitoring.elasticsearch.ssl.certificate_authority: [ "/path/to/ca.crt" ]
#xpack.monitoring.elasticsearch.ssl.truststore.path: path/to/file
#xpack.monitoring.elasticsearch.ssl.truststore.password: password
#xpack.monitoring.elasticsearch.ssl.keystore.path: /path/to/file
#xpack.monitoring.elasticsearch.ssl.keystore.password: password
#xpack.monitoring.elasticsearch.ssl.verification_mode: certificate
#xpack.monitoring.elasticsearch.sniffing: false
#xpack.monitoring.collection.interval: 10s
#xpack.monitoring.collection.pipeline.details.enabled: true
#
# X-Pack Management
# https://www.elastic.co/guide/en/logstash/current/logstash-centralized-pipeline-management.html
#xpack.management.enabled: true
#xpack.management.pipeline.id: ["*"]
#xpack.management.elasticsearch.username: ****
#xpack.management.elasticsearch.password: ****
#xpack.management.elasticsearch.hosts: ["http://*.*.*.*.5:9200", "http://*.*.*.*.6:9200", "http://*.*.*.*.7:9200", "http://*.*.*.*.8:9200"]
#xpack.management.elasticsearch.ssl.certificate_authority: [ "/path/to/ca.crt" ]
#xpack.management.elasticsearch.ssl.truststore.path: /path/to/file
#xpack.management.elasticsearch.ssl.truststore.password: password
#xpack.management.elasticsearch.ssl.keystore.path: /path/to/file
#xpack.management.elasticsearch.ssl.keystore.password: password
#xpack.management.elasticsearch.ssl.verification_mode: certificate
#xpack.management.elasticsearch.sniffing: false
#xpack.management.logstash.poll_interval: 5s

And here is one of the Filebeat config files:

###################### Filebeat Configuration Example #########################

# This file is an example configuration file highlighting only the most common
# options. The filebeat.reference.yml file from the same directory contains all the
# supported options with more comments. You can use it as a reference.
#
# You can find the full configuration reference here:
# https://www.elastic.co/guide/en/beats/filebeat/index.html

# For more available modules and options, please see the filebeat.reference.yml sample
# configuration file.

# ============================== Filebeat inputs ===============================

filebeat.inputs:

# Each - is an input. Most options can be set at the input level, so
# you can use different inputs for various configurations.
# Below are the input specific configurations.

- type: log

  # Change to true to enable this input configuration.
  enabled: true

  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    #- /*********
    #- /var/log/*.log
    #- **********
    - *****
    
  # exclude_lines: ['^.*<Debug5 >.*Subscription.*','^.*<Debug5 >.*>>.*','^.*<Debug5 >.*<<.*','^.*<Debug5 >.*Msg pour.*','^.*<Debug5 >.*[E,e]mpile.*','^.*<Debug5 >.*[D,d]épile.*']

  fields:
    process: ********
    fields_under_root: true
  #json.keys_under_root: true
  #json.overwrite_keys: true
  ignore_older: 30m
  scan.sort: modtime
  scan.order: desc
  tail_files: true
  close_eof: true
  close_inactive: 25m
  #close_timeout: 8m
  clean_inactive: 1h  
  clean_remove: true
  close_remove: true
  harvester_limit: 5
  document_type: syslog
  
#filebeat.spool_size: 8192
#filebeat.registry.flush: 1s
#max_procs: 2

# # Set gzip compression level.
# compression_level: 1

# queue.mem:
  # # # #maximum events which can be stored in the queue
  # events: 65536
  # # # #forwards events if minimum 512 are accumulated or if oldest is in the queue for 5s
  # flush.min_events: 512
  # flush.timeout: 5s

  # Exclude lines. A list of regular expressions to match. It drops the lines that are
  # matching any regular expression from the list.
 

  # Include lines. A list of regular expressions to match. It exports the lines that are
  # matching any regular expression from the list.
  #include_lines: ['^.*Tracking indication']

  # Exclude files. A list of regular expressions to match. Filebeat drops the files that
  # are matching any regular expression from the list. By default, no files are dropped.
  #exclude_files: ['.gz$']

  # Optional additional fields. These fields can be freely picked
  # to add additional information to the crawled log files for filtering
  #fields:
  #  level: debug
  #  review: 1

  ### Multiline options

  # Multiline can be used for log messages spanning multiple lines. This is common
  # for Java Stack Traces or C-Line Continuation

  # The regexp Pattern that has to be matched. The example pattern matches all lines starting with [
  #multiline.pattern: ^\[

  # Defines if the pattern set under pattern should be negated or not. Default is false.
  #multiline.negate: false

  # Match can be set to "after" or "before". It is used to define if lines should be append to a pattern
  # that was (not) matched before or after or as long as a pattern is not matched based on negate.
  # Note: After is the equivalent to previous and before is the equivalent to to next in Logstash
  #multiline.match: after

# filestream is an experimental input. It is going to replace log input in the future.
#- type: filestream

  # Change to true to enable this input configuration.
 # enabled: false

  # Paths that should be crawled and fetched. Glob based paths.
  #paths:
   # - /var/log/*.log
    #- *********

  # Exclude lines. A list of regular expressions to match. It drops the lines that are
  # matching any regular expression from the list.
  #exclude_lines: ['^DBG']

  # Include lines. A list of regular expressions to match. It exports the lines that are
  # matching any regular expression from the list.
  #include_lines: ['^ERR', '^WARN']

  # Exclude files. A list of regular expressions to match. Filebeat drops the files that
  # are matching any regular expression from the list. By default, no files are dropped.
  #prospector.scanner.exclude_files: ['.gz$']

  # Optional additional fields. These fields can be freely picked
  # to add additional information to the crawled log files for filtering
  #fields:
  #  level: debug
  #  review: 1

# ============================== Filebeat modules ==============================

filebeat.config.modules:
  # Glob pattern for configuration loading
  path: ${path.config}/modules.d/*.yml

  # Set to true to enable config reloading
  reload.enabled: false

  # Period on which files under path should be checked for changes
  #reload.period: 10s

# ======================= Elasticsearch template setting =======================

setup.template.settings:
  index.number_of_shards: 1
  #index.codec: best_compression
  #_source.enabled: false


# ================================== General ===================================

# The name of the shipper that publishes the network data. It can be used to group
# all the transactions sent by a single shipper in the web interface.
#name:

# The tags of the shipper are included in their own field with each
# transaction published.
#tags: ["service-X", "web-tier"]

# Optional fields that you can specify to add additional information to the
# output.
#fields:
#  env: staging

# ================================= Dashboards =================================
# These settings control loading the sample dashboards to the Kibana index. Loading
# the dashboards is disabled by default and can be enabled either by setting the
# options here or by using the `setup` command.
#setup.dashboards.enabled: false

# The URL from where to download the dashboards archive. By default this URL
# has a value which is computed based on the Beat name and version. For released
# versions, this URL points to the dashboard archive on the artifacts.elastic.co
# website.
#setup.dashboards.url:

# =================================== Kibana ===================================

# Starting with Beats version 6.0.0, the dashboards are loaded via the Kibana API.
# This requires a Kibana endpoint configuration.
setup.kibana:

  # Kibana Host
  # Scheme and port can be left out and will be set to the default (http and 5601)
  # In case you specify and additional path, the scheme is required: http://localhost:5601/path
  # IPv6 addresses should always be defined as: https://[2001:db8::1]:5601
  #host: "localhost:5601"

  # Kibana Space ID
  # ID of the Kibana Space into which the dashboards should be loaded. By default,
  # the Default Space will be used.
  #space.id:

# =============================== Elastic Cloud ================================

# These settings simplify using Filebeat with the Elastic Cloud (https://cloud.elastic.co/).

# The cloud.id setting overwrites the `output.elasticsearch.hosts` and
# `setup.kibana.host` options.
# You can find the `cloud.id` in the Elastic Cloud web UI.
#cloud.id:

# The cloud.auth setting overwrites the `output.elasticsearch.username` and
# `output.elasticsearch.password` settings. The format is `<user>:<pass>`.
#cloud.auth:

# ================================== Outputs ===================================

# Configure what output to use when sending the data collected by the beat.

# ---------------------------- Elasticsearch Output ----------------------------
#output.elasticsearch:
  # Array of hosts to connect to.
  # hosts: ["localhost:9200"]

  # Protocol - either `http` (default) or `https`.
  #protocol: "https"

  # Authentication credentials - either API key or username/password.
  #api_key: "id:api_key"
  #username: "elastic"
  #password: "changeme"

# ------------------------------ Logstash Output -------------------------------
output.logstash:
  # The Logstash hosts
  hosts: ["*********:5058"]
  #worker: 4
  #bulk_max_size: 4096
  compression_level: 1

  # Optional SSL. By default is off.
  # List of root certificates for HTTPS server verifications
  #ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]

  # Certificate for SSL client authentication
  #ssl.certificate: "/etc/pki/client/cert.pem"

  # Client Certificate Key
  #ssl.key: "/etc/pki/client/cert.key"

# ================================= Processors =================================
processors:
  - add_host_metadata:
      when.not.contains.tags: forwarded
  #- add_cloud_metadata: ~
  #- add_docker_metadata: ~
  #- add_kubernetes_metadata: ~

# ================================== Logging ===================================

# Sets log level. The default log level is info.
# Available log levels are: error, warning, info, debug
#logging.level: debug

# At debug level, you can selectively enable logging only for some components.
# To enable all selectors use ["*"]. Examples of other selectors are "beat",
# "publisher", "service".
#logging.selectors: ["*"]

# ============================= X-Pack Monitoring ==============================
# Filebeat can export internal metrics to a central Elasticsearch monitoring
# cluster.  This requires xpack monitoring to be enabled in Elasticsearch.  The
# reporting is disabled by default.

# Set to true to enable the monitoring reporter.
monitoring.enabled: true

# Sets the UUID of the Elasticsearch cluster under which monitoring data for this
# Filebeat instance will appear in the Stack Monitoring UI. If output.elasticsearch
# is enabled, the UUID is derived from the Elasticsearch cluster referenced by output.elasticsearch.
monitoring.cluster_uuid: "*********"

# Uncomment to send the metrics to Elasticsearch. Most settings from the
# Elasticsearch output are accepted here as well.
# Note that the settings should point to your Elasticsearch *monitoring* cluster.
# Any setting that is not set is automatically inherited from the Elasticsearch
# output configuration, so if you have the Elasticsearch output configured such
# that it is pointing to your Elasticsearch monitoring cluster, you can simply
# uncomment the following line.
monitoring.elasticsearch:
    hosts: ["*********:9200", "*********:9200", "*********:9200", "*********:9200"]
    api_key: *********
# ============================== Instrumentation ===============================

# Instrumentation support for the filebeat.
#instrumentation:
    # Set to true to enable instrumentation of filebeat.
    #enabled: false

    # Environment in which filebeat is running on (eg: staging, production, etc.)
    #environment: ""

    # APM Server hosts to report instrumentation results to.
    #hosts:
    #  - http://localhost:8200

    # API Key for the APM Server(s).
    # If api_key is set then secret_token will be ignored.
    #api_key:

    # Secret token for the APM Server(s).
    #secret_token:


# ================================= Migration ==================================

# This allows to enable 6.7 migration aliases
#migration.6_to_7.enabled: true

logging.level: debug
logging.selectors: [ harvester, input ]

Thank you for your help

Are you consuming the log files while they are being written, or do you wait for them to be moved to another folder?

You removed the paths part of your Filebeat configuration, which makes it hard to understand. I still do not understand how you are consuming your files or what your issue could be. It is not possible to tell whether they are on the same path or on different paths.

Can you share the path? Or at least the structure? From what you shared, it is not possible to know if you have one path or multiple paths.

Any reason to set scan.sort and scan.order? These settings are still experimental, and you are not using the default values either. If I'm not wrong, they make Filebeat sort the files it finds on each scan by modification time, from newest to oldest, so it will consume the newest files first and the oldest files after that. This could be your issue. Have you tried removing these settings?
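
To illustrate the ordering concern, here is a toy Python model of a single scan. It is not Filebeat's actual code: it simply assumes the files are sorted by modification time, newest first, and that only `harvester_limit` files (5 in your config) get a harvester at the same time. The file names and timestamps are made up.

```python
from typing import List, Tuple


def pick_files_to_harvest(files: List[Tuple[str, int]], harvester_limit: int) -> List[str]:
    """Return the names of the files that get a harvester first.

    `files` holds (name, modification_time) pairs. This simplified model
    sorts newest first (like scan.sort: modtime with scan.order: desc) and
    starts at most `harvester_limit` harvesters.
    """
    newest_first = sorted(files, key=lambda f: f[1], reverse=True)
    return [name for name, _ in newest_first[:harvester_limit]]


# Hypothetical rotated files: (name, modification time in minutes).
rotated = [(f"app.log.{i}", i * 30) for i in range(1, 9)]

started = pick_files_to_harvest(rotated, harvester_limit=5)
waiting = [name for name, _ in rotated if name not in started]
print("harvested first:", started)
print("still waiting:", waiting)
```

In this model the three oldest files only get picked up once a harvester frees up, and if newer files keep arriving they keep jumping ahead of the older ones.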

Hello, as @leandrojmp said, it's not clear to me either. But the fact that the Beats components were working before and you now face an ingestion delay of 2 to 3 hours before logs are available in Kibana got me thinking!

If your use case involves creating a large number of new files every day, you might find that the registry file grows to be too large. See Registry file is too large for details about configuration options that you can set to resolve this issue.

I think your problem may come from the Filebeat registry. In short, when Filebeat starts harvesting log files, it keeps track of each file in a registry and writes that state to disk, so that it can come back and resume harvesting when there are new events. When this registry becomes too large, you may then see log delays. In a similar use case we faced this problem and decided to clean the registry every month or two.

I suggest you read "How Filebeat works" and "Registry file is too large" to overcome this problem. Or, if you have a dev environment, stop Filebeat, delete the registry manually, and start Filebeat again to test.

You can open the registry file and count how many log files are tracked by Filebeat. See also the registry path setting.
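
If counting by hand is tedious, a small Python sketch like this could do it. It assumes the registry is a newline-delimited JSON log in which state entries carry the file path under `v.source`; that layout is an assumption on my part based on recent 7.x versions, and older versions use a single JSON file, so check what your version writes first. The function name, sample lines, and paths below are made up.

```python
import json
from typing import Iterable


def count_tracked_sources(lines: Iterable[str]) -> int:
    """Count distinct file paths found in registry log lines.

    Lines that are not JSON objects, or that have no `v.source` field,
    are skipped. Entries removed later in the log are still counted,
    so treat the result as a rough upper bound.
    """
    sources = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(record, dict):
            continue
        value = record.get("v")
        if isinstance(value, dict) and "source" in value:
            sources.add(value["source"])
    return len(sources)


# Made-up sample in the assumed format: two files, one of them updated twice.
sample = [
    '{"op":"set","id":1}',
    '{"k":"filebeat::logs::native::1-2","v":{"source":"/logs/app.log.1","offset":100}}',
    '{"op":"set","id":2}',
    '{"k":"filebeat::logs::native::3-4","v":{"source":"/logs/app.log.2","offset":0}}',
    '{"op":"set","id":3}',
    '{"k":"filebeat::logs::native::1-2","v":{"source":"/logs/app.log.1","offset":250}}',
]
print(count_tracked_sources(sample))  # 2
```

To run it on a real registry, open the registry log file and pass the file object to `count_tracked_sources`; where that file lives depends on your registry path.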

Hope it helps.

Oh sorry, I forgot to mention it: we consume the file once it has been written and moved to the other directory, so the file is "finished" and static at that point.
The current log is on one partition and the backup is on another.

These are the two possible paths to the logs.

#- A:\path\to\current\file\IGPD-*.log
- E:\path\to\backup\file\IGPD-*.log.*

To be honest, I just joined this project, so I don't really know why the scan options are there. I suppose they were used to avoid missing any file. I'll try getting rid of them and let you know what happens.

Thanks for your help

Hi guys,
Sorry for leaving you without updates.

I've tried both of your solutions, but they didn't change the behavior (sadly).
I checked the Filebeat logs and they are up to date, so I don't really get why it's behaving this way.

I just learned we have an ELK licence, so we're going to open a ticket with support.

Thanks for your help anyway guys.

Hope you're doing fine.