Filebeat is not reading the log file in real time

Hello Team,

We set up a new Elasticsearch cluster with version 7.10, and the Beats version is also 7.10.

When we first set up the cluster it was working fine and we were getting the logs on the Kibana dashboard in real time.

But we have now enabled X-Pack security on Elasticsearch, and Filebeat is no longer sending the logs in real time.

Below is the log from the Filebeat log file with debug mode enabled:

2021-01-15T14:08:10.783Z	DEBUG	[processors]	processing/processors.go:203	Publish event: {
  "@timestamp": "2021-01-15T14:08:10.783Z",
  "@metadata": {
    "beat": "filebeat",
    "type": "_doc",
    "version": "7.10.1"
  },
  "host": {
    "name": "xyz"
  },
  "agent": {
    "ephemeral_id": "0d0141f3-a5cc-40b6-a504-9ec008b25c86",
    "id": "c04302ec-9f17-42dc-8f55-ea02e3eb2b4e",
    "name": "abc",
    "type": "filebeat",
    "version": "7.10.1",
    "hostname": "abc"
  },
  "log": {
    "offset": 1927343878,
    "file": {
      "path": "/var/apps/shared/log/production.log"
    }
  },
  "message": "I, [2021-01-15T12:18:49.794174 #31619]  INFO -- : [c831bbeb-5050-4414-9a36-11adfa3d23d1] Completed 200 OK)",
  "type": "application_log",

We can clearly see the difference between the @timestamp and the time in the message when this log line was generated.

There is no error in the Elasticsearch, Logstash, or Filebeat logs.
Can you please help me with this issue?

Any help will be appreciated.

Hey @Tek_Chand,

Is this topic a follow-up to Filebeat not sending logs in real time? I don't think that enabling security should affect performance to this extent.

What does your architecture look like here? Are you sending to Elasticsearch directly? How is your Elasticsearch cluster set up?

Do you see any errors in the Filebeat logs?

@jsoriano, Thank you for your response.

Yes, that topic was also created by me for a similar issue.

Our architecture is as below:

Filebeat->Logstash->Elasticsearch->Kibana all with version 7.10

We are sending our logs to Logstash first, and then Logstash sends them to Elasticsearch.

We are using a 3-node Elasticsearch cluster.

No, we are not seeing any errors in the Filebeat logs. But I am attaching some DEBUG logs in case you see anything suspicious here:

2021-01-16T12:37:12.802Z        DEBUG   [input] log/input.go:439        Check file for harvesting: /var/apps/mobilock/shared/log/production.log
2021-01-16T12:37:12.802Z        DEBUG   [input] log/input.go:530        Update existing file for harvesting: /var/apps/mobilock/shared/log/production.log, offset: 1581377597
2021-01-16T12:37:12.802Z        DEBUG   [input] log/input.go:582        Harvester for file is still running: /var/apps/mobilock/shared/log/production.log
2021-01-16T12:37:12.802Z        DEBUG   [input] log/input.go:226        input states cleaned up. Before: 1, After: 1, Pending: 0
2021-01-16T12:37:12.813Z        DEBUG   [input] log/input.go:226        input states cleaned up. Before: 1, After: 1, Pending: 0
2021-01-16T12:37:12.815Z        DEBUG   [input] log/input.go:226        input states cleaned up. Before: 1, After: 1, Pending: 0
2021-01-16T12:37:12.826Z        DEBUG   [cfgfile]       cfgfile/reload.go:194   Scan for new config files
2021-01-16T12:37:12.828Z        DEBUG   [input] log/input.go:530        Update existing file for harvesting: /var/log/auth.log, offset: 1042533
2021-01-16T12:37:12.828Z        DEBUG   [input] log/input.go:582        Harvester for file is still running: /var/log/auth.log
2021-01-16T12:37:12.828Z        DEBUG   [input] log/input.go:226        input states cleaned up. Before: 1, After: 1, Pending: 0
2021-01-16T12:37:12.830Z        DEBUG   [input] log/input.go:226        input states cleaned up. Before: 1, After: 1, Pending: 0
2021-01-16T12:37:13.765Z        DEBUG   [reader_multiline]      multiline/pattern.go:170        Multiline event flushed because timeout reached.

Below is our filebeat.yml config file on this machine:

###################### Filebeat Configuration Example #########################

# This file is an example configuration file highlighting only the most common
# options. The filebeat.reference.yml file from the same directory contains all the
# supported options with more comments. You can use it as a reference.

# ============================== Filebeat inputs ===============================

filebeat.inputs:

# Each - is an input. Most options can be set at the input level, so
# you can use different inputs for various configurations.
# Below are the input specific configurations.

- type: log

  # Change to true to enable this input configuration.
  enabled: true

  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /var/apps/mobilock/shared/log/production.log
    #- c:\programdata\elasticsearch\logs\*

  # Exclude lines. A list of regular expressions to match. It drops the lines that are
  # matching any regular expression from the list.
  #exclude_lines: ['^DBG']
  exclude_lines: ['^.*health.*$']
  fields_under_root: true
  fields:
    type: application_log
#     application_log: true
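  # With fields_under_root: true, the custom "type: application_log" field is
  # placed at the top level of the event, as seen in the published events above.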

  # Include lines. A list of regular expressions to match. It exports the lines that are
  # matching any regular expression from the list.
  #include_lines: ['^ERR', '^WARN']

  # Exclude files. A list of regular expressions to match. Filebeat drops the files that
  # are matching any regular expression from the list. By default, no files are dropped.
  #exclude_files: ['.gz$']

  # Optional additional fields. These fields can be freely picked
  # to add additional information to the crawled log files for filtering
  #fields:
  #  level: debug
  #  review: 1

  ### Multiline options

  # Multiline can be used for log messages spanning multiple lines. This is common
  # for Java Stack Traces or C-Line Continuation

  # The regexp Pattern that has to be matched. The example pattern matches all lines starting with [
  #multiline.pattern: ^\[
  multiline.pattern: '[A-Z]{1}, \[[0-9]{4}-[0-9]{2}-[0-9]{2}'
  multiline.negate: true
  multiline.match: after
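  # With this pattern, a new event starts at lines like
  # "I, [2021-01-15T12:18:49.794174 #31619]  INFO -- : ..." (the Ruby logger
  # format); negate: true together with match: after appends every line that
  # does not match the pattern to the preceding matching line.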
  # Defines if the pattern set under pattern should be negated or not. Default is false.
  #multiline.negate: false

  # Match can be set to "after" or "before". It is used to define if lines should be append to a pattern
  # that was (not) matched before or after or as long as a pattern is not matched based on negate.
  # Note: After is the equivalent to previous and before is the equivalent to to next in Logstash
  #multiline.match: after

# filestream is an experimental input. It is going to replace log input in the future.
- type: filestream

  # Change to true to enable this input configuration.
  enabled: false

  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /var/log/*.log
    #- c:\programdata\elasticsearch\logs\*

  # Exclude lines. A list of regular expressions to match. It drops the lines that are
  # matching any regular expression from the list.
  #exclude_lines: ['^DBG']

  # Include lines. A list of regular expressions to match. It exports the lines that are
  # matching any regular expression from the list.
  #include_lines: ['^ERR', '^WARN']

  # Exclude files. A list of regular expressions to match. Filebeat drops the files that
  # are matching any regular expression from the list. By default, no files are dropped.
  #prospector.scanner.exclude_files: ['.gz$']

  # Optional additional fields. These fields can be freely picked
  # to add additional information to the crawled log files for filtering
  #fields:
  #  level: debug
  #  review: 1

# ============================== Filebeat modules ==============================

filebeat.config.modules:
  # Glob pattern for configuration loading
  path: ${path.config}/modules.d/*.yml

  # Set to true to enable config reloading
  reload.enabled: true

  # Period on which files under path should be checked for changes
  #reload.period: 10s

# ======================= Elasticsearch template setting =======================

setup.template.settings:
  index.number_of_shards: 1
  #index.codec: best_compression
  #_source.enabled: false


# ================================== General ===================================

# The name of the shipper that publishes the network data. It can be used to group
# all the transactions sent by a single shipper in the web interface.
#name:

# The tags of the shipper are included in their own field with each
# transaction published.
#tags: ["service-X", "web-tier"]

# Optional fields that you can specify to add additional information to the
# output.
#fields:
#  env: staging

# ================================= Dashboards =================================
# These settings control loading the sample dashboards to the Kibana index. Loading
# the dashboards is disabled by default and can be enabled either by setting the
# options here or by using the `setup` command.
#setup.dashboards.enabled: false

# The URL from where to download the dashboards archive. By default this URL
# has a value which is computed based on the Beat name and version. For released
# versions, this URL points to the dashboard archive on the artifacts.elastic.co
# website.
#setup.dashboards.url:

# =================================== Kibana ===================================

# Starting with Beats version 6.0.0, the dashboards are loaded via the Kibana API.
# This requires a Kibana endpoint configuration.
#setup.kibana:

  # Kibana Host
  # Scheme and port can be left out and will be set to the default (http and 5601)
  # In case you specify and additional path, the scheme is required: http://localhost:5601/path
  # IPv6 addresses should always be defined as: https://[2001:db8::1]:5601
  #  host: ["localhost:5601"]

  # Kibana Space ID
  # ID of the Kibana Space into which the dashboards should be loaded. By default,
  # the Default Space will be used.
  #space.id:

# =============================== Elastic Cloud ================================

# These settings simplify using Filebeat with the Elastic Cloud (https://cloud.elastic.co/).

# The cloud.id setting overwrites the `output.elasticsearch.hosts` and
# `setup.kibana.host` options.
# You can find the `cloud.id` in the Elastic Cloud web UI.
#cloud.id:

# The cloud.auth setting overwrites the `output.elasticsearch.username` and
# `output.elasticsearch.password` settings. The format is `<user>:<pass>`.
#cloud.auth:

# ================================== Outputs ===================================

# Configure what output to use when sending the data collected by the beat.

# ---------------------------- Elasticsearch Output ----------------------------
#output.elasticsearch:
  # Array of hosts to connect to.
#  hosts: ["localhost:9200"]

  # Protocol - either `http` (default) or `https`.
  #protocol: "https"

  # Authentication credentials - either API key or username/password.
  #api_key: "id:api_key"
  #username: "elastic"
  #password: "changeme"

# ------------------------------ Logstash Output -------------------------------
output.logstash:
  # The Logstash hosts
  hosts: ["10.133.252.250:5044"]
  bulk_max_size: 4096
  
  # Optional SSL. By default is off.
  # List of root certificates for HTTPS server verifications
  #ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]

  # Certificate for SSL client authentication
  #ssl.certificate: "/etc/pki/client/cert.pem"

  # Client Certificate Key
  #ssl.key: "/etc/pki/client/cert.key"

# ================================= Processors =================================
#processors:
#  - add_host_metadata:
#      when.not.contains.tags: forwarded
#  - add_cloud_metadata: ~
#  - add_docker_metadata: ~
#  - add_kubernetes_metadata: ~

# ================================== Logging ===================================

# Sets log level. The default log level is info.
# Available log levels are: error, warning, info, debug
#logging.level: debug
logging.level: debug
logging.to_files: true
logging.files:
  path: /var/log/filebeat
  name: filebeat
  keepfiles: 7
  permissions: 0644

# At debug level, you can selectively enable logging only for some components.
# To enable all selectors use ["*"]. Examples of other selectors are "beat",
# "publish", "service".
#logging.selectors: ["*"]

# ============================= X-Pack Monitoring ==============================
# Filebeat can export internal metrics to a central Elasticsearch monitoring
# cluster.  This requires xpack monitoring to be enabled in Elasticsearch.  The
# reporting is disabled by default.

# Set to true to enable the monitoring reporter.
#monitoring.enabled: false

# Sets the UUID of the Elasticsearch cluster under which monitoring data for this
# Filebeat instance will appear in the Stack Monitoring UI. If output.elasticsearch
# is enabled, the UUID is derived from the Elasticsearch cluster referenced by
# output.elasticsearch.
#monitoring.cluster_uuid:

# Uncomment to send the metrics to Elasticsearch. Most settings from the
# Elasticsearch output are accepted here as well.
# Note that the settings should point to your Elasticsearch *monitoring* cluster.
# Any setting that is not set is automatically inherited from the Elasticsearch
# output configuration, so if you have the Elasticsearch output configured such
# that it is pointing to your Elasticsearch monitoring cluster, you can simply
# uncomment the following line.
#monitoring.elasticsearch:

# ============================== Instrumentation ===============================

# Instrumentation support for the filebeat.
#instrumentation:
    # Set to true to enable instrumentation of filebeat.
    #enabled: false

    # Environment in which filebeat is running on (eg: staging, production, etc.)
    #environment: ""

    # APM Server hosts to report instrumentation results to.
    #hosts:
    #  - http://localhost:8200

    # API Key for the APM Server(s).
    # If api_key is set then secret_token will be ignored.
    #api_key:

    # Secret token for the APM Server(s).
    #secret_token:


# ================================= Migration ==================================

# This allows to enable 6.7 migration aliases
#migration.6_to_7.enabled: true

We are also using two Filebeat modules, system and nginx, to send the syslog, auth, and nginx logs to Elasticsearch.

We have set bulk_max_size: 4096 in the Logstash output, but we are seeing batches of 2048 events in the Filebeat logs:

2021-01-16T12:25:18.770Z        DEBUG   [logstash]      logstash/async.go:172   2048 events out of 2048 events sent to logstash host 10.133.252.250:5044. Continue sending

Can you please help us fix this issue? Do you see any configuration error in the filebeat.yml file?

Please help us.

Thank you

@jsoriano, Can you please help us with this issue?

Thank You

@Tek_Chand take a look at this comment: FileBeat slow - Improve performance. I think it can be useful in your case.

@jsoriano, Thank you for your response. I read the comment and made some changes to the filebeat.yml and logstash.yml files.

In filebeat.yml we added the below settings under output.logstash:

# ------------------------------ Logstash Output -------------------------------
output.logstash:
  # The Logstash hosts
  hosts: ["10.133.252.250:5044"]
  bulk_max_size: 16384
  worker: 4
queue.mem:
  events: 32768
  flush.min_events: 512
  flush.timeout: 1s

And we added the below entries in logstash.yml:

pipeline.workers: 4
pipeline.batch.size: 4096

Now when we start/restart Filebeat, it starts reading logs from an older point in time (1-2 hours behind), but it catches up with real time after 5-8 minutes.

But if we start/restart the Filebeat service multiple times, it sends some duplicate logs to Elasticsearch.

Do we need to change the value of any flag in filebeat.yml or logstash.yml?

Are the settings we changed fine, or do we need to increase/decrease any value?

Earlier we were using ELK 6.4 and it was working fine.

Thank You

Good to see that it is working better now :slight_smile:

Hmm, this is interesting, but I'm not sure it is a serious problem. Filebeat guarantees at-least-once delivery of messages: it expects acknowledgements from the output before considering a message sent. If Filebeat is interrupted while sending a batch, it may not wait long enough to receive the acknowledgements, and those messages will be sent again on startup, since Filebeat resumes from the last acknowledged offset recorded in its registry.

Does this happen on every restart? Does it duplicate many events?

Were you using the same configuration in 6.4?

@jsoriano,

Yes, if it recovers the lag then it's not a serious problem. But if it cannot recover the lag and log file rotation happens on a daily basis, then there is some concern, because a file may be rotated away before Filebeat has shipped all the logs from it. So real-time log shipping is also important.

Yes, it's happening on every restart. I am providing the log for one event below. This server was in real-time sync earlier; I restarted Filebeat after enabling debug mode in filebeat.yml, and you can see from the timestamps that it is lagging again, although it will slowly recover the lag:

2021-01-19T11:43:52.014Z        DEBUG   [processors]    processing/processors.go:203    Publish event: {
  "@timestamp": "2021-01-19T11:43:52.014Z",
  "@metadata": {
    "beat": "filebeat",
    "type": "_doc",
    "version": "7.10.1"
  },
  "message": "I, [2021-01-19T09:06:35.921033 #6008]  INFO -- : [5ede51d4-6bd5-4f4c-a0b0-80bece31a65a] notify_eva :: xyz",
  "type": "application_log",
  "input": {
    "type": "log"
  },
  "agent": {
    "version": "7.10.1",
    "hostname": "ubuntu-ml-backend",
    "ephemeral_id": "830e7c00-bfe1-4e5e-8e8d-6ad4f7671aaf",
    "id": "bedc8cb7-c7c0-4cca-bc5b-4503e86270f9",
    "name": "ubuntu-ml-backend",
    "type": "filebeat"
  },
  "ecs": {
    "version": "1.6.0"
  },
  "host": {
    "name": "ubuntu-ml-backend"
  },
  "log": {
    "offset": 2358988167,
    "file": {
      "path": "/var/apps/mobilock/shared/log/production.log"
    }
  }
}

And logically, if it starts lagging on restart, then it will send duplicate logs to Logstash/Elasticsearch.

With version 6.4 we were using the default settings. But traffic on our application has increased since then, so Filebeat may struggle to ship all the events in real time with the default settings.

Currently our application is generating 16-18k events per minute on one server.

Thank You

How are you restarting Filebeat?

@jsoriano,

We are using Ubuntu 18.04 and restarting Filebeat using the below command:

systemctl restart filebeat.service

Thank You


@Tek_Chand take a look at the filebeat.shutdown_timeout option; it can help to avoid duplicated events on restart: https://www.elastic.co/guide/en/beats/filebeat/7.10/configuration-general-options.html#shutdown-timeout
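Something like this in filebeat.yml should help (the 30s value here is only an illustration; tune it to how long your output usually takes to acknowledge a batch):

filebeat.shutdown_timeout: 30s

With this set, Filebeat waits up to that long on shutdown for in-flight events to be acknowledged before exiting, which narrows the window for re-sent events.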

@jsoriano, Thank you for your quick response.

I set the filebeat.shutdown_timeout flag on one machine, stopped the Filebeat service, started it again after 30 seconds, and then checked the logs; they still show a lag of about half an hour.

filebeat.config.modules:
  # Glob pattern for configuration loading
  path: ${path.config}/modules.d/*.yml

  # Set to true to enable config reloading
  reload.enabled: true

  # Period on which files under path should be checked for changes
  #reload.period: 10s
filebeat.shutdown_timeout: 10s

The filebeat.shutdown_timeout option determines how long Filebeat waits for the publisher to finish sending events before shutting down.

Do we need to change any configuration settings in filebeat.yml or logstash.yml? We are using a single Logstash host.

Thank You

Yes, I suggested the shutdown_timeout flag to solve the issue with repeated events on restarts.

I understood that the performance problems were solved with the other config changes. Let me know if the performance problems persist.

If you want to have exactly the same timestamp in @timestamp as you see in the logs, you need to parse the timestamp from the logs. For that you can use processors, or an ingest pipeline.
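For example, with Filebeat processors it could look something like this. This is only a rough sketch: the dissect tokenizer assumes the Ruby logger format from your samples, the "parsed" prefix is an arbitrary name I picked, and the timestamp processor is still beta in 7.10, so test it before relying on it:

processors:
  - dissect:
      # "I, [2021-01-19T09:06:35.921033 #6008]  INFO -- : ..." -> extract the time
      tokenizer: "%{level}, [%{ts} #%{pid}]%{rest}"
      field: "message"
      target_prefix: "parsed"
  - timestamp:
      # Go-style layout matching 2021-01-19T09:06:35.921033
      field: "parsed.ts"
      layouts:
        - "2006-01-02T15:04:05.999999"
      test:
        - "2021-01-19T09:06:35.921033"

Alternatively, since you are already sending through Logstash, you could parse the timestamp there with the date filter instead.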

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.