Filebeat High Availability to avoid data loss and duplicate records

Dear Elastic Community,

My main concern is to ensure high availability while avoiding duplicated results. Is it possible to have two Filebeats on two different servers covering for each other in case one of them fails, ensuring that no log is missed or duplicated? If so, how can the second Filebeat know where the primary Filebeat left off?

We are receiving files from a single server and need to keep two Filebeat instances on two servers.
Thank you in advance.

Hello and welcome,

Can you provide a little more context? It is not clear what issue you are trying to solve.

What is your Filebeat input? Are you reading logs from files or receiving data using a TCP/UDP input?


"files from single server" - I would say they read files. This would mean something like read from network volume which is not recommend plus deduplication. It's possible to make a file replication to two servers but that cause more headache. In general they want sort of HA for log shipping.

amjad, you will get more info from Leandro when you provide more details.


Hi Leandro,

Good morning, hope you are doing well!

We receive the input logs (CSV files) from the network team, and these logs are placed on the Filebeat server. From there, the processing starts.

So, we have 2 different servers for Logstash and 3 different servers for Kafka to process the data.

Now, the issue is that whenever we do any activity on the Filebeat application, we need to stop the service. After restarting the service, it takes too much time to load the data because Filebeat re-reads the old files that were already loaded and processed.

To avoid this delay, we are thinking of installing one more Filebeat instance on a different server, so that whenever the primary server is down, the second one can load the data from where the primary server stopped.

So, is it possible to have 2 Filebeat instances on 2 different servers, where one acts as a passive server and takes over processing the data?

Thanks in advance.

This should not normally happen; Filebeat does not reprocess already-read files unless something in the file has changed. Are the source files on network shares?

Please share your filebeat.yml

How would this work if the files were on a different server? If you are reading files, then the Filebeat process needs direct access to them; for 2 different servers to read the same file you would need to use a network share, which is not recommended, as Filebeat (and Logstash) have issues with network shares that can lead to the problems you reported.
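For illustration, the reason a second Filebeat cannot simply pick up where the first one stopped is that each Filebeat keeps its read positions in a local registry. Below is a minimal sketch of the relevant setting, assuming a default Linux package install; the path and comments describe the usual behaviour, not your specific setup.

# Filebeat stores per-file read offsets in a local registry under path.data
# (typically /var/lib/filebeat/registry for package installs). Each file's
# state is keyed by device ID + inode, so a Filebeat on another server has
# no access to this state and cannot resume where this one stopped. On
# network shares the device/inode values can change between mounts, which
# makes Filebeat treat already-read files as new and re-read them.
filebeat.registry.path: ${path.data}/registry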

Can you provide more context here? You mentioned Filebeat, Logstash and Kafka; it is not clear what the order of things is here.

Hi leandrojmp,

Hope you are doing well. Please find the filebeat.yml file below.

###################### Filebeat Configuration Example #########################

# This file is an example configuration file highlighting only the most common
# options. The filebeat.reference.yml file from the same directory contains all the
# supported options with more comments. You can use it as a reference.
#
# You can find the full configuration reference here:
# https://www.elastic.co/guide/en/beats/filebeat/index.html

# For more available modules and options, please see the filebeat.reference.yml sample
# configuration file.

# ============================== Filebeat inputs ===============================

filebeat.inputs:

# Each - is an input. Most options can be set at the input level, so
# you can use different inputs for various configurations.
# Below are the input specific configurations.

# filestream is an input for collecting log messages from files.
- type: log

  # Change to true to enable this input configuration.
  enabled: true

  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - <INPUT FILE>

    #- c:\programdata\elasticsearch\logs\*


  # Exclude lines. A list of regular expressions to match. It drops the lines that are
  # matching any regular expression from the list.
  #exclude_lines: ['^DBG']

  # Include lines. A list of regular expressions to match. It exports the lines that are
  # matching any regular expression from the list.
  #include_lines: ['^ERR', '^WARN']

  # Exclude files. A list of regular expressions to match. Filebeat drops the files that
  # are matching any regular expression from the list. By default, no files are dropped.
  #prospector.scanner.exclude_files: ['.gz$']

  # Optional additional fields. These fields can be freely picked
  # to add additional information to the crawled log files for filtering
  #fields:
  #  level: debug
  #  review: 1

# ============================== Filebeat modules ==============================

filebeat.config.modules:
  # Glob pattern for configuration loading
  path: ${path.config}/modules.d/*.yml

  # Set to true to enable config reloading
  reload.enabled: false

  # Period on which files under path should be checked for changes
  #reload.period: 10s

# ======================= Elasticsearch template setting =======================

setup.template.settings:
  index.number_of_shards: 1
  #index.codec: best_compression
  #_source.enabled: false


# ================================== General ===================================

# The name of the shipper that publishes the network data. It can be used to group
# all the transactions sent by a single shipper in the web interface.
#name:

# The tags of the shipper are included in their own field with each
# transaction published.
#tags: ["service-X", "web-tier"]

# Optional fields that you can specify to add additional information to the
# output.
#fields:
#  env: staging

# ================================= Dashboards =================================
# These settings control loading the sample dashboards to the Kibana index. Loading
# the dashboards is disabled by default and can be enabled either by setting the
# options here or by using the `setup` command.
#setup.dashboards.enabled: false

# The URL from where to download the dashboards archive. By default this URL
# has a value which is computed based on the Beat name and version. For released
# versions, this URL points to the dashboard archive on the artifacts.elastic.co
# website.
#setup.dashboards.url:

# =================================== Kibana ===================================

# Starting with Beats version 6.0.0, the dashboards are loaded via the Kibana API.
# This requires a Kibana endpoint configuration.
setup.kibana:

  # Kibana Host
  # Scheme and port can be left out and will be set to the default (http and 5601)
  # In case you specify an additional path, the scheme is required: http://localhost:5601/path
  # IPv6 addresses should always be defined as: https://[2001:db8::1]:5601
  #host: "localhost:5601"

  # Kibana Space ID
  # ID of the Kibana Space into which the dashboards should be loaded. By default,
  # the Default Space will be used.
  #space.id:

# =============================== Elastic Cloud ================================

# These settings simplify using Filebeat with the Elastic Cloud.

# The cloud.id setting overwrites the `output.elasticsearch.hosts` and
# `setup.kibana.host` options.
# You can find the `cloud.id` in the Elastic Cloud web UI.
#cloud.id:

# The cloud.auth setting overwrites the `output.elasticsearch.username` and
# `output.elasticsearch.password` settings. The format is `<user>:<pass>`.
#cloud.auth:

# ================================== Outputs ===================================

# Configure what output to use when sending the data collected by the beat.

# ---------------------------- Elasticsearch Output ----------------------------
#output.elasticsearch:
  # Array of hosts to connect to.
  #hosts: [""]

  # Protocol - either `http` (default) or `https`.
  #protocol: "https"

  # Authentication credentials - either API key or username/password.
  #api_key: "id:api_key"
  #username: "elastic"
  #password: "changeme"

# ------------------------------ Logstash Output -------------------------------
output.logstash:
  # The Logstash hosts
  hosts: [""]

  # Optional SSL. By default is off.
  # List of root certificates for HTTPS server verifications
  #ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]

  # Certificate for SSL client authentication
  #ssl.certificate: "/etc/pki/client/cert.pem"

  # Client Certificate Key
  #ssl.key: "/etc/pki/client/cert.key"

# ================================= Processors =================================
processors:
  - add_host_metadata:
      when.not.contains.tags: forwarded
  - add_cloud_metadata: ~
  - add_docker_metadata: ~
  - add_kubernetes_metadata: ~

# ================================== Logging ===================================

# Sets log level. The default log level is info.
# Available log levels are: error, warning, info, debug
#logging.level: debug

# At debug level, you can selectively enable logging only for some components.
# To enable all selectors use ["*"]. Examples of other selectors are "beat",
# "publisher", "service".
#logging.selectors: ["*"]

# ============================= X-Pack Monitoring ==============================
# Filebeat can export internal metrics to a central Elasticsearch monitoring
# cluster.  This requires xpack monitoring to be enabled in Elasticsearch.  The
# reporting is disabled by default.

# Set to true to enable the monitoring reporter.
#monitoring.enabled: false

# Sets the UUID of the Elasticsearch cluster under which monitoring data for this
# Filebeat instance will appear in the Stack Monitoring UI. If output.elasticsearch
# is enabled, the UUID is derived from the Elasticsearch cluster referenced by output.elasticsearch.
#monitoring.cluster_uuid:

# Uncomment to send the metrics to Elasticsearch. Most settings from the
# Elasticsearch output are accepted here as well.
# Note that the settings should point to your Elasticsearch *monitoring* cluster.
# Any setting that is not set is automatically inherited from the Elasticsearch
# output configuration, so if you have the Elasticsearch output configured such
# that it is pointing to your Elasticsearch monitoring cluster, you can simply
# uncomment the following line.
#monitoring.elasticsearch:

# ============================== Instrumentation ===============================

# Instrumentation support for the filebeat.
#instrumentation:
    # Set to true to enable instrumentation of filebeat.
    #enabled: false

    # Environment in which filebeat is running on (eg: staging, production, etc.)
    #environment: ""

    # APM Server hosts to report instrumentation results to.
    #hosts:
   

    # API Key for the APM Server(s).
    # If api_key is set then secret_token will be ignored.
    #api_key:

    # Secret token for the APM Server(s).
    #secret_token:


# ================================= Migration ==================================

# This allows to enable 6.7 migration aliases
#migration.6_to_7.enabled: true

Hi @leandrojmp, please check the yml file and help us.

Hello,

You didn't answer any of the other questions; I cannot help further without more context.

Hi @leandrojmp, good evening.

Hope you are doing well.

I have shared our filebeat.yml in the chat, please check it and help us.

We are receiving the input files from the network team, and the files are placed on a network share.

Hello,

I can't help you further without more context, especially this:

Can you provide more context here? You mentioned Filebeat, Logstash and Kafka; it is not clear what the order of things is here.

I cannot picture how you are collecting your logs since you didn't provide this information.

As already mentioned:

If you are reading files, then the Filebeat process needs direct access to them; for 2 different servers to read the same file you would need to use a network share, which is not recommended, as Filebeat (and Logstash) have issues with network shares that can lead to the problems you reported.

I don't think you will be able to solve this using network shares, as they do not work well with Filebeat; you should avoid using them.

Good morning @leandrojmp.
Here is the order: first, we use Filebeat to receive the input files; the network team places our log files on that server. From Filebeat the files are transferred to Kafka, which acts as a message queue, and from Kafka the data is transferred to Logstash, where it is filtered and then sent to Elasticsearch.

Hope this gives you clarity.
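For reference, a Filebeat-to-Kafka output stage like the one described above is typically configured in filebeat.yml along the lines of the sketch below; the broker hosts and the topic name are placeholders, not actual values. Note that Filebeat allows only one output to be enabled at a time, so output.kafka and output.logstash cannot both be active.

output.kafka:
  # Kafka broker addresses; the hostnames and ports are placeholders.
  hosts: ["kafka1:9092", "kafka2:9092", "kafka3:9092"]
  # Topic that the downstream Logstash pipeline consumes from; the name is illustrative.
  topic: "network-csv-logs"
  partition.round_robin:
    reachable_only: false
  required_acks: 1
  compression: gzip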

Our main concern is that we are planning to keep Filebeat on another server as a backup; the same network files will be placed on that backup Filebeat server, and once the main server is down, the backup server should load from the last loaded files. (Earlier, when the server was down, it took 8 hours to load the new files because Filebeat was re-reading the old, already-loaded files.)

Please help me. If you need more clarity, please share your Gmail address and we can connect on Google Meet for more clarification.
Thanks in advance.

As mentioned before, this will not work and the files may be reprocessed; neither Filebeat nor Logstash works well with network shares.

Also

I don't think you will be able to solve this using network shares, as they do not work well with Filebeat; you should avoid using them.

You need to think of an alternative that does not use network shares, as they do not work well with Logstash and Filebeat.
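One setting sometimes used to reduce re-reads when files are copied or replaced is fingerprint-based file identity on the filestream input; it does not make network shares supported and does not give two Filebeats a shared read position, but it makes file tracking independent of device/inode. A minimal sketch, assuming a recent Filebeat 8.x and the filestream input rather than the log input shown earlier:

filebeat.inputs:
  - type: filestream
    # A unique, stable id is strongly recommended for filestream inputs; "csv-files" is illustrative.
    id: csv-files
    paths:
      - <INPUT FILE>
    # Identify files by a hash of their first bytes instead of device ID + inode,
    # so state survives copies, renames, and changing inode numbers.
    file_identity.fingerprint: ~
    prospector.scanner.fingerprint:
      enabled: true
      offset: 0
      length: 1024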

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.