Filebeat is not polling the logs in the order of their timestamps

My use case is that I am sending logs from one server to another and pushing them to Logstash. The problem is that the logs are not pushed in their original order; sometimes a log from 15:21:17 arrives before one from 15:21:15. I know Filebeat is multi-threaded, but is there any way to solve this?

Hi @Shrimad_Mishra

There are a couple of concepts at work here... perhaps a little more clarification would help...

Typically, if you want to guarantee your logs are in order and have the correct @timestamp, you parse and set the originating timestamp from the originating message.

Is that what you are trying to accomplish? Or do you have logs with no timestamp that are just "ordered" in a file?

Can you share your filebeat configuration?

Here is the filebeat conf.

filebeat.inputs:

- type: filestream

  # Change to true to enable this input configuration.
  id: app-log-stream-id
  enabled: true
  take_over: true
  scan_frequency: 1s
  ignore_older: 30m
  clean_inactive: 35m
  close_removed: true
  harvester_limit: 40

  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /home/ubuntu/VoiceBot/log/app.log
  fields:
    log_type: app
  max_bytes: 50485760

filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml

  reload.enabled: true

  reload.period: 5s

# ======================= Elasticsearch template setting =======================

setup.template.settings:
  index.number_of_shards: 1
  #index.codec: best_compression
  #_source.enabled: false


# ================================== General ===================================

output.logstash:
  # The Logstash hosts
    hosts: ["localhost:5044"]
    ttl: 3600

processors:
  #- add_host_metadata:
      #when.not.contains.tags: forwarded
  #- add_cloud_metadata: ~
  #- add_docker_metadata: ~
  #- add_kubernetes_metadata: ~
  - drop_fields:
      fields: ["input", "ecs", "host", "agent", "log", "@version"]
  - rate_limit:
      limit: "5000/s"
  - timestamp:
      field: event.ingested
      layouts:
        - '2006-01-02T15:04:05.000Z'

logging.level: debug
logging.to_files: true
logging.files:
  path: /var/log/filebeat
  name: filebeat
  keepfiles: 7
  permissions: 0640

My logs are coming in this pattern:

[28-Mar-2024 18:48:57] ERROR [consumers_voice_bot_langchain.py:269] Error  
[28-Mar-2024 18:48:56] INFO [utils_synchronous.py:573] Info 1
[28-Mar-2024 18:48:57] INFO [utils_synchronous.py:573] Info 2

But they should come in this order:

[28-Mar-2024 18:48:56] INFO [utils_synchronous.py:573] Info 1
[28-Mar-2024 18:48:57] ERROR [consumers_voice_bot_langchain.py:269] Error  
[28-Mar-2024 18:48:57] INFO [utils_synchronous.py:573] Info 2

I do not know why this is happening.

Neither Filebeat nor Logstash preserves the order by default; they are both multi-threaded tools and process things in parallel.

If the order is important, you should use the value of the offset field to order your events, and also parse the date string from your logs.

Since your Filebeat output is Logstash, you can try to keep the order by changing the following settings:

First you need to add pipelining: 0 to the Logstash output in your filebeat.yml. As explained in the documentation, this makes Filebeat stop sending async batches to Logstash, so it will only send a new batch after it receives an ack for the previous one.

So your output should be something like this:

output.logstash:
  # The Logstash hosts
    hosts: ["localhost:5044"]
    ttl: 3600
    pipelining: 0

After that you also need to change your Logstash pipeline to run with only one worker; how you do this depends on how you are running Logstash.

If you are running it from the command line, use the parameter -w 1. If you are running it as a service, add pipeline.workers: 1 to your pipelines.yml or logstash.yml, depending on whether you are using multiple pipelines or not.
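For example, for a single default pipeline managed as a service, the setting would look something like this (a minimal sketch, assuming you are editing logstash.yml rather than pipelines.yml):

```yaml
# logstash.yml -- force the pipeline to process events on a single worker
# thread so batches are emitted in the order they are received
pipeline.workers: 1
```

From the command line, the hypothetical equivalent would be bin/logstash -w 1 -f your-pipeline.conf.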

Keep in mind that this can severely impact Logstash performance, as it will make Logstash run using just one CPU core.

Thanks, but how do I use the offset?

Is that date/time part of the actual log line?

If so, the right way to do this is to parse that and set it as the @timestamp; then when you look in Discover, they will be in the proper order.

It is a field in the document named log.offset; you can sort by this field in Discover in Kibana.
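If you query Elasticsearch directly, the same ordering can be expressed as a sort on that field (the index pattern here is an assumption, substitute your own):

```
GET filebeat-*/_search
{
  "sort": [
    { "log.offset": { "order": "asc" } }
  ]
}
```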

Yes, the timestamp is the time the log was written. Can you explain how to do that?

Okay, but I am using OpenSearch, so will this still work?

OpenSearch/OpenDistro are AWS run products and differ from the original Elasticsearch and Kibana products that Elastic builds and maintains. You may need to contact them directly for further assistance.

(This is an automated response from your friendly Elastic bot. Please report this post if you have any suggestions or concerns :elasticheart: )

Normally you would use an ingest pipeline to parse the data in Elasticsearch and set the proper timestamp... I'm not sure what the process is in OpenSearch.

Or you could use a dissect processor in filebeat.yml to extract that log time field and set the timestamp there.

As far as I know, this configuration only parses the data into the specified pattern and sends it to the configured output.

Can you provide the configuration for the same so that I can see exactly how it works?

You should either switch to Elasticsearch or ask the OpenSearch community about this.

The products have been diverging for a long time now. See more on this:


What does your Logstash configuration look like? Please share it.

Are you parsing your logs in Logstash? If so, you need to use a date filter to parse your date string into the @timestamp field.
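A minimal sketch of such a filter, assuming the timestamp has already been extracted into a field called log_time (e.g. by a grok or dissect filter earlier in the pipeline) in the [28-Mar-2024 18:48:57] format shown above:

```
filter {
  # log_time is a hypothetical field extracted by an earlier grok/dissect filter
  date {
    match  => ["log_time", "dd-MMM-yyyy HH:mm:ss"]
    target => "@timestamp"   # the default target, shown here for clarity
  }
}
```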

@Shrimad_Mishra

Here is the filebeat way to do it...

You do not have a timezone, so you would probably need to add that; otherwise it will assume UTC.
From the docs

timezone (optional, default UTC): IANA time zone name (e.g. America/New_York) or fixed time offset (e.g. +0200) to use when parsing times that do not contain a time zone. Local may be specified to use the machine's local time zone.
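So if, for example, your logs were written in IST, the timestamp processor below could be extended like this (the zone name is just an assumption, substitute the one your logs actually use):

```yaml
  - timestamp:
      field: log_time
      timezone: Asia/Kolkata   # assumption -- use the zone your logs are written in
      layouts:
        - '02-Jan-2006 15:04:05'
```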

processors:
  - drop_fields:
      fields: ["input", "ecs", "host", "agent", "log", "@version"]
  - rate_limit:
      limit: "5000/s"

#[28-Mar-2024 18:48:57] ERROR [consumers_voice_bot_langchain.py:269] Error        
  - dissect:
      tokenizer: "[%{log_time}] %{log.level} [%{log.origin.function}] %{message_detail}"
      field: "message"
      target_prefix: ""
      trim_values: all

  - timestamp:
      field: log_time
      layouts:
        - '02-Jan-2006 15:04:05'
      test:
        - '28-Mar-2024 18:48:57'

  - drop_fields:
      fields: [log_time]      

Results in

{
  "_index": ".ds-filebeat-8.12.1-2024.03.21-000001",
  "_id": "McVAlY4B_IpKa2x78IFF",
  "_version": 1,
  "_score": 0,
  "_source": {
    "@timestamp": "2024-03-28T18:48:57.000Z",
    "message": "[28-Mar-2024 18:48:57] ERROR [consumers_voice_bot_langchain.py:269] Error  ",
    "log": {
      "origin": {
        "function": "consumers_voice_bot_langchain.py:269"
      },
      "level": "ERROR"
    },
    "message_detail": "Error"
  }
}
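As a quick sanity check on the tokenizer and the date layout, the same parse can be sketched in Python (the regex and field names here are just a rough stand-in for the dissect processor, not what Filebeat runs internally):

```python
import re
from datetime import datetime

# A sample line in the same pattern as the logs above
line = "[28-Mar-2024 18:48:57] ERROR [consumers_voice_bot_langchain.py:269] Error"

# Rough equivalent of the dissect tokenizer:
# "[%{log_time}] %{log.level} [%{log.origin.function}] %{message_detail}"
m = re.match(
    r"\[(?P<log_time>[^\]]+)\] (?P<level>\S+) \[(?P<function>[^\]]+)\] (?P<detail>.*)",
    line,
)
fields = m.groupdict()

# The Go layout '02-Jan-2006 15:04:05' corresponds to '%d-%b-%Y %H:%M:%S' in strptime
timestamp = datetime.strptime(fields["log_time"], "%d-%b-%Y %H:%M:%S")
print(timestamp.isoformat())  # 2024-03-28T18:48:57
```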

Ohh yes... Use Elasticsearch :slight_smile: