Filebeat is not polling the logs in the order of their timestamps

My use case is that I am sending logs from one server to another and pushing them to Logstash. The problem is that the logs are not pushed in their original order; sometimes a log from 15:21:17 arrives before one from 15:21:15. I know Filebeat is multi-threaded, but is there any way to solve this?

Hi @Shrimad_Mishra

There are a couple of concepts at work here... perhaps a little more clarification would help...

Typically, if you want to guarantee your logs are in order and have the correct @timestamp, you parse and set the originating timestamp from the originating message.

Is that what you are trying to accomplish? Or do you have logs with no timestamp that are just "ordered" in a file?

Can you share your filebeat configuration?

Here is the filebeat conf.

filebeat.inputs:

- type: filestream

  # Change to true to enable this input configuration.
  id: app-log-stream-id
  enabled: true
  take_over: true
  scan_frequency: 1s
  ignore_older: 30m
  clean_inactive: 35m
  close_removed: true
  harvester_limit: 40

  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /home/ubuntu/VoiceBot/log/app.log
  fields:
    log_type: app
  max_bytes: 50485760

filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml

  reload.enabled: true

  reload.period: 5s

# ======================= Elasticsearch template setting =======================

setup.template.settings:
  index.number_of_shards: 1
  #index.codec: best_compression
  #_source.enabled: false


# ================================== General ===================================

output.logstash:
  # The Logstash hosts
    hosts: ["localhost:5044"]
    ttl: 3600

processors:
  #- add_host_metadata:
      #when.not.contains.tags: forwarded
  #- add_cloud_metadata: ~
  #- add_docker_metadata: ~
  #- add_kubernetes_metadata: ~
  - drop_fields:
      fields: ["input", "ecs", "host", "agent", "log", "@version"]
  - rate_limit:
      limit: "5000/s"
  - timestamp:
      field: event.ingested
      layouts:
        - '2006-01-02T15:04:05.000Z'

logging.level: debug
logging.to_files: true
logging.files:
  path: /var/log/filebeat
  name: filebeat
  keepfiles: 7
  permissions: 0640

My logs are coming in this pattern:

[28-Mar-2024 18:48:57] ERROR [consumers_voice_bot_langchain.py:269] Error  
[28-Mar-2024 18:48:56] INFO [utils_synchronous.py:573] Info 1
[28-Mar-2024 18:48:57] INFO [utils_synchronous.py:573] Info 2

But they should come in this order:

[28-Mar-2024 18:48:56] INFO [utils_synchronous.py:573] Info 1
[28-Mar-2024 18:48:57] ERROR [consumers_voice_bot_langchain.py:269] Error  
[28-Mar-2024 18:48:57] INFO [utils_synchronous.py:573] Info 2

I do not know why this is happening.

Neither Filebeat nor Logstash preserves the order by default; they are both multi-threaded tools and process things in parallel.

If the order is important, you should use the value of the offset field to order your events, and also parse the date string from your logs.

Since your Filebeat output is Logstash, you can try to keep the order by changing the following settings:

First you need to add pipelining: 0 to the Logstash output in your filebeat.yml. As explained in the documentation, this makes Filebeat stop sending async batches to Logstash, so it will only send a new batch after it receives an ack for the previous one.

So your output should be something like this:

output.logstash:
  # The Logstash hosts
    hosts: ["localhost:5044"]
    ttl: 3600
    pipelining: 0

After that you also need to change your Logstash pipeline to run with only one worker; how you do this depends on how you are running Logstash.

If you are running it from the command line, use the parameter -w 1. If you are running it as a service, add pipeline.workers: 1 to your pipelines.yml or logstash.yml, depending on whether you are using multiple pipelines or not.
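For example, for a single default pipeline managed as a service, the setting would look something like this (a minimal sketch, assuming you are editing logstash.yml rather than pipelines.yml):

```yaml
# logstash.yml -- force the pipeline to process events on a single worker
# thread so batches are emitted in the order they are received
pipeline.workers: 1
```

From the command line, the hypothetical equivalent would be bin/logstash -w 1 -f your-pipeline.conf.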

Keep in mind that this can severely impact Logstash performance, as it will make Logstash run using just one CPU core.

Thanks, but how do I use the offset?

Is that date/time part of the actual log line?

If so, the right way to do this is to parse that and set it as the @timestamp; then when you look in Discover, they will be in the proper order.

It is a field in the document named log.offset; you can sort by this field in Discover in Kibana.
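If you query Elasticsearch directly, the same ordering can be expressed as a sort on that field (the index pattern here is an assumption, substitute your own):

```
GET filebeat-*/_search
{
  "sort": [
    { "log.offset": { "order": "asc" } }
  ]
}
```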

Yes, the timestamp is the time the log was written. Can you explain how to do that?

Okay, but I am using OpenSearch, so will this still work?

OpenSearch/OpenDistro are AWS run products and differ from the original Elasticsearch and Kibana products that Elastic builds and maintains. You may need to contact them directly for further assistance.

(This is an automated response from your friendly Elastic bot. Please report this post if you have any suggestions or concerns :elasticheart: )

Normally you would use an ingest pipeline to parse the data in Elasticsearch and set the proper timestamp... I'm not sure what the process is in OpenSearch.

Or you could use a dissect processor in filebeat.yml to extract that log time field and set the timestamp there.

As far as I know, this configuration only parses the data into the specified pattern and sends it to the configured output.

Can you provide the configuration for the same so that I can see exactly how it works?

You should either switch to Elasticsearch or ask the OpenSearch community about this.

The products have been diverging for a long time now. See more on this:


What does your Logstash configuration look like? Please share it.

Are you parsing your logs in Logstash? If so, you need to use a date filter to parse your date string into the @timestamp field.
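A minimal sketch of such a filter, assuming the timestamp has already been extracted into a field called log_time (e.g. by a grok or dissect filter earlier in the pipeline) in the [28-Mar-2024 18:48:57] format shown above:

```
filter {
  # log_time is a hypothetical field extracted by an earlier grok/dissect filter
  date {
    match  => ["log_time", "dd-MMM-yyyy HH:mm:ss"]
    target => "@timestamp"   # the default target, shown here for clarity
  }
}
```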

@Shrimad_Mishra

Here is the filebeat way to do it...

You do not have a timezone, so you would probably need to add that; otherwise it will assume UTC.
From the docs

timezone (optional, default UTC): IANA time zone name (e.g. America/New_York) or fixed time offset (e.g. +0200) to use when parsing times that do not contain a time zone. Local may be specified to use the machine's local time zone.
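So if, for example, your logs were written in IST, the timestamp processor below could be extended like this (the zone name is just an assumption, substitute the one your logs actually use):

```yaml
  - timestamp:
      field: log_time
      timezone: Asia/Kolkata   # assumption -- use the zone your logs are written in
      layouts:
        - '02-Jan-2006 15:04:05'
```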

processors:
  - drop_fields:
      fields: ["input", "ecs", "host", "agent", "log", "@version"]
  - rate_limit:
      limit: "5000/s"

#[28-Mar-2024 18:48:57] ERROR [consumers_voice_bot_langchain.py:269] Error        
  - dissect:
      tokenizer: "[%{log_time}] %{log.level} [%{log.origin.function}] %{message_detail}"
      field: "message"
      target_prefix: ""
      trim_values: all

  - timestamp:
      field: log_time
      layouts:
        - '02-Jan-2006 15:04:05'
      test:
        - '28-Mar-2024 18:48:57'

  - drop_fields:
      fields: [log_time]      

Results in

{
  "_index": ".ds-filebeat-8.12.1-2024.03.21-000001",
  "_id": "McVAlY4B_IpKa2x78IFF",
  "_version": 1,
  "_score": 0,
  "_source": {
    "@timestamp": "2024-03-28T18:48:57.000Z",
    "message": "[28-Mar-2024 18:48:57] ERROR [consumers_voice_bot_langchain.py:269] Error  ",
    "log": {
      "origin": {
        "function": "consumers_voice_bot_langchain.py:269"
      },
      "level": "ERROR"
    },
    "message_detail": "Error"
  }
}
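As a quick sanity check on the tokenizer and the date layout, the same parse can be sketched in Python (the regex and field names here are just a rough stand-in for the dissect processor, not what Filebeat runs internally):

```python
import re
from datetime import datetime

# A sample line in the same pattern as the logs above
line = "[28-Mar-2024 18:48:57] ERROR [consumers_voice_bot_langchain.py:269] Error"

# Rough equivalent of the dissect tokenizer:
# "[%{log_time}] %{log.level} [%{log.origin.function}] %{message_detail}"
m = re.match(
    r"\[(?P<log_time>[^\]]+)\] (?P<level>\S+) \[(?P<function>[^\]]+)\] (?P<detail>.*)",
    line,
)
fields = m.groupdict()

# The Go layout '02-Jan-2006 15:04:05' corresponds to '%d-%b-%Y %H:%M:%S' in strptime
timestamp = datetime.strptime(fields["log_time"], "%d-%b-%Y %H:%M:%S")
print(timestamp.isoformat())  # 2024-03-28T18:48:57
```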

Ohh yes... Use Elasticsearch :slight_smile: