How can I create custom tags from the data inside a log file using Filebeat?

I have been using the ELK stack for a month now. Lately I have run into a situation where I need some help from the community to save me some time. I am running Filebeat and the ELK stack in Docker, and I am sending logs from Filebeat to Logstash. The issue is that my log file has a custom format, and I want to create tags out of the data inside it. Here is a line from my log file.

$(get_timestamp) $tag_name $tag_machine_name $site_id $parking_lot_id $log_level message
2023-01-04 10:23:10 imageProcessing m00001 s00001 p00001 INFO Starting Script imageProcessing.sh

The first line above shows the fields I want to create as tags, and the second line is an example of how my script writes everything to the log file.
What I have seen so far is that processors with add_field can be used to add fields to the data coming from the log file. But I have not found a solution that says: if the log file has a word like "m00001", tag it as a machine id, and likewise for the other words.

What I see now in Kibana is that everything is logged as a single message, i.e. log.message contains all the data from the second line.

Any help would be appreciated.

Thanks

Hi @Raja_Muneer

You have many options to parse the data and then apply filters, add fields, etc.

  1. You could dissect it in filebeat

  2. You could dissect or grok it in logstash

  3. You could dissect or grok it in an ingest pipeline

It kind of depends on what you want to do.
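For option 2, here is a minimal Logstash filter sketch. The field names on the right-hand side are just assumptions based on your sample line; the last key is named log_message so it does not overwrite the original message field:

```
filter {
  dissect {
    # Split the space-delimited line into named fields.
    # The last key captures the rest of the line, so the free-text
    # part of the message ends up in a single field.
    mapping => {
      "message" => "%{date} %{time} %{tag_name} %{tag_machine_name} %{site_id} %{parking_lot_id} %{log_level} %{log_message}"
    }
  }
}
```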

@stephenb thank you Stephen for replying. Could you provide an example if possible? It would help me immensely.

I want to do the processing in Filebeat only and use those tags in Logstash to create the index. Or maybe I can create the index inside Filebeat, I am not sure.

Then I would probably try the dissect processor in filebeat.

You can write to different indices from Filebeat, but in my opinion it is easier to do in Logstash.

You could certainly parse that in logstash and keep your logic in one place, that is a very common pattern, but it should work in filebeat as well.

Here is a quick example

# filestream is an input for collecting log messages from files.
- type: filestream

  # Unique ID among all inputs, an ID is required.
  id: my-filestream-id

  # Change to true to enable this input configuration.
  enabled: true

  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /Users/sbrown/workspace/sample-data/discuss/filebeat-dissect.log
    #- c:\programdata\elasticsearch\logs\*
  processors:
    - dissect:
        tokenizer: "%{date} %{time} %{tag_name} %{tag_machine_name} %{site_id} %{parking_lot_id} %{log_level} %{message}"
        field: "message"
        target_prefix: "dissect"

That turns your log line into

      "_source": {
          "@timestamp": "2023-01-21T18:29:45.263Z",
          "agent": {
            "id": "ee8a8567-dd3d-45bc-a871-b3a79e0953e7",
            "name": "hyperion",
            "type": "filebeat",
            "version": "8.5.3",
            "ephemeral_id": "4ac2eca9-afdb-4175-b037-442404f57d3a"
          },
          "ecs": {
            "version": "8.0.0"
          },
...
          "log": {
            "offset": 0,
            "file": {
              "path": "/Users/sbrown/workspace/sample-data/discuss/filebeat-dissect.log"
            }
          },
          "message": "2023-01-04 10:23:10 imageProcessing m00001 s00001 p00001 INFO Starting Script imageProcessing.sh",
          "input": {
            "type": "filestream"
          },
          "dissect": {
            "tag_machine_name": "m00001",
            "site_id": "s00001",
            "parking_lot_id": "p00001",
            "log_level": "INFO",
            "message": "Starting Script imageProcessing.sh",
            "date": "2023-01-04",
            "time": "10:23:10",
            "tag_name": "imageProcessing"
          }
        }
      }

@stephenb thanks for your valuable reply and sorry for the late response. One last small detail I would love to know: what if I have multiple log files and I want to create tags out of each message? Can I use a single processor for that purpose, or do I need a different processor for each log file? Furthermore, thanks to you I can see that it has created the tags as per your answer above.

Also, how can I create an index based on the output coming from Filebeat? E.g. I want to create an index out of the machine name in Logstash, as I will have multiple machines sending logs to my Logstash, so having an index per machine name would hugely benefit me.

Also, when I try to filter them in Kibana, it does not highlight the field I put in the search bar. Usually when we filter on anything it highlights the matches, but in the case of the dissect tags it is not highlighting. Any thoughts on this?

Hi @Raja_Muneer, you should really open separate threads for these questions. I will give a quick response here:

If they have different log patterns, you will probably need to use different inputs, each with the appropriate dissect pattern.
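As a sketch, two filestream inputs, each with its own dissect processor (the paths and the second tokenizer are hypothetical):

```
- type: filestream
  id: parking-logs
  paths:
    - /var/log/parking/*.log        # hypothetical path
  processors:
    - dissect:
        tokenizer: "%{date} %{time} %{tag_name} %{tag_machine_name} %{site_id} %{parking_lot_id} %{log_level} %{message}"
        field: "message"
        target_prefix: "dissect"

- type: filestream
  id: other-app-logs
  paths:
    - /var/log/other-app/*.log      # hypothetical path
  processors:
    - dissect:
        # hypothetical format for a second log file
        tokenizer: "%{date} %{time} %{log_level} %{message}"
        field: "message"
        target_prefix: "dissect"
```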

What version are you on?

Apologies, do you want to create an index name in Filebeat or Logstash? The sentence above says both, but let's assume Logstash.

In Logstash the output would be something like this.
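A sketch of a Logstash elasticsearch output that builds the index name from the dissected machine name (the hosts value and index prefix are placeholders):

```
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    # sprintf reference to the field created by the Filebeat dissect processor
    index => "logs-%{[dissect][tag_machine_name]}-%{+YYYY.MM.dd}"
  }
}
```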

An index per machine is typically an anti-pattern: if you have many machines you will have many indices, which "look" nice to your eyes but can be very inefficient. The usual pattern is to send everything to a common index and filter on the Kibana side. It is unclear what the huge benefit would be; this is just a suggestion.

If you only have a few machines then fine, if you have a lot / many machines...not so fine.

When you filter out, the docs are filtered out, so there is no highlighting. If you mean filter in, then please provide the version you are on and a screenshot.

Please open new questions in new thread.

Thank you so much @stephenb. You have no doubt cleared up my understanding of these questions. About the filter part, I am sorry I cannot provide a screenshot at the moment, but what I meant was the highlighting. For instance, when we receive logs from a beat we can see them in Kibana. Suppose I am interested in watching messages from a log: when I query using a tag like log.message="Warn" and click the update button, it highlights all those messages in yellow. But when I filter using one of the dissect tags, like dissect.machine_name="m00001", it does not highlight them in yellow. BTW I am on version 8.6.0.

I really appreciate your patience and thanks for helping me out.

In the KQL bar ... make sure to use KQL syntax:

dissect.machine_name: "m00001"

There is no = in KQL

This is 8.6.0. Highlighting works fine.


Thank you so much @stephenb