Splitting Logstash message

joecarter · October 23, 2023, 8:31pm

I am pulling events from an Azure Event Hub, but some of the events are being grouped into a single message containing an array of "records", which I want to be processed as individual messages. The format is:

{
timestamp,
message
  {
  records: [
    {event1},
    {event2}
  ]
  }
}

I've used the split filter to split the events into separate messages, but this strips the 'timestamp' and 'message' fields, i.e.

{
record
  {
  event1
  }
},
{
record
  {
  event2
  }
}

I'd like to preserve the message field of each message, i.e.

{
timestamp,
message
  {
  records: [
    {event1}
  ]
  }
},
{
timestamp,
message
  {
  records: [
    {event2}
  ]
  }
}

The filter I currently have is:

filter {
  json {
    source => "message"
  }
  split {
    field => ["records"]
    remove_field => ["message"]
  }
  mutate {
        add_field => {"@timestamp" => "%{@timestamp}"}
  }
}

But how can I split the records whilst retaining the message structure?

leandrojmp · October 23, 2023, 8:39pm

The strip filter will create a new event for every item in the array, but will keep every other field in the event, except the one being split.
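A rough Python model of that behavior makes it concrete. This is not Logstash code: events are plain dicts, and `split_event` is a hypothetical helper written for illustration. It assumes the default case, where each array item is written back into the field being split.

```python
import copy


def split_event(event, field):
    """Rough model of Logstash's split filter on an array field.

    Returns one new event per array item. Every other field is copied
    unchanged; only `field` is overwritten with the single item.
    """
    events = []
    for item in event[field]:
        new_event = copy.deepcopy(event)  # keep all other fields as-is
        new_event[field] = item           # default: the item goes back into the same field
        events.append(new_event)
    return events


original = {
    "@timestamp": "2023-10-23T10:17:25.436Z",
    "records": [{"attribute": "value1"}, {"attribute": "value2"}],
}
for e in split_event(original, "records"):
    print(e)
```

Note that `@timestamp` survives on every resulting event; only `records` changes from the full array to a single item.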

It won't remove the timestamp field in the example you shared. Can you share an output where the timestamp field is removed after the split?

joecarter · October 24, 2023, 9:32am

I assume you meant 'split', not 'strip'.

Before using split, the 'records' array is contained in the message field, e.g.

{"@timestamp":"2023-10-23T10:17:25.436Z","@version":"1","message":"{\"records\": [{ \"attribute\": \"value1\"},{\"attribute\": \"value2\"}]}"}

Adding the split filter, the records array is correctly split into two separate events, but each member of the array is added to a new field called 'records', rather than replacing the contents of the message attribute (which is left as is), e.g.

{"@timestamp":"2023-10-23T10:42:48.294Z","@version":"1","records":{"attribute":"value1"},"message":"{\"records\": [{\"attribute\": \"value1\"},{\"attribute\": \"value2\"}]}"}

{"@timestamp":"2023-10-23T10:42:48.294Z","@version":"1","records":{"attribute":"value2"},"message":"{\"records\": [{\"attribute\": \"value1\"},{\"attribute\": \"value2\"}]}"}

I am hoping to preserve the event structure, whereby the single-value records array is contained in the message field.

On closer inspection, the timestamp field is not removed - it was just moved to the end of the event when I deleted the message field.

leandrojmp · October 24, 2023, 11:59am

Yeah, I meant split.

This is the expected behavior: by default, the split filter writes each item back into the field being split (its target option lets you choose a different field).

Another thing is that message is a somewhat special field name: it holds the original message that Logstash received. In version 8 with ECS compatibility enabled, this field is renamed to event.original.

For example, your original message is something like this:

{"records": [{ "attribute": "value1"},{"attribute": "value2"}]}

So when this message enters your Logstash pipeline, behind the scenes you will have something like this, with the message field holding the raw JSON as a string:

{ "message": "{\"records\": [{ \"attribute\": \"value1\"},{\"attribute\": \"value2\"}]}" }

To parse this message you would need to have a json filter with the message field as a source.

json {
    source => "message"
}

This parses the content of the message field and puts the resulting fields at the root of the document, since no target was specified. The message field itself is not changed or removed unless you explicitly remove it.
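The same kind of Python model can show what a json filter without a target does. Again this is only a sketch: `json_filter` is a hypothetical helper, and real Logstash events carry more than a plain dict does.

```python
import json


def json_filter(event, source="message"):
    """Rough model of the json filter with no `target` set.

    The parsed keys are merged into the root of the event; the source
    field itself is left untouched.
    """
    parsed = json.loads(event[source])
    new_event = dict(event)
    new_event.update(parsed)  # no target: keys land at the root
    return new_event


raw = {
    "@version": "1",
    "message": '{"records": [{ "attribute": "value1"},{"attribute": "value2"}]}',
}
out = json_filter(raw)
print(out["records"])                   # the array now lives at the root
print(out["message"] == raw["message"]) # the original string is still there
```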

To get the output in your example, your pipeline filter block should look like this:

filter {
    json {
        source => "message"
    }
    split {
        field => "[records]"
    }
}
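As a rough check of that pipeline, the Python sketch below chains a model of the json filter and a model of the split filter. `json_then_split` is a hypothetical name and events are plain dicts. It yields the same shape as the outputs posted above: records holds one item per event, while message keeps the full original string.

```python
import copy
import json


def json_then_split(event):
    """Model of: json { source => "message" } then split { field => "[records]" }."""
    parsed = dict(event)
    parsed.update(json.loads(event["message"]))  # json: records added at root, message kept
    events = []
    for item in parsed["records"]:
        new_event = copy.deepcopy(parsed)
        new_event["records"] = item  # split: one item per event, same field name
        events.append(new_event)
    return events


raw = {
    "@timestamp": "2023-10-23T10:42:48.294Z",
    "@version": "1",
    "message": '{"records": [{"attribute": "value1"},{"attribute": "value2"}]}',
}
for e in json_then_split(raw):
    print(json.dumps(e))
```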

If you want to have the content of the records field inside the message field, like message: {"attribute": "value1"}, you need to remove the original message field and rename the records field before the split.

The following filters will give that:

filter {
    json {
        source => "message"
        remove_field => ["message"]
    }
    mutate {
        rename => {
            "records" => "message"
        }
    }
    split {
        field => "[message]"
    }
}

The result of this would be something like this:

{"host":"lab","message":{"attribute":"value1"},"@version":"1","@timestamp":"2023-10-24T11:58:43.867652396Z"}
{"host":"lab","message":{"attribute":"value2"},"@version":"1","@timestamp":"2023-10-24T11:58:43.867652396Z"}
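The json, rename and split chain can be modeled the same way. This Python sketch is illustrative only: `parse_rename_split` is a hypothetical helper, and remove_field is approximated by deleting the key after the parsed fields are merged, which is when Logstash applies common options such as remove_field (only after the filter succeeds).

```python
import copy
import json


def parse_rename_split(event):
    # json { source => "message" remove_field => ["message"] }
    parsed = dict(event)
    parsed.update(json.loads(event["message"]))
    del parsed["message"]  # remove_field runs after the parsed fields are added
    # mutate { rename => { "records" => "message" } }
    parsed["message"] = parsed.pop("records")
    # split { field => "[message]" }
    events = []
    for item in parsed["message"]:
        new_event = copy.deepcopy(parsed)
        new_event["message"] = item
        events.append(new_event)
    return events


raw = {
    "host": "lab",
    "@version": "1",
    "@timestamp": "2023-10-24T11:58:43.867652396Z",
    "message": '{"records": [{ "attribute": "value1"},{"attribute": "value2"}]}',
}
for e in parse_rename_split(raw):
    print(json.dumps(e))
```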

joecarter · October 24, 2023, 3:24pm

Thanks for the explanation. Is it not possible then to maintain the split 'records' array in the message field, thus retaining the original structure, i.e.

{"@timestamp":"2023-10-23T10:17:25.436Z","@version":"1","message":"{\"records\": [{ \"attribute\": \"value1\"}]}"}

{"@timestamp":"2023-10-23T10:19:22.726Z","@version":"1","message":"{\"records\": [{ \"attribute\": \"value2\"}]}"}

Ideally I'm trying to split the records 'array' without updating the Elastic mappings/indexes.

leandrojmp · October 24, 2023, 3:37pm

It is: just rename the field to [message][records] instead of message, and point the split filter at [message][records].

    mutate {
        rename => {
            "records" => "[message][records]"
        }
    }

Then in your output you will have this:

{"host":"lab","@version":"1","@timestamp":"2023-10-24T15:33:11.511695899Z","message":{"records":{"attribute":"value1"}}}
{"host":"lab","@version":"1","@timestamp":"2023-10-24T15:33:11.511695899Z","message":{"records":{"attribute":"value2"}}}

The records field won't be an array with a single item, but this doesn't matter: Elasticsearch has no dedicated array data type, so it makes no difference to the mapping.
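The nested variant can be sketched the same way. The Python below is a model, not Logstash: `get_field` and `set_field` are hypothetical helpers that mimic field references like `[message][records]`, and the sketch assumes the split filter's field is set to `[message][records]` so that it splits the array after the rename.

```python
import copy
import json
import re


def path_keys(ref):
    """Split a field reference such as "[message][records]" into its keys."""
    keys = re.findall(r"\[([^\]]+)\]", ref)
    return keys or [ref]  # a bare name like "records" is a single key


def get_field(event, ref):
    value = event
    for key in path_keys(ref):
        value = value[key]
    return value


def set_field(event, ref, value):
    keys = path_keys(ref)
    node = event
    for key in keys[:-1]:
        node = node.setdefault(key, {})  # create intermediate objects as needed
    node[keys[-1]] = value


def run_pipeline(event):
    # json { source => "message" remove_field => ["message"] }
    parsed = dict(event)
    parsed.update(json.loads(event["message"]))
    del parsed["message"]
    # mutate { rename => { "records" => "[message][records]" } }
    set_field(parsed, "[message][records]", parsed.pop("records"))
    # split { field => "[message][records]" }  (assumed field setting)
    events = []
    for item in get_field(parsed, "[message][records]"):
        new_event = copy.deepcopy(parsed)
        set_field(new_event, "[message][records]", item)
        events.append(new_event)
    return events


raw = {
    "host": "lab",
    "@version": "1",
    "@timestamp": "2023-10-24T15:33:11.511695899Z",
    "message": '{"records": [{ "attribute": "value1"},{"attribute": "value2"}]}',
}
for e in run_pipeline(raw):
    print(json.dumps(e))
```

Because the json filter already removed the original message string, the rename creates [message] as a fresh object to hold records.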

system · November 21, 2023, 3:37pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
