Splitting Logstash message

I am pulling events from an Azure Event Hub, but some of the events are being grouped into a single message containing an array of "records", which I want to be processed as individual messages. The format is:

{
timestamp,
message
  {
  records: [
    {event1},
    {event2}
  ]
  }
}

I've used the split filter to split the events into separate messages, but this strips the 'timestamp' and 'message' fields, i.e.

{
record
  {
  event1
  }
},
{
record
  {
  event2
  }
}

I'd like to preserve the message field of each message, i.e.

{
timestamp,
message
  {
  records: [
    {event1}
  ]
  }
},
{
timestamp,
message
  {
  records: [
    {event2}
  ]
  }
}

The filter I currently have is:

filter {
  json {
    source => "message"
  }
  split {
    field => ["records"]
    remove_field => ["message"]
  }
  mutate {
    add_field => {"@timestamp" => "%{@timestamp}"}
  }
}

But how can I split the records whilst retaining the message structure?

The strip filter will create a new event for every item in the array, and will keep every other field in the event except the one being split.

It won't remove the timestamp field in the example you shared. Can you share an output where the timestamp field is removed after the split?

I assume you meant 'split', not 'strip'.

Before using split, the 'records' array is contained in the message field, e.g.

{"@timestamp":"2023-10-23T10:17:25.436Z","@version":"1","message":"{\"records\": [{ \"attribute\": \"value1\"},{\"attribute\": \"value2\"}]}"}

After adding the split filter, the records array is correctly split into two separate events, but each member of the array is added to a new field called 'records' rather than replacing the contents of the message field (which is left as is), e.g.

{"@timestamp":"2023-10-23T10:42:48.294Z","@version":"1","records":{"attribute":"value1"},"message":"{\"records\": [{\"attribute\": \"value1\"},{\"attribute\": \"value2\"}]}"}

{"@timestamp":"2023-10-23T10:42:48.294Z","@version":"1","records":{"attribute":"value2"},"message":"{\"records\": [{\"attribute\": \"value1\"},{\"attribute\": \"value2\"}]}"}

I am hoping to preserve the event structure, whereby the single value record array is contained in the message field.

On closer inspection, the timestamp field is not removed - it was just moved to the end of the event when I deleted the message field.

Yeah, I meant split.

This is the expected behavior: by default, the split filter writes each element back to the same field name it split.
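To illustrate that default, here is a minimal, untested sketch that uses the split filter's target option to write each element to a different field instead. The field name record is only an assumption for illustration, and as far as I know the source array stays on every event when a separate target is used, hence the cleanup step.

split {
    field => "records"
    # Assumed field name: each array element is written here instead of back into "records".
    target => "record"
}
mutate {
    # With a separate target, the original array should still be present on each event, so drop it.
    remove_field => ["records"]
}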

Another thing: message is something of a special field name. It is the field that holds the original message Logstash received; in version 8, with ECS compatibility enabled, this field is renamed to event.original.

For example, your original message is something like this:

{"records": [{ "attribute": "value1"},{"attribute": "value2"}]}

So when this message enters your logstash pipeline, behind the scenes you will have something like this:

{ "message": {"records": [{ "attribute": "value1"},{"attribute": "value2"}]} }

To parse this message you would need to have a json filter with the message field as a source.

json {
    source => "message"
}

This parses the content of the message field and, because no target was specified, puts the resulting keys at the root of the document. The message field itself is not changed or removed unless you explicitly remove it.
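For comparison, a small sketch of the same filter with a target set, which keeps the parsed keys out of the document root. The field name parsed is hypothetical, chosen only for this example.

json {
    source => "message"
    # Hypothetical field name: the parsed keys end up under [parsed], e.g. [parsed][records].
    target => "parsed"
}

With this variant, a later split would have to reference [parsed][records] rather than records.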

To arrive at the output in your example, your pipeline's filter block should look like this:

filter {
    json {
        source => "message"
    }
    split {
        field => "[records]"
    }
}

If you want to have the content of the records field inside the message field, like message: {"attribute": "value1"}, you need to remove the original message field and rename the records field before the split.

The following filters will do that:

filter {
    json {
        source => "message"
        remove_field => ["message"]
    }
    mutate {
        rename => {
            "records" => "message"
        }
    }
    split {
        field => "[message]"
    }
}

The result of this would be something like this:

{"host":"lab","message":{"attribute":"value1"},"@version":"1","@timestamp":"2023-10-24T11:58:43.867652396Z"}
{"host":"lab","message":{"attribute":"value2"},"@version":"1","@timestamp":"2023-10-24T11:58:43.867652396Z"}

Thanks for the explanation. Is it not possible then to maintain the split 'records' array in the message field, thus retaining the original structure, i.e.

{"@timestamp":"2023-10-23T10:17:25.436Z","@version":"1","message":"{\"records\": [{ \"attribute\": \"value1\"}]}"}

{"@timestamp":"2023-10-23T10:19:22.726Z","@version":"1","message":"{\"records\": [{ \"attribute\": \"value2\"}]}"}

Ideally I'm trying to split the records 'array' without updating the Elastic mappings/indexes.

It is. Just rename the field to [message][records] instead of just message, and point the split filter at [message][records] as well.

    mutate {
        rename => {
            "records" => "[message][records]"
        }
    }

Then in your output you will have this:

{"host":"lab","@version":"1","@timestamp":"2023-10-24T15:33:11.511695899Z","message":{"records":{"attribute":"value1"}}}
{"host":"lab","@version":"1","@timestamp":"2023-10-24T15:33:11.511695899Z","message":{"records":{"attribute":"value2"}}}

The records field won't be an array with a single item, but this doesn't matter: Elasticsearch has no dedicated array data type, so it makes no difference to the mapping.
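Putting the pieces from this thread together, the whole filter block could look roughly like the sketch below. It is untested against your Event Hub data. The final ruby filter is optional and is my own assumption rather than something required: it only wraps each split element back into a one-item array, in case you want the events to keep the literal records: [ {...} ] shape.

filter {
    json {
        source => "message"
        # Drop the raw JSON string so that "message" can be reused as an object below.
        remove_field => ["message"]
    }
    mutate {
        # Move the parsed array under message.records.
        rename => {
            "records" => "[message][records]"
        }
    }
    split {
        # One event per array element; the element replaces the array in this field.
        field => "[message][records]"
    }
    # Optional (assumption): re-wrap the single element in an array.
    ruby {
        code => 'event.set("[message][records]", [event.get("[message][records]")])'
    }
}

Without the ruby filter the output matches the two events shown above; with it, message.records should hold a single-element array on each event.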