I am pulling events from an Azure Event Hub, but some of the events are grouped into a single message containing an array of "records", which I want processed as individual messages. The format is:
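For illustration, a simplified payload of this shape (the field values are assumptions, only the records array structure is taken from the question):

```json
{
  "records": [
    { "attribute": "value1" },
    { "attribute": "value2" }
  ]
}
```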
Adding the split filter, the records array is correctly split into two separate events, but each member of the array is added to a new field called 'records' rather than replacing the contents of the message attribute (which is left as is), e.g.
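A sketch of what each resulting event looks like under that behavior, with the original message field untouched and the array element copied into records (field names assumed from the simplified example):

```json
{
  "message": "{\"records\": [{\"attribute\": \"value1\"}, {\"attribute\": \"value2\"}]}",
  "records": { "attribute": "value1" }
}
```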
This is the expected behavior: by default the split filter writes each element back into the same field name it split.
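A minimal sketch of the split filter, assuming the records field from the example (the target option is only needed if you want the element written somewhere else):

```
filter {
  split {
    field => "records"    # array to split; each element becomes its own event
    # target => "record"  # optional: put the element in a different field
  }
}
```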
Another thing is that message is a special field name: it is the field that holds the original message that Logstash received. In version 8, with ECS compatibility enabled, this field is renamed to event.original.
For example, your original message is something like this:
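Assuming the simplified payload from the question, the event arrives with the raw JSON as a string in the message field:

```json
{
  "message": "{\"records\": [{\"attribute\": \"value1\"}, {\"attribute\": \"value2\"}]}"
}
```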
To parse this message you need a json filter with the message field as its source:
json {
  source => "message"
}
This will parse the content of the message field and put the resulting fields at the root of the document, since no target was specified; the message field itself will not be changed or removed unless you explicitly remove it.
To arrive at the output example you gave, your pipeline's filter block should look like this:
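A sketch of that filter block, assuming the field names from the simplified example:

```
filter {
  json {
    source => "message"   # parse the raw JSON string into top-level fields
  }
  split {
    field => "records"    # emit one event per element of the records array
  }
}
```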
If you want the content of the records field inside the message field, like message: {"attribute": "value1"}, you need to remove the original message field and rename the records field before the split.
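A hedged sketch of that approach. Note that mutate applies remove_field after its other operations, so the remove and the rename go in separate mutate blocks to keep the order explicit:

```
filter {
  json {
    source => "message"
  }
  mutate {
    # drop the original raw string first, so the rename can take its place
    remove_field => ["message"]
  }
  mutate {
    # move the parsed array into message
    rename => { "records" => "message" }
  }
  split {
    field => "message"    # split the renamed array
  }
}
```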
Thanks for the explanation. Is it not possible then to maintain the split 'records' array in the message field, thus retaining the original structure, i.e.
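i.e. each resulting event keeping the nested shape, sketched here with the assumed field names:

```json
{
  "message": {
    "records": [ { "attribute": "value1" } ]
  }
}
```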
The records field won't be an array with a single item, but this doesn't matter: Elasticsearch has no dedicated array data type, so it makes no difference to the mapping.