I want to split a field into an array, but I want the array values to keep the separator characters. It seems that the way mutate split works is to remove the separator characters.
Since the mutate filter applies gsub directives before split directives, it is possible to use a positive-lookbehind assertion to inject a character on which we can later split:
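pattern: "(?<=\)), " a comma-space sequence that is preceded by a literal closing paren (the lookbehind only checks for the paren and does not consume it, so the paren stays in the array value)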
replacement: "|" a pipe character (whatever sequence you use MUST NOT appear naturally in your messages)
filter {
  mutate {
    # replace any comma-space that is preceded by a closing paren with a pipe
    gsub => ["message", "(?<=\)), ", "|"]
    # split on the pipe
    split => { "message" => "|" }
  }
}
Running Logstash 6.2.2 with the input you gave as the message, the above filter gave me output that is likely what you expect:
Apache, Apache Lucene, Apache Hadoop, Hadoop, HDFS and the yellow elephant
logo are trademarks of the
Apache Software Foundation
in the United States and/or other countries.
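If you want to check the regex outside Logstash, here is a minimal Python sketch of the same two steps (substitute, then split). It rests on a few assumptions: the sample string is made up for illustration and is not the input from the question, the helper name `split_keeping_paren` is hypothetical, and Python's `re` module stands in for the Ruby regex engine that Logstash uses. For this particular pattern the lookbehind syntax should behave the same way.

```python
import re


def split_keeping_paren(message, sentinel="|"):
    """Mimic the gsub-then-split approach: mark each ', ' that follows ')' and split there."""
    # The sentinel must not occur naturally, or the split would cut in the wrong places.
    if sentinel in message:
        raise ValueError("sentinel already appears in the message")
    # The lookbehind matches only the comma-space; the ')' before it is not consumed.
    marked = re.sub(r"(?<=\)), ", sentinel, message)
    return marked.split(sentinel)


# Hypothetical sample, not the input from the question.
sample = "first (a, b), second (c), third"
parts = split_keeping_paren(sample)
print(parts)
```

The comma-space inside `(a, b)` is left alone because it is not preceded by a closing paren, and each element keeps its paren because only the comma-space is replaced.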