For the last 6 weeks or so I have been developing Logstash filters for the various logs our network produces, and I have been on a steep learning curve. It's possible that I am simply ignorant of some basic fact, but I've looked at various examples online and I've seen for myself that if a field name is repeated, Logstash ends up making that field into an array and stores all of the values captured to that fieldname in that array.
I have logs produced by AMaViS in which some fields may hold one value or sometimes more than one, usually separated by commas, sometimes by comma-space. So I'd like to use a regular expression (snippet) such as:
(?:%{WORD:data}, )*%{WORD:data}
or its reverse:
%{WORD:data}(?:, %{WORD:data})*
but neither seems to work at all.
These patterns work in the Grok debugger, but they do not work at all inside Logstash. When I put them into the Logstash config, all the entries which ought to match are instead tagged with _grokparsefailure.
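For what it's worth, here is a rough sanity check of the two expressions in plain Ruby, outside Logstash. It assumes that %{WORD} expands to \b\w+\b, and the sample strings are made up rather than taken from a real AMaViS log. Ruby's regex engine is not the one grok runs on, so this only suggests the expressions themselves are well-formed. It says nothing about how grok handles the repeated field name.

```ruby
# Plain-Ruby stand-in for %{WORD}; assumed to expand to \b\w+\b in grok.
WORD = '\b\w+\b'

# The two list patterns, anchored so the whole string must match.
STAR_FIRST = /\A(?:#{WORD}, )*#{WORD}\z/
STAR_LAST  = /\A#{WORD}(?:, #{WORD})*\z/

# Hypothetical sample values, not taken from a real log.
SAMPLES = ['alpha', 'alpha, beta', 'alpha, beta, gamma'].freeze

# True when every sample is matched in full by the given regex.
def matches_all?(regex, samples)
  samples.all? { |s| regex.match?(s) }
end

puts matches_all?(STAR_FIRST, SAMPLES)
puts matches_all?(STAR_LAST, SAMPLES)
```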
On the rare occasion that something like
(?:%{WORD:data},)+
does match, it behaves as expected.
That seems mysterious to me. Can anyone shed any light on why patterns using * should fail when a similar pattern using + succeeds?
It's clear to me that I can just capture the list portions and use kv or, probably in my case, Ruby to parse them into the lists that I want. But it seems like the grok parser should handle this, and that would certainly be simpler.
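To make the Ruby fallback concrete, here is a minimal sketch of the splitting step on its own. The helper name split_list and the sample string are hypothetical. Inside a ruby filter this would presumably be wrapped in event.get and event.set calls on whatever field grok captured the raw list into.

```ruby
# Split a captured list such as "a, b,c" into an array of values.
# Accepts both "," and ", " as separators and drops empty entries.
def split_list(raw)
  raw.to_s.split(/\s*,\s*/).map(&:strip).reject(&:empty?)
end

p split_list('alpha, beta,gamma')
```

This handles the comma and comma-space cases with the same expression, and it returns an empty array when the field is missing or blank.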
I am new to the forum and I'm not presently certain how to attach files, so I will look into that, in order to provide a concrete example. (It looks like attachments aren't supported.)