Hey, Elastic pioneer here, testing 5.0alpha1
In the ingest node, I was able to define a grok processor, but some log types need multiple grok rules (e.g. firewall logs). In Logstash, you'd simply specify an array of match definitions, and if one of them matches, the line is parsed. For example:
grok {
  match => [
    "cisco_message", "%{CISCOFW106001}",
    "cisco_message", "%{CISCOFW106006_106007_106010}",
    "cisco_message", "%{CISCOFW106014}"
    ...
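For reference, as far as I know the same thing can be written in Logstash as a hash with an array of patterns for a single field, which is essentially the behavior I'm after:

grok {
  match => {
    "cisco_message" => [
      "%{CISCOFW106001}",
      "%{CISCOFW106006_106007_106010}",
      "%{CISCOFW106014}"
    ]
  }
}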
For the ingest node, if a grok processor fails to match, it throws an exception. So the only way I could find to support multiple rules is to chain multiple grok processors via on_failure. With more than a couple of rules, the pipeline definition gets pretty hairy:
"grok": {
"field": "cisco_message",
"pattern": "%{CISCOFW106001}",
"on_failure": [
{
"grok": {
"field": "cisco_message",
"pattern": "%{CISCOFW106006_106007_106010}",
"on_failure": [
{
"grok": {
"field": "cisco_message",
"pattern": "%{CISCOFW106014}"
...
Would a pattern array make sense, to emulate Logstash's behavior? Should I open an issue on GitHub, or is there already a better way to handle this?
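To illustrate, what I have in mind is something along these lines (purely hypothetical syntax, assuming the grok processor accepted a patterns array and tried each expression in order until one matches):

"grok": {
  "field": "cisco_message",
  "patterns": [
    "%{CISCOFW106001}",
    "%{CISCOFW106006_106007_106010}",
    "%{CISCOFW106014}"
  ]
}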
Also, I'm not sure how Logstash implements this, but its performance degrades quite gracefully with multiple rules: in this particular case, I've seen about 1.5x slower throughput with 23 rules compared to one rule. With the ingest node and the on_failure approach described here, I'm getting about 9x slower throughput with 23 rules. That said, the ingest node is faster in both cases, so maybe Logstash only degrades better proportionally because it's heavier to begin with.
Best regards,
Radu