Multiple Oniguruma named captures with the same name only capture once


(Eirik Rye) #1

Hi!

The Logstash documentation introduces two ways of creating named captures:

  • Oniguruma regexp syntax: (?<a>Hello)
  • Grok patterns from a patterns file: define HELLO Hello, then use %{HELLO:a} in the grok pattern

Sometimes creating a pattern is overkill. For my particular application, I would like to capture an exact string in a named capture.

Consider the following input:

Hello World 15 12

And the following patterns file:

HELLO Hello
WORLD World

This example is somewhat contrived, but this first pattern gives the desired result:

%{HELLO:a} %{WORLD:a} %{POSINT:n} %{POSINT:n}

yields

{
    "a": ["Hello","World"],
    "n": ["15", "12"]
}

However, this other pattern using the regexp syntax yields a completely different (and incorrect!) result:

(?<a>Hello) (?<a>World) %{POSINT:n} %{POSINT:n}

yields

{
    "a": ["Hello"],
    "n": ["World", "15"]
}

I would expect the two forms to be completely equivalent and give exactly the same result. Instead, it appears that the regexp syntax only captures the first definition of the group and shifts all later matches one position to the right: "World" ends up in n, and the final "12" is dropped.
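
For comparison, here is a minimal sketch in plain Ruby (not Logstash or grok) of how the regex engine itself treats a group name that is defined more than once. Ruby's engine derives from Oniguruma, so this is only an illustration of the engine-level behaviour; it is an assumption, not a confirmed fact, that grok's field mapping is affected by the same name-to-group bookkeeping. The variable names are mine.

```ruby
# Plain Ruby, not Logstash: shows how the regex engine treats a
# group name that is defined more than once.
re = /(?<a>Hello) (?<a>World) (?<n>\d+) (?<n>\d+)/

# Regexp#named_captures: each name maps to a list of group numbers,
# so a duplicated name covers several numbered groups.
name_to_groups = re.named_captures

m = re.match("Hello World 15 12")

# MatchData#captures: every group still captures by position.
by_position = m.captures

# MatchData#named_captures: lookup by name yields one value per name.
by_name = m.named_captures

puts name_to_groups.inspect
puts by_position.inspect
puts by_name.inspect
```

Here there are four numbered groups but only two distinct names, so any code that pairs names with captures one-to-one by position would run out of names after two groups. That would be consistent with the shifted output above, though I have not verified that this is what grok does.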

For context, here is the actual pattern I am having trouble with:

(SMTP connection from|(?<close_reason>unexpected) disconnection while reading SMTP command from) %{EXIM_REMOTE_HOST} %{EXIM_INTERFACE} (closed by (?<close_reason>QUIT|EOF|DROP in ACL) | \(TCP/IP connection count = %{POSINT:connection_count}\))?

Patterns:

EXIM_REMOTE_HOST ((H=)?(%{HOSTNAME:remote_hostname} )?(\(%{NOTSPACE:remote_heloname}\) )?\[%{IP:remote_addr}\])(:%{POSINT:remote_port})?
EXIM_INTERFACE I=\[%{IP:interface}\](:%{NUMBER:interface_port})

Basically, I want to capture these specific strings in a single close_reason field to store in the metadata. However, feeding the following example line into the pattern yields an incorrect result because of the two close_reason captures:

SMTP connection from [127.0.0.1]:52168 I=[192.168.1.1]:587 (TCP/IP connection count = 46)

It matches, but connection_count is null instead of "46":

{
  "close_reason": [null],
  "remote_addr": ["127.0.0.1"],
  "remote_port": ["52168"],
  "interface": ["192.168.1.1"],
  "interface_port": ["587"],
  "connection_count": [null]
}

Is this a bug, or am I doing something very wrong?

