Understanding dissect behaviour

I have the following filter.

filter {
       dissect {
               mapping => {
                       "message" => "%{msg} src=%{host->}:%{port} %{abc}"
               }
       }
}

This throws a _dissectfailure when the message is

asd src=192.168:500 dfg

but when I change the message to

asd src=192.168:500:500 dfg

the parsing works fine. The output however is very confusing.

{"message":"asd src=192.168:500:500 dfg","host":"192.168","@timestamp":"2019-08-21T19:20:34.530Z","abc":"dfg","msg":"asd","@version":"1","port":"500:500"}

If the port is the part after the ':' then the previous string also should be matched fine. What am I missing ?

I would argue that both of those are bugs. You ought to be able to use the same dissect to parse

wordFollowedBySingleSpace restOfLine
shortWord                 anotherline

but

    dissect { mapping => { "message" => "%{first->} %{second}" } }

fails to dissect the first. And I believe -> was meant to handle multiple consecutive delimiters, not to skip over words to get to the next instance of the delimiter.

@guyboertje, any thoughts?

@Badger

Apologies, I've been off for 8 days, minor surgery, all ok now.

I don't think this is a bug.

The original problem posted though does have a problem. The repeated delimiters does not work as @amitavmohanty01 expects.
The dissect filter is quite simplistic (hence the speed) and very strictly left to right processing.
It is best to think of the dissection (the value bit of the mapping key value pair) as a stack of operations. I'll try to ascii art what I mean. For this dissection,
'%{year}/%{month}/%{day}T%{hour}:%{minute}:%{second}

<find a '/'>
<find a '/'>
<find a 'T'>
<find a ':'>
<find a ':'>

For this string 1999/12/31T23:59:59...
Dissect starts at index 0 of the byte array (multi-byte UTF-8 characters are found as a group of bytes, i.e. one character might be 2,3 or 4 bytes)
It pops the first operation off of the stack, executes it and moves the index to the position of the byte following the / (5). It updates the operation data structure with the start (0) and offset (4)
It pops the second operation off of the stack and executes it, but from the index point (5). It updates the operation data structure with the start (5) and offset (2).
It pops the third operation off of the stack and executes it, but from the index point (7). It updates the operation data structure with the start (7) and offset (2).
And so on until it reaches the end of the byte array or the stack is empty.
It never goes back.

In the OP case, "%{msg} src=%{host->}:%{port} %{abc}", once the : delimiter is found, that operation is not retried hence the value of 500:500. The %{host->}: operation is looking for one or more : not one or more <data>:

Currently, one will have to use a second dissect on the port field's value but in a conditional and without the -> consecutive directive.

filter {
  dissect {
    mapping => {
      "message" => "%{msg} src=%{host}:%{port} %{abc}"
    }
  }
  if [port] =~ /\d+\:\d+/ {
    dissect {
      mapping => {
        "port" => "%{}:%{port}"
      }
    }
  }
}

BTW, this config on LS 7.3.0 does not fail.

input {
  generator {
    lines => [
      "LinePrinter O isUserPresentAndValid:   false",
      "SystemOut     O isUserPresentAndValid:   true"
      ]
    count => 1
  }
}

filter {
  dissect {
    mapping => {
      message => '%{channel->} %{zero} %{msg}'
    }
  }
}

output {
  stdout {
    codec => rubydebug
  }
}

Gives

{
      "@version" => "1",
      "sequence" => 0,
    "@timestamp" => 2019-08-28T10:52:05.496Z,
       "message" => "SystemOut     O isUserPresentAndValid:   true",
          "host" => "Elastics-MacBook-Pro.local",
           "msg" => "isUserPresentAndValid:   true",
       "channel" => "SystemOut",
          "zero" => "O"
}
[2019-08-28T11:52:05,928][INFO ][logstash.agent           ] Successfully started Logstash API endpoint {:port=>9600}
{
      "@version" => "1",
      "sequence" => 0,
    "@timestamp" => 2019-08-28T10:52:05.480Z,
       "message" => "LinePrinter O isUserPresentAndValid:   false",
          "host" => "Elastics-MacBook-Pro.local",
           "msg" => "isUserPresentAndValid:   false",
       "channel" => "LinePrinter",
          "zero" => "O"
}

HOWEVER...
If I change the space as a delimiter into, say, a hyphen, it fails, hmmm? That said, most padded type logs use a space for better visual alignment. However, there are certainly some cases where a non space delimiter will need be consumed greedily.

I'll create a bug issue in the repo.

I agree that this works. However

input {
    generator {
        lines => [
            "LinePrinter O",
            "SystemOut     O"
        ]
        count => 1
    }
}

filter {
    dissect { mapping => { message => '%{channel->} %{zero}' }
}

gets a _dissectfailure for the first line (in 7.3.1). That seems wrong to me.

Good information, thanks. I'll add that to the issue.

@Badger

I found the bug. It manifests only when using two fields and one delimiter and -> is used on the first field.
There is a boolean flag that tracks when a delimiter is not found. Also. the last field with the previous delimiter is handled last outside of the other fields loop. The other fields loop bails out early if the flag is true. The bug is that the flag is being set to true when the additional delimiter bytes (the second, third etc space or hyphen) were not found. The flag should not be set when trying to move beyond the "padding" and there is no padding.

Created an issue

I am guessing the important thing here is one occurence of a delimiter.

Parsing "a b c d" using "%{f1} %{f2->} %{f3} %{f4}" works

Parsing "a b=c d" using "%{f1} %{f2->}=%{f3} %{f4}" fails (this was the OPs issue)

Will test that last one with my local fixed code. I'm not 100% sure why the space char works but the = does not.

EDIT

Tested. Local fix test passed.