Logstash not able to parse logs with spaces between key-value pairs in a JSON object

I have a log containing a JSON object. The log gets parsed if the JSON object has no spaces; if it has spaces between the key and value, it is not getting parsed.

Configuration file used:

input {
    syslog {
        port => 3011
    }
}

filter {
    grok {
        match => { "message" =>
            [
                "%{SYSLOGTIMESTAMP:timestamp4} %{DATA:time_ms}|%{DATA:field1}|%{DATA:field2}|99|%{DATA:field3}|%{DATA:field4}|%{DATA:field5}|%{DATA:field6}|%{DATA:field7}|%{DATA:field8}|%{DATA:field9}|%{DATA:field10}|%{DATA:field11}|%{DATA:field12}|%{GREEDYDATA:field13}"
            ]
        }
    }
    date {
        match => ["timestamp4", "MMM dd HH:mm:ss"]
    }
    if [field13] {
        mutate {
            add_field => {"log_type" => "my-logs"}
        }
    }
}

output {
    if [log_type] == "my-logs" {
        stdout { codec => rubydebug }
        elasticsearch {
            hosts => ["ES_HOST:9200"]
            index => "my-logs-000001"
        }
    }
}

Logs getting parsed:
echo "Mar 21 13:27:11 11:11.366293|dataadwhw1|ebsmp4713user5_@maiator|99|4064|22|SUCCESS|data|19|UA101|10.1.1.70|https|data.com|{"wrg_id":"200000337"}|200" | nc localhost 3011

echo "Mar 21 13:27:11 11:11.366293|dataadwhw1|ebsmp4713user5_@maiator|99|4064|22|SUCCESS|data|19|UA101|10.1.1.70|https|data.com|{"wrg_id" :"200000337"}|200" | nc localhost 3011

Log not getting parsed:
echo "Mar 21 13:27:11 11:11.366293|dataadwhw1|ebsmp4713user5_@maiator|99|4064|22|SUCCESS|data|19|UA101|10.1.1.70|https|data.com|{"wrg_id": "200000337"}|200" | nc localhost 3011
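A side note on the test commands themselves: the inner double quotes in these echo strings are consumed by the shell before nc ever sees the data, so the JSON key and value are actually sent without quotes; only the colon and space placement differ between the three variants. A quick check of just the JSON fragment (assuming a POSIX shell):

```shell
# The inner double quotes toggle shell quoting on and off and are removed,
# so only the colon/space placement survives in the payload sent to nc.
echo "{"wrg_id": "200000337"}"   # prints {wrg_id: 200000337}
echo "{"wrg_id":"200000337"}"    # prints {wrg_id:200000337}
```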

Any workaround to parse this log?

    input { generator { count => 1 lines => [
 'Mar 21 13:27:11 11:11.366293|dataadwhw1|ebsmp4713user5_@maiator|99|4064|22|SUCCESS|data|19|UA101|10.1.1.70|https|data.com|{"wrg_id":"200000337"}|200',
 'Mar 21 13:27:11 11:11.366293|dataadwhw1|ebsmp4713user5_@maiator|99|4064|22|SUCCESS|data|19|UA101|10.1.1.70|https|data.com|{"wrg_id" :"200000337"}|200',
 'Mar 21 13:27:11 11:11.366293|dataadwhw1|ebsmp4713user5_@maiator|99|4064|22|SUCCESS|data|19|UA101|10.1.1.70|https|data.com|{"wrg_id": "200000337"}|200'
] } }
output { stdout { codec => rubydebug { metadata => false } } }
filter {
    mutate { remove_field => [ "event", "host", "log" ] }

    grok { match => { "message" => [ "%{SYSLOGTIMESTAMP:timestamp4} %{DATA:time_ms}\|%{DATA:field1}\|%{DATA:field2}\|99\|%{DATA:field3}\|%{DATA:field4}\|%{DATA:field5}\|%{DATA:field6}\|%{DATA:field7}\|%{DATA:field8}\|%{DATA:field9}\|%{DATA:field10}\|%{DATA:field11}\|%{DATA:field12}\|%{GREEDYDATA:field13}" ] } }
    date { match => ["timestamp4", "MMM dd HH:mm:ss"] }
    if [field13] { mutate { add_field => {"log_type" => "my-logs"} } }
}

parses all three lines. You need to give us a reproducible example of what fails.

Do your logs always have this format? This can be parsed with the csv filter using | as the separator; there is no need to use grok.


These 3 logs do get parsed when I use the Grok Debugger tool in Elastic. But when I use the pattern in a setup where Logstash is sending logs to Kibana, the 3rd log doesn't reach Kibana for some reason, and no error is being logged either.

I am not using csv because the timestamp4 and time_ms fields are not separated by a delimiter.

This is not an issue; you can combine a grok filter to extract that part with a csv filter.

Your main issue with the csv filter would be the fact that you have both unquoted and quoted strings in the same line. To make this work you need to trick the csv filter into not treating double quotes as quote characters, which can easily be done with the quote_char option.

The following pipeline will parse your messages:

filter {
    grok {
        match => {
            "message" => "%{SYSLOGTIMESTAMP:timestamp4} %{DATA:time_ms}\|%{GREEDYDATA:csv_message}"
        }
        remove_field => ["message"]
    }
    csv {
        source => "csv_message"
        separator => "|"
        columns => ["[field1]","[field2]","[@metadata][not_used]","[field3]","[field4]","[field5]","[field6]","[field7]","[field8]","[field9]","[field10]","[field11]","[field12]","[field13]"]
        quote_char => "'"
        skip_empty_columns => true
        remove_field => ["csv_message"]
    }
    date {
        match => ["timestamp4", "MMM dd HH:mm:ss"]
    }
    if [field13] {
        mutate {
            add_field => {
                "log_type" => "my-logs"
            }
        }
    }
}

The result is something like this:

{
       "field11" => "data.com",
        "field6" => "data",
       "time_ms" => "11:11.366293",
        "field7" => "19",
        "field2" => "ebsmp4713user5_@maiator",
        "field5" => "SUCCESS",
        "field4" => "22",
        "field9" => "10.1.1.70",
       "field13" => "200",
        "field3" => "4064",
      "log_type" => "my-logs",
        "field8" => "UA101",
    "@timestamp" => 2024-03-21T16:27:11.000Z,
    "timestamp4" => "Mar 21 13:27:11",
        "field1" => "dataadwhw1",
       "field10" => "https",
       "field12" => "{\"wrg_id\": \"200000337\"}"
}
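For what it's worth, the quote_char trick can also be illustrated outside Logstash with Python's standard-library csv module, which has an equivalent quotechar parameter. The line below is the csv_message left over after the grok filter strips the timestamp (a rough analogue, not the csv filter itself):

```python
import csv
import io

# What the grok filter hands to the csv filter: everything after
# "Mar 21 13:27:11 11:11.366293|", including the JSON column with a space.
line = ('dataadwhw1|ebsmp4713user5_@maiator|99|4064|22|SUCCESS|data|19|'
        'UA101|10.1.1.70|https|data.com|{"wrg_id": "200000337"}|200')

# quotechar="'" mimics quote_char => "'": the double quotes inside the
# JSON column are treated as ordinary characters, not field quoting.
row = next(csv.reader(io.StringIO(line), delimiter='|', quotechar="'"))
print(row[12])  # {"wrg_id": "200000337"}
```

The JSON column survives intact, spaces and all, because the splitter only cares about the | separator.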

Thanks for the workaround. But field12 still only gets parsed if it is of the form {"wrg_id":"200000337"} or {"wrg_id" :"200000337"}; the whole log remains unparsed if field12 is {"wrg_id": "200000337"}.

I am also not able to figure out much from the logs; I am just getting a connection closed message:
[logstash.inputs.syslog ][main][d7e74be29670dab531986f0a3c5e7079c4452996e98bbec29bc3c11efe6f59c1] connection closed {:client=>"0:0:0:0:0:0:0:1:49428"}

Not sure what you mean by this; the values you mentioned are all the same. Please share an example message where the log is not working and the output you are getting.

The values are not exactly the same. The two logs which get parsed have either no space between the key and value [Log 1 => field12 is {"wrg_id":"200000337"}] or a space between the key and value before the colon [Log 2 => field12 is {"wrg_id" :"200000337"}].
The log which is not getting parsed has a space between the key and value after the colon [Log 3 => field12 is {"wrg_id": "200000337"}].
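Note that all three variants are valid JSON; whitespace on either side of the colon is insignificant, so any downstream JSON parser treats them identically. A quick check:

```python
import json

# All three field12 variants from the thread parse to the same object.
variants = (
    '{"wrg_id":"200000337"}',    # no space
    '{"wrg_id" :"200000337"}',   # space before the colon
    '{"wrg_id": "200000337"}',   # space after the colon
)
for s in variants:
    print(json.loads(s))  # {'wrg_id': '200000337'} each time
```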

This makes no difference if you are using the filter that I shared; the csv filter will parse the csv message, and the value you have between the separators makes no difference to the csv filter.

I just tested it here, putting in a lot of spaces, and got this output:

{
        "field2" => "ebsmp4713user5_@maiator",
        "field8" => "UA101",
        "field9" => "10.1.1.70",
      "log_type" => "my-logs",
       "field11" => "data.com",
       "field13" => "200",
        "field3" => "4064",
        "field5" => "SUCCESS",
        "field6" => "data",
        "field1" => "dataadwhw1",
        "field4" => "22",
    "timestamp4" => "Mar 21 13:27:11",
       "time_ms" => "11:11.366293",
       "field10" => "https",
    "@timestamp" => 2024-03-21T16:27:11.000Z,
        "field7" => "19",
       "field12" => "{\"wrg_id\"                  :                    \"200000337\"}"
}

So it is not clear what your issue is; you need to share the message that is failing and the output you are getting.

Not sure why, but the log below is still not getting parsed in my setup:
echo "Mar 21 13:27:11 11:11.366293|dataadwhw1|ebsmp4713user5_@maiator|99|4064|22|SUCCESS|data|19|UA101|10.1.1.70|https|data.com|{"wrg_id": "200000337"}|200" | nc localhost 3011

@leandrojmp Can you let me know which versions of Elasticsearch and Logstash you are using? And are you using Docker to set up the ELK stack, or a binary installation?