Parse Logstash message field into multiple fields

Hello all,

I'm using Logstash for the first time and have run into an issue parsing the JSON log in the message field into multiple fields, so that I can visualize them in a Kibana dashboard.

Below is the message as it appears in Elasticsearch:

message
{ "requestId":"0ea45178-0134-4728-b7a3-63d5383a460e", "ip": "10.52.63.99", "responseStatus":"200", "xrayTraceId":"""","requestTime":"14/Aug/2021:05:42:49 +0000", "httpMethod":"POST","resourcePath":"/V2/device-services/deviceapi/login", "stage": "V2", "protocol":"HTTP/1.1", "responseLength":"64", "headers": "",  "deviceId": "" }

And here is my logstash.conf:

input {
    cloudwatch_logs {
        log_group => "API-Gateway-AccessLog"
        region => "ap-south-1"
        type => "apiaccesslog"
        start_position => beginning
        codec => "json"
    }
}

filter {
    json {
        source => "message"
    }
}

output {
    stdout {
        codec => rubydebug { metadata => true }
    }

    amazon_es {
        hosts => "https://search-test--xyz.us-east-1.es.amazonaws.com"
        index => "api-accesslog-"
    }
}

Please help me solve the issue. Thank you in advance.

What issue?

You should not need a json filter if you are parsing the messages with a json codec. Use one or the other.

Hello @Badger,

Here is my issue: the output in ES does not have some of the fields I need, such as IP, Method, responseStatus, ...

Here is the JSON output in ES:

{
    "_index": "api-accesslog-2021.08",
    "_type": "_doc",
    "_id": "lQYrRXsBtbGHaeQvvjkp",
    "_version": 1,
    "_score": null,
    "_source": {
      "tags": [
        "_jsonparsefailure"
      ],
      "@version": "1",
      "message": "{ \"requestId\":\"4fe4588f-3ff7-4482-8cd8-e715b5731104\", \"ip\": \"42.106.7.94\", \"responseStatus\":\"200\", \"xrayTraceId\":\"-\",\"requestTime\":\"14/Aug/2021:06:37:49 +0000\", \"httpMethod\":\"POST\",\"resourcePath\":\"/V2/device-webservices//deviceapi/v2/reg\", \"stage\": \"V2\", \"protocol\":\"HTTP/1.1\", \"responseLength\":\"34\", \"headers\": -,  \"deviceId\": - }",
      "@timestamp": "2021-08-14T06:37:49.594Z",
      "cloudwatch_logs": {
        "log_stream": "0f0e473d493ee5aeaf18e0c77b0c4087",
        "log_group": "API-Gateway-AccessLog",
        "event_id": "36326198323024159588089220909144239265922769011572998144",
        "ingestion_time": "2021-08-14T06:38:09.671Z"
      },
      "type": "apiaccesslog"
    },
    "fields": {
      "@timestamp": [
        "2021-08-14T06:37:49.594Z"
      ],
      "cloudwatch_logs.ingestion_time": [
        "2021-08-14T06:38:09.671Z"
      ]
    },
    "sort": [
      1628923069594
    ]
  }

All of the fields I need, such as IP, Method, responseStatus, ..., are contained in the message field. I want to parse them into separate fields.

I tried removing the filter from logstash.conf, but the issue still persists. Please help me parse the message field into multiple fields.

input {
    cloudwatch_logs {
        log_group => "API-Gateway-AccessLog"
        region => "ap-south-1"
        type => "apiaccesslog"
        start_position => beginning
        codec => "json"
    }
}

output {
    stdout {
        codec => rubydebug { metadata => true }
    }

    amazon_es {
        hosts => "https://search-test--xyz.us-east-1.es.amazonaws.com"
        index => "api-accesslog-"
    }
}

Please do not post pictures of text. They are hard to read, impossible to search or copy/paste, and some people will not be able to process them (text to speech will not work).

That said, your messages have _jsonparsefailure tags. The message is not valid JSON, so the json codec or filter is failing to parse it. For example, the first message you posted contains

"xrayTraceId":""""

which is not valid, so a json filter complains

LogStash::Json::ParserError: Unexpected character ('"' (code 34)): was expecting comma to separate Object entries

because after it consumes the first two double quotes (an empty string) it does not expect another double quote.

Your second example contains

"headers": -,

which again is invalid and results in

LogStash::Json::ParserError: Unexpected character (',' (code 44)) in numeric value: expected digit (0-9) to follow minus sign, for valid numeric value

You will have to read the logstash logs to find all the ways in which your logs are not valid JSON. You may be able to fix them using mutate. For example

mutate {
    gsub => [
        "message", '"{4}', '"-"',
        "message", ": -([^\d])", ': "-"\1'
    ]
}

Obviously you will have to remove the codec and use a json filter after the mutate.
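
Putting those two changes together, the pipeline might look like the sketch below. It reuses the input and output settings posted earlier in this thread unchanged (including the index value exactly as it was posted). The one assumption is that, with no codec set, the cloudwatch_logs input leaves the raw text in the [message] field, so the filters see it unparsed.

```
input {
    cloudwatch_logs {
        log_group => "API-Gateway-AccessLog"
        region => "ap-south-1"
        type => "apiaccesslog"
        start_position => beginning
        # No json codec here: the raw text has to reach the filters unparsed.
    }
}

filter {
    # Step 1: repair the invalid JSON inside [message].
    mutate {
        gsub => [
            "message", '"{4}', '"-"',
            "message", ": -([^\d])", ': "-"\1'
        ]
    }

    # Step 2: parse the repaired text into separate top-level fields.
    json {
        source => "message"
    }
}

output {
    stdout {
        codec => rubydebug { metadata => true }
    }

    amazon_es {
        hosts => "https://search-test--xyz.us-east-1.es.amazonaws.com"
        index => "api-accesslog-"
    }
}
```

The order inside the filter block matters: filters run top to bottom, so the json filter only sees the message after mutate has rewritten it. Events that are still invalid for some other reason should keep getting the _jsonparsefailure tag, which makes them easy to find.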

Hello @Badger

I tried your filter and it's working now:

@timestamp                      Aug 14, 2021 @ 22:57:14.053
@version                        1
_id                             1wdhRXsBtbGHaeQv7aXr
_index                          api-accesslog-2021.08
_score                          -
_type                           _doc
cloudwatch_logs.event_id        36326946835472045116049940001020327307370886923731402753
cloudwatch_logs.ingestion_time  Aug 14, 2021 @ 22:57:24.305
cloudwatch_logs.log_group       API-Gateway-AccessLog
cloudwatch_logs.log_stream      aa5c48a2c60c9d7ff158c78aecefdc2e
deviceId                        -
headers                         -
httpMethod                      POST
ip                              42.106.4.198
message                         { "requestId":"ae7d772d-f048-4dc7-9e59-8900303c7d81", "ip": "42.106.4.198", "responseStatus":"200", "xrayTraceId":"-","requestTime":"14/Aug/2021:15:57:14 +0000", "httpMethod":"POST","resourcePath":"/V2/device-webservices/deviceapi/v2/activation", "stage": "V2", "protocol":"HTTP/1.1", "responseLength":"81", "headers": "-",  "deviceId": "-" }
protocol                        HTTP/1.1
requestId                       ae7d772d-f048-4dc7-9e59-8900303c7d81
requestTime                     14/Aug/2021:15:57:14 +0000
resourcePath                    /V2/device-webservices//deviceapi/v2/deviceapi/activation
responseLength                  81
responseStatus                  200
stage                           V1
type                            apiaccesslog
xrayTraceId                     -

To avoid the same issue in the future, could you please explain your filter and what it means? I'm new to ELK, so I don't understand much of it yet.

mutate {
    gsub => [
        "message", '"{4}', '"-"',
        "message", ": -([^\d])", ': "-"\1'
    ]
}

Anyway, thank you so much for your help!

mutate+gsub takes an array of triplets: a field name, a regular expression, and a replacement string. Looking at

"message", '"{4}', '"-"'

that says that in the [message] field, each occurrence of the expression "{4} should be replaced with "-". Note that logstash configurations can surround strings with either single or double quotes. If the pattern contains double quotes it is easier to surround it with single quotes than to try to escape the double quotes with backslashes.

The regular expression "{4} means exactly four occurrences of the double quote character. That will convert "xrayTraceId":"""" into "xrayTraceId":"-"

The other one is slightly more complicated. In ": -([^\d])", the leading ": -" matches the literal characters colon-space-hyphen. \d means a digit, and [^\d] means a character that is not a digit. The parentheses around it capture that non-digit so that it can be referred to later in the replacement string (using \1). So...

"message", ": -([^\d])", ': "-"\1'

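will match "headers": -, and capture the comma as capture group 1. Overall "headers": -, will get modified into "headers": "-",. I use the character group and capture group because that will also match "deviceId": - , where the hyphen is followed by a space rather than a comma, while leaving a genuine negative number such as : -5 untouched.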
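
One way to try both patterns without touching CloudWatch or Elasticsearch is a small throwaway pipeline that feeds the problem lines through the same filters. The sketch below uses the generator input, with two shortened versions of the sample messages from this thread. The file name test.conf is only a placeholder.

```
# test.conf (hypothetical file name): run with  bin/logstash -f test.conf
input {
    generator {
        # Shortened copies of the two invalid messages from this thread.
        lines => [
            '{ "ip": "10.52.63.99", "xrayTraceId":"""","httpMethod":"POST" }',
            '{ "ip": "42.106.7.94", "headers": -,  "deviceId": - }'
        ]
        # Emit each line once instead of looping forever.
        count => 1
    }
}

filter {
    mutate {
        gsub => [
            # Four consecutive double quotes become "-".
            "message", '"{4}', '"-"',
            # A bare hyphen after colon-space gets quoted; \1 restores the
            # non-digit character that followed it (comma or space here).
            "message", ": -([^\d])", ': "-"\1'
        ]
    }

    json {
        source => "message"
    }
}

output {
    stdout {
        codec => rubydebug
    }
}
```

If the patterns behave as described above, both events should come out with separate ip, xrayTraceId, headers and deviceId fields and no _jsonparsefailure tag. Commenting out the mutate block should bring the tag back, which is a quick way to confirm that the gsub is what makes the difference.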

Hello @Badger,

Thanks a lot for your help. One last question on this topic: where can I find documentation to learn more about special characters like these?

"{4}  : four occurrences of the double quote character
\d    : a digit
[^\d] : not a digit

Read the Ruby Regexp documentation, see the section on Repetition for the first item and Character Classes for the other two.

All clear, thank you so much for your help.
