Logstash parsing unicode characters

Hi, we are running into parsing errors when sending data through Logstash pipelines. After digging a little, we found that the source system has been sending a lot of Unicode special characters, something like the below:

\u0006\u0000\u0000\u0000\u0000\u0000\u0000\u0001\u0000\u0000\u0000\u0000\u0000\u0000\u0000\b�������\u0006\u0000\u0000\u0000\u0000\u0000\u0000\u0000\r\u0006\u0000\u0000\u0000\u0000\u0000\u0000\u0001\u0000\u0000\u0000\u0000\u0000\u0000\u0000\a�������\u0006\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0013\u0006\u0000\u0000\u0000\u0000\u0000\u0000\u0001\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0006�������\u0006\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0019\u0006\u0000\u0000\u0000\u0000\u0000\u0000\u0001\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0005�������\u0006\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u001F\u0006\u0000\u0000\u0000\u0000\u0000\u0000\u0001\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0004�������\u0006\u0000\u0000\u0000\u0000\u0000\u0000\u0000%\u0006\u0000\u0000\u0000\u0000\u0000\u0000\u0001\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0003�������\u0006\

Is there a Logstash grok plugin available that can remove those special Unicode characters?

Thanks!

I could really use some guidance with the above issue. Any suggestions please?

Have you tried the mutate gsub filter:

    filter {
      mutate {
        gsub => [
          "fieldname", "[\u0000\u0001]", ""
        ]
      }
    }

?

Thanks Tomo. Yes, I have used the gsub filter quite often in the past. But in this particular case, these Unicode sequences can vary, so I was looking for something generic to use. The escape could be \u0000 or have any other character/number after "\u". It would be highly laborious to go over each Unicode character with the gsub filter (and I'm guessing it wouldn't be effective either).

Are there really so many special characters?
How do you distinguish them from useful characters?

That's where I was looking to get some help. 🙂

I was hoping there is some readily available filter to tackle those unicode characters.

How about:

    filter {
      mutate {
        gsub => [
          "fieldname", "[\u0000-\u001F]", ""
        ]
      }
    }

?

There is no ignore option on the plain codec plugin, so it seems you have to specify the characters by regex. I suppose there are not so many special characters generated by your system.
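If listing code points one by one gets laborious, note that gsub patterns are Ruby regular expressions, which support Unicode character properties. A generic sketch (fieldname is a placeholder for your field; this is an alternative I'm suggesting, not something from the thread above):

    filter {
      mutate {
        gsub => [
          # \p{Cntrl} matches any Unicode control character
          # (U+0000-U+001F and U+007F-U+009F) in one pattern
          "fieldname", "\p{Cntrl}", ""
        ]
      }
    }

This avoids enumerating each \uXXXX escape, at the cost of also removing legitimate control characters such as tabs and newlines if your field contains them.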


No luck yet. Still getting errors parsing the message:

 at [Source: (byte[])"{"id":"79f9856b-ba63-41fd-9665-b8484086cb5c","message":"Claimed ....

Can you share the full error you are receiving?

Is that Unicode string part of your message, or is your entire message this way?

Is every message like this, or do just some random messages give you this error? Can you share an example of a message that is giving you the error?

What is your Logstash pipeline?

Here is a sample test message (sensitive info removed):

{"app":"test-api[v1]","hostname":"test-api-blue04-dc1-zn1","port":10610,"@timestamp":"2022-02-10T10:56:29.205-06:00","logLevel":"INFO","threadName":"Default Executor-thread-2453","loggerName":"com.rest.util.RestLoggingHelper","message":"restType=RESPONSE|requestUri=/test-api/credit-card/aclaims|responseCode=200|responseStatus=OK|requestBody=[{\"channel\":\"BBB\",\"criteria\":[{\"token\":\"XXYXYXYXYXY\",\"referenceDate\":\"2022-01-06\",\"amount\":1031.22,\"authorizationProperties\":{\"customerNumber\":\"1515151515\",\"orderNumber\":\"151515151\"}}]}                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  
\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000]|responseBody=[{\"id\":\"79f9856b-ba63-41fd-9665-b8484086cb5c\",\"message\":\"Claimed 1 authorization out of 1 unclaimed matches, 0 matches were already claimed.\",\"claimedAuthorization\":{\"id\":\"79f9856b-2233-41fd-9665-b8484086cb5c\",\"token\":\"222233334444\",\"expirationMMYY\":\"0000\",\"referenceDate\":\"2022-01-06\",\"amount\":1031.22,\"currencyCode\":\"USD\",\"nameOnCard\":\"test user\",\"telephoneNumber\":\"21212121212\"}

Note that there are a bunch of spaces right in front of \u0000\u0000, which get trimmed here once I paste the message.

Logstash config (filter section):

    filter {
      json {
        source => "message"
      }

      if "beats_input_codec_plain_applied" in [tags] {
        mutate {
          remove_tag => "beats_input_codec_plain_applied"
        }
      }

      mutate {
        gsub => [
          "message", "[\\]u0000", ""
        ]
      }

      # ... <rest of the log processing here>
    }

Error message:

[2022-02-15T13:18:48,537][WARN ][logstash.filters.json    ][pipeline] Error parsing json {:source=>"message", :raw=>"{\"id\":\"79f9856bxyxyxyx-b8484086cb5c\",\"message\":\"Claimed 1 authorization out of 1 unclaimed matches, 0 matches were already claimed.\",\"claimedAuthorization\":{\"id\":\"79f9856bxyxyxyx41fd-9665-b8484086cb5c\",\"token\":\"xyxyxyx\",\"expirationMMYY\":\"0000\",\"referenceDate\":\"2022-01-06\",\"amount\":1031.22,\"currencyCode\":\"USD\",\"nameOnCard\":\"test user\",\"telephoneNumber\":\"12121212121\",\"address1\":\"PO BOX 121212121\",\"city\":\"test\",\"stateOrProvinceCode\":\"CA\",\"postalCode\":\"123121\",\"countryCode\":\"US\",\"channel\":\"WEB\",\"authorizationProperties\":{\"orderNumber\":\"12112121\",\"webCartId\":\"\",\"webSessionId\":\"cdfaf57a-121212121a2bf-2d576c4e9980\",\"webIP\":\"121.121.12.12\",\"invoiceNumber\":\"121212121\",\"customerNumber\":\"1212121\"},\"verified\":true,\"authorized\":true,\"authorizationResponseCode\":\"100\",\"authorizationResponseDescription\":\"APPROVED\",\"authorizationCode\":\"121212\",\"addressVerificationResponseCode\":\"I4\",\"level3\":\"N\",\"commercialCreditCard\":\"Y\",\"overrideModeOfPayment\":\"MC\",\"transactionId\":null,\"cvsResponse\":null,\"cvvMatchResponseCode\":null,\"createdTimestamp\":\"2022-02-10T10:55:53.115-06:00[America/Chicago\\\",\"claimed\":true,\"claimedTimestamp\":\"2022-02-10T10:56:29.198-06:00[America/Chicago\\\",\"claimedRequestId\":\"12121212121-1212121-4c5f-94d\",\"merchantOrderNumber\":\"1212121\"}}]", :exception=>#<LogStash::Json::ParserError: Unexpected character ('c' (code 99)): was expecting comma to separate Object entries
 at [Source: (byte[])"{"id":"79f9856b-ba63-41fd-9665-b8484086cb5c","message":"Claimed 1 authorization out of 1 unclaimed matches, 0 matches were already claimed.","claimedAuthorization":{"id":"79f9856b-ba63-12121-9665-121212121","token":"1212121211","expirationMMYY":"0000","referenceDate":"2022-01-06","amount":1031.22,"currencyCode":"USD","nameOnCard":"test user","telephoneNumber":"121212121","address1":"PO BOX 121212121","city":"test","stateOrProvinceCode":"CA","postalCode":"12121","countryCode":"US","[truncated 780 bytes]; line: 1, column: 1107]>}

Try the gsub before the json filter.
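Roughly like this, reusing the gsub pattern from your config, just with the cleanup moved ahead of the parse (a sketch; the rest of your filter section would follow as before):

    filter {
      # strip the literal \u0000 escape sequences before parsing
      mutate {
        gsub => [
          "message", "[\\]u0000", ""
        ]
      }

      json {
        source => "message"
      }
    }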

"createdTimestamp":"2022-02-10T10:55:53.115-06:00[America/Chicago\","claimed":true,

You have a backslash at the end of createdTimestamp. That escapes the quote, so the value of createdTimestamp is 2022-02-10T10:55:53.115-06:00[America/Chicago\", and the parser blows up when it tries to interpret claimed. That is what causes

Unexpected character ('c' (code 99)): was expecting comma to separate Object entries
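If the source really does truncate the closing ] of the zone suffix into a backslash, one possible workaround is to repair it with gsub before the json filter runs. This is just a sketch: it assumes the damage always looks exactly like America/Chicago\ and that the field is called message.

    filter {
      mutate {
        gsub => [
          # turn the stray trailing backslash back into the missing ]
          # (single quotes avoid having to escape double quotes in the pattern)
          "message", 'America/Chicago\\', 'America/Chicago]'
        ]
      }
    }

Fixing the source system so it emits valid JSON would obviously be the better long-term answer.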


I tried removing the backslash with gsub before the json filter - still no luck!

When I tested with this message, which has the backslash at the end of createdTimestamp, I see it working. But when I used the original values, it blew up.

{"app":"test-api[v1]","hostname":"test-api-blue04","port":10610,"@timestamp":"2022-02-16T13:56:29.205-06:00","logLevel":"INFO","threadName":"Default Executor-thread-2453","loggerName":"com.ha.rest.RestLoggingHelper","message":"restType=RESPONSE|requestUri=/test-api/cc/auth|responseCode=200|responseStatus=OK|requestBody=[{\"channel\":\"CCC\",\"criteria\":[{\"token\":\"XXXXXXXXXX\",\"referenceDate\":\"2022-01-06\",\"amount\":0000.22,\"authorizationProperties\":{\"customerNumber\":\"1111111\",\"orderNumber\":\"211212121\"}}]}        \u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000]|responseBody=[{\"id\":\"121212121-12121-12121-b8484086cb5c\",\"message\":\"Claimed 1 authorization out of 1 unclaimed matches, 0 matches were already claimed.\",\"claimedAuthorization\":{\"id\":\"121212121-12121-12121-b8484086cb5c\",\"token\":\"121212121121212\",\"expirationMMYY\":\"0000\",\"referenceDate\":\"2022-01-06\",\"amount\":0000.22,\"currencyCode\":\"USD\",\"nameOnCard\":\"test t. 
user\",\"telephoneNumber\":\"121212121212\",\"address1\":\"PO BOX 1110000\",\"city\":\"test\",\"stateOrProvinceCode\":\"CA\",\"postalCode\":\"00000\",\"countryCode\":\"US\",\"channel\":\"CCC\",\"authorizationProperties\":{\"orderNumber\":\"1212121212\",\"webCartId\":\"\",\"webSessionId\":\"cdfaf57a-853b-431e-a2bf-12121212121\",\"webIP\":\"000.000.10.00\",\"invoiceNumber\":\"12121212\",\"customerNumber\":\"12121212\"},\"verified\":true,\"authorized\":true,\"authorizationResponseCode\":\"100\",\"authorizationResponseDescription\":\"APPROVED\",\"authorizationCode\":\"1212121\",\"addressVerificationResponseCode\":\"I4\",\"level3\":\"N\",\"commercialCreditCard\":\"Y\",\"overrideModeOfPayment\":\"MC\",\"transactionId\":null,\"cvsResponse\":null,\"cvvMatchResponseCode\":null,\"createdTimestamp\":\"2022-02-13T10:55:53.115-06:00[America/Chicago]\",\"claimed\":true,\"claimedTimestamp\":\"2022-02-13T10:56:29.198-06:00[America/Chicago]\",\"claimedRequestId\":\"121212121-d8ae-4c5f-94d9-986393bd1406\",\"merchantOrderNumber\":\"12121212121\"}}]","originatorApp":"test-api","correlation-id":"121212121212121","originatorContext":"/credit-card/authorization-claims","userName":"test","request-id":"9Q8V1n5WRmmt-Xpj4EmlXw","userId":""}

This tells me the parser is skipping the Unicode characters, so it must be failing on some other field. Still scratching my head over which field it is - I just wish Logstash showed exactly where it is erroring.

When I send the original message, it appears to error right here, but I can't figure out why.

exception=>#<LogStash::Json::ParserError: Unexpected character ('c' (code 99)): was expecting comma to separate Object entries
 at [Source: (byte[])"{"id":"79f9856b-ba63-41fd-9665-b8484086cb5c","message":"Claimed 1 authorization out of 1 unclaimed matches, 0 matches were already claimed.","claimedAuthorization":{"id":"79f9856b-ba63-41fd-9665-b8484086cb5c","token":"111111QXZEFC1111","expirationMMYY":"0000","referenceDate":"2022-00-00","amount":1031.22,"currencyCode":"USD","nameOnCard":"TEST T TEST","telephoneNumber":"0000000000","address1":"PO BOX 0000","city":"TEST","stateOrProvinceCode":"CA","postalCode":"00000","countryCode":"US","[truncated 780 bytes]; line: 1, column: 1107]>}

Hoping someone gets a chance to look at my last update and has any other suggestions!

You have not shown us the message that it is failing on. I cannot speculate on what the error might be. Ideally, show the field you are trying to parse from the output of

    output { stdout { codec => rubydebug } }

Sorry, I had to scrub all the confidential information before posting here. But this is what I see in the Logstash log once I enabled the stdout output section:

[2022-02-23T14:05:48,918][INFO ][logstash.javapipeline    ][filebeat-logstash1] Pipeline started {"pipeline.id"=>"filebeat-logstash1"}
[2022-02-23T14:05:49,056][INFO ][logstash.agent           ] Pipelines running {:count=>1, :running_pipelines=>[:"filebeat-logstash1"], :non_running_pipelines=>[]}
[2022-02-23T14:05:49,059][INFO ][filewatch.observingtail  ][filebeat-logstash1] START, creating Discoverer, Watch with file and sincedb collections
[2022-02-23T14:05:49,997][WARN ][logstash.filters.json    ][filebeat-logstash1] Error parsing json {:source=>"message", :raw=>"{\"id\":\"1f9f9f9f9-ba63-41fd-9665-b8484086cb5c\",\"message\":\"Claimed 1 authorization out of 1 unclaimed matches, 0 matches were already claimed.\",\"claimedAuthorization\":{\"id\":\"1f9f9f9f9-ba63-41fd-9665-b8484086cb5c\",\"token\":\"5111114QXZEFC1222\",\"expirationMMYY\":\"1111\",\"referenceDate\":\"2022-01-06\",\"amount\":1031.22,\"currencyCode\":\"USD\",\"nameOnCard\":\"User T Test\",\"telephoneNumber\":\"1111111111\",\"address1\":\"PO BOX 1111\",\"city\":\"CITY\",\"stateOrProvinceCode\":\"CA\",\"postalCode\":\"11111\",\"countryCode\":\"US\",\"channel\":\"WEB\",\"authorizationProperties\":{\"orderNumber\":\"64556572\",\"webCartId\":\"\",\"webSessionId\":\"cdfaf57a-853b-431e-a2bf-2d576c4e9980\",\"webIP\":\"111.111.11.11\",\"invoiceNumber\":\"111111111\",\"customerNumber\":\"1111111\"},\"verified\":true,\"authorized\":true,\"authorizationResponseCode\":\"100\",\"authorizationResponseDescription\":\"APPROVED\",\"authorizationCode\":\"038394\",\"addressVerificationResponseCode\":\"I4\",\"level3\":\"N\",\"commercialCreditCard\":\"Y\",\"overrideModeOfPayment\":\"MC\",\"transactionId\":null,\"cvsResponse\":null,\"cvvMatchResponseCode\":null,\"createdTimestamp\":\"2022-02-10T10:55:53.115-06:00[America/Chicago\\\",\"claimed\":true,\"claimedTimestamp\":\"2022-02-10T10:56:29.198-06:00[America/Chicago\\\",\"claimedRequestId\":\"33a98688-d8ae-4c5f-94d9-986393bd1406\",\"merchantOrderNumber\":\"64556572\"}}]", :exception=>#<LogStash::Json::ParserError: Unexpected character ('c' (code 99)): was expecting comma to separate Object entries
 at [Source: (byte[])"{"id":"1f9f9f9f9-ba63-41fd-9665-b8484086cb5c","message":"Claimed 1 authorization out of 1 unclaimed matches, 0 matches were already claimed.","claimedAuthorization":{"id":"1f9f9f9f9-ba63-41fd-9665-b8484086cb5c","token":"5111114QXZEFC1222","expirationMMYY":"1111","referenceDate":"2022-01-06","amount":1031.22,"currencyCode":"USD","nameOnCard":"User T Test","telephoneNumber":"1111111111","address1":"PO BOX 1111","city":"CITY","stateOrProvinceCode":"CA","postalCode":"11111","countryCode":"US","[truncated 780 bytes]; line: 1, column: 1107]>}
{
             "restType" => "RESPONSE",
             "userName" => "APIUSER",
         "responseCode" => 200,
           "loggerName" => "com.comp.ha.rest.util.RestLoggingHelper",
                 "tags" => [
        [0] "file-based",
        [1] "_jsonparsefailure"
    ],
             "@version" => "1",
          "logFilePath" => "%{[log][file][path]}",
             "logLevel" => "info",
                 "port" => 10610,
       "correlation-id" => "A4844003AB991A7AA9AB0004AC1C0C7C",
                 "zone" => "backoffice",
           "@timestamp" => 2022-02-23T19:56:29.205Z,
              "message" => "RESPONSE BODY : %{parsedccclaimrespjson}",
             "hostname" => "%{[host][name]}",
           "request-id" => "9Q8V1n5WRmmt-Xpj4EmlXw",
        "originatorApp" => "test-api",
               "userId" => "",
                  "jvm" => "test-api-blue04-dc1-zn1",
           "threadName" => "Default Executor-thread-2453",
                 "path" => "/pathfilebeat/config/test.log",
                "appApp" => "test-api[v1]",
       "responseStatus" => "OK",
                 "type" => "log",
    "originatorContext" => "/credit-card/authorization-claims"
}
[2022-02-23T14:05:51,576][DEBUG][logstash.outputs.elasticsearch][filebeat-logstash1] Sending final bulk request for batch. {:action_count=>1, :payload_size=>835, :content_length=>835, :batch_offset=>0}

When I sent the working message with all test data in it, here is the output:

[2022-02-23T14:18:34,182][WARN ][logstash.filters.json    ][logstash-filebeat1] Parsed JSON object/hash requires a target configuration option {:source=>"message", :raw=>""}
{
       "@version" => "1",
       "hostname" => "%{[host][name]}",
           "type" => "log",
           "tags" => [
        [0] "_jsonparsefailure",
        [1] "file-based"
    ],
        "message" => "",
           "zone" => "backoffice",
    "logFilePath" => "%{[log][file][path]}",
           "path" => "path/logstashpipline/config/test2.log",
     "@timestamp" => 2022-02-23T20:18:34.038Z
}
{
        "originatorApp" => "test-api",
    "originatorContext" => "/credit-card/authorization-claims",
             "userName" => "test",
           "loggerName" => "com.ha.rest.RestLoggingHelper",
             "logLevel" => "info",
             "@version" => "1",
               "userId" => "",
             "hostname" => "%{[host][name]}",
                 "type" => "log",
              "message" => "restType=RESPONSE|requestUri=/test-api/cc/auth|responseCode=200|responseStatus=OK|requestBody=[{\"channel\":\"CCC\",\"criteria\":[{\"token\":\"XXXXXXXXXX\",\"referenceDate\":\"2022-01-06\",\"amount\":0000.22,\"authorizationProperties\":{\"customerNumber\":\"1111111\",\"orderNumber\":\"211212121\"}}]}        \u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000]|responseBody=[{\"id\":\"121212121-12121-12121-b8484086cb5c\",\"message\":\"Claimed 1 authorization out of 1 unclaimed matches, 0 matches were already claimed.\",\"claimedAuthorization\":{\"id\":\"121212121-12121-12121-b8484086cb5c\",\"token\":\"121212121121212\",\"expirationMMYY\":\"0000\",\"referenceDate\":\"2022-01-06\",\"amount\":0000.22,\"currencyCode\":\"USD\",\"nameOnCard\":\"test t. user\",\"telephoneNumber\":\"121212121212\",\"address1\":\"PO BOX 1110000\",\"city\":\"test\",\"stateOrProvinceCode\":\"CA\",\"postalCode\":\"00000\",\"countryCode\":\"US\",\"channel\":\"CCC\",\"authorizationProperties\":{\"orderNumber\":\"1212121212\",\"webCartId\":\"\",\"webSessionId\":\"cdfaf57a-853b-431e-a2bf-12121212121\",\"webIP\":\"000.000.10.00\",\"invoiceNumber\":\"12121212\",\"customerNumber\":\"12121212\"},\"verified\":true,\"authorized\":true,\"authorizationResponseCode\":\"100\",\"authorizationResponseDescription\":\"APPROVED\",\"authorizationCode\":\"1212121\",\"addressVerificationResponseCode\":\"I4\",\"level3\":\"N\",\"commercialCreditCard\":\"Y\",\"overrideModeOfPayment\":\"MC\",\"transactionId\":null,\"cvsResponse\":null,\"cvvMatchResponseCode\":null,\"createdTimestamp\":\"2022-02-13T10:55:53.115-06:00[America/Chicago]\",\"claimed\":true,\"claimedTimestamp\":\"2022-02-13T10:56:29.198-06:00[America/Chicago]\",\"claimedRequestId\":\"121212121-d8ae-4c5f-94d9-986393bd1406\",\"merchantOrderNumber\":\"12121212121\"}}]",
          "logFilePath" => "%{[log][file][path]}",
           "threadName" => "Default Executor-thread-2453",
                 "port" => 10610,
                  "jvm" => "test-api-blue04",
                  "app" => "test-api[v1]",
                 "tags" => [
        [0] "file-based"
    ],
                 "zone" => "backoffice",
           "request-id" => "9Q8V1n5WRmmt-Xpj4EmlXw",
                 "path" => "path/logstashpipline/config/test2.log",
           "@timestamp" => 2022-02-23T19:56:29.205Z,
       "correlation-id" => "121212121212121"
}

The event contents after the message has been parsed do not help me, and the error message truncates the JSON. Can you post the log entry that Logstash is ingesting?

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.