How to handle special characters (hex encoded) in logstash mutate or grok

� This is the special character. I am on Logstash 7.17.10; it seems that newer versions of Logstash are able to parse it.

I am using mutate to remove the special characters from the message field:

filter {
  # mutate { gsub => [ "message", "(\\u[0-9]{4})|(u[0-9]{4})|(\\b)", "" ] }
  mutate { gsub => [ "message", "\\u[0-9]{4}", "" ] }
  mutate { gsub => [ "message", "u[0-9]{4}", "" ] }
  mutate { gsub => [ "message", "\\b", "" ] }
  mutate { gsub => [ "message", '\"', "" ] }
  mutate { gsub => [ "message", "<", "" ] }
  mutate { gsub => [ "message", "&", "" ] }
  mutate { gsub => [ "message", "�", "" ] }
}
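mutate/gsub uses Ruby regular expressions under the hood, so the chain of substitutions above can be checked in plain Ruby. A minimal sketch (the sample string here is an illustrative assumption, not the actual event):

```ruby
# Strip literal \uNNNN escape sequences, literal \b escapes, and the
# Unicode replacement character, mirroring the mutate/gsub chain above.
msg = "\\u0000S1T0_job\\bdone\u{FFFD}end"

cleaned = msg
  .gsub(/\\u[0-9]{4}/, "")  # literal "\u0000"-style escape sequences
  .gsub(/\\b/, "")          # literal "\b" escapes
  .gsub(/\u{FFFD}/, "")     # the replacement character itself

puts cleaned  # => S1T0_jobdoneend
```

Note that the second pattern in the original filter, `u[0-9]{4}`, also matches plain text such as `u1234` inside legitimate field values, so it can eat more than intended.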

What is your question?


This is my original message

\u0000\u0000\u0000\u0017�\bS1T00S1T0_Snowflake_Ingestion\\S1T0_Snowflake_Ingestion_job_TACC_TYPE_BALANCEbS1T0_Snowflake_Ingestion_65a7c7b767404898408eb87b\u00004IDP successfully ran a job22024-01-17T12:27:35+00:00\u0002�\u0002/dev/02570/app/DQO0/data/jobprofile/tsz/S1T0_Snowflake_Ingestion/64ed110644ae0370f6a3778b/S1T0_Snowflake_Ingestion_job_TACC_TYPE_BALANCE/Final\u0000\u0000\u0002\u0001\u0000�\u0001https://api.idp-dev.devfg.rbc.com/jobs/history/by-id?id=65a7c7b767404898408eb87b

Here is my grok pattern, applied after the mutates above. I am getting a grok parse failure on Logstash 7.17.10:

S1T00%{GREEDYDATA:workspace_name}\\%{GREEDYDATA:job_name}IDP%{GREEDYDATA:job_status}2%{TIMESTAMP_ISO8601:Date_Time}/%{GREEDYDATA:job_path}https://%{GREEDYDATA:job_url}

Your timestamp is not followed by /, so the pattern does not match. Perhaps change that part to

%{TIMESTAMP_ISO8601:Date_Time}%{DATA}/%{GREEDYDATA:job_path}
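Substituting that fix into the original expression, the full pattern would read as follows (a sketch only; field names are kept as in the original):

```
S1T00%{GREEDYDATA:workspace_name}\\%{GREEDYDATA:job_name}IDP%{GREEDYDATA:job_status}2%{TIMESTAMP_ISO8601:Date_Time}%{DATA}/%{GREEDYDATA:job_path}https://%{GREEDYDATA:job_url}
```

The `%{DATA}` pattern consumes the bytes between the timestamp and the first `/` of the path non-greedily, which is what the original pattern was missing.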


Thank you for your reply.

I finally found the reason for the special characters in the message: the producer uses an Avro serializer, and since we are not using a matching deserializer in the Logstash config, the Avro framing bytes show up as special characters.

Here is my config for Avro; it didn't work:

kafka {
  bootstrap_servers => "***********"
  topics => ["***********"]
  group_id => "***********"
  sasl_jaas_config => "com.sun.security.auth.module.Krb5LoginModule required useKeyTab=true doNotPrompt=true storeKey=true refreshKrb5Config=true keyTab='***********' principal='***********' debug=true client=true;"
  kerberos_config => "krb5.conf"
  request_timeout_ms => 60000
  sasl_kerberos_service_name => "kafka"
  sasl_mechanism => "GSSAPI"
  security_protocol => "SASL_SSL"
  # schema_registry_validation => "auto"
  schema_registry_url => "***********"
  # value_deserializer_class => "io.confluent.kafka.serializers.KafkaAvroDeserializer"
  # key_deserializer_class => "org.apache.kafka.common.serialization.ByteArrayDeserializer"
  # codec => avro {
  #   schema_uri => "***********"
  #   schema_uri => "https://***********"
  #   schema_uri => "/Users/Downloads/schema.avsc"
  #   target => "[document]"
  # }
  ssl_truststore_location => "kafka-security.jks"
  ssl_truststore_password => "***********"
  ssl_truststore_type => "JKS"
}
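Once the avro codec plugin is actually installed, the commented-out section would be enabled along these lines (a sketch only; the schema path is a placeholder, and `codec` takes exactly one `schema_uri`):

```
kafka {
  # ... connection and security settings as above ...
  codec => avro {
    schema_uri => "/path/to/schema.avsc"
    target => "[document]"
  }
}
```

With the codec in place, decoded fields land under `[document]` instead of the raw Avro bytes appearing in `message`.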

In the organisation, the Kafka data is sent with an Avro schema, and I am loading it into Logstash using the Kafka input without the Avro codec, so I am getting the data in the following form:

\u0001\u000E\u0010qhost1����\t\fREMOVE\u0002�����[\u0000"

I am assuming that once I add the Avro codec, it will solve the problem. However, on the organisation's network I am unable to install the avro codec because we cannot connect to the internet. Is there a way to package the avro codec plugin so that I can send it via FTP to the network machine?

Regards,

Yes, you can create an offline plugin package.
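On a machine that has internet access and the same Logstash version installed, you can build the pack and then install it on the air-gapped machine roughly like this (paths are placeholders):

```shell
# On the internet-connected machine: build an offline pack for the codec
bin/logstash-plugin prepare-offline-pack --output /tmp/logstash-offline-plugins.zip logstash-codec-avro

# Transfer the zip (e.g. via FTP), then on the air-gapped machine:
bin/logstash-plugin install file:///path/to/logstash-offline-plugins.zip
```

The Logstash version used to build the pack should match the target installation, since the pack bundles version-specific gem dependencies.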