How to handle special characters (hex encoded) in logstash mutate or grok

� This is the special character. I am on Logstash 7.17.10; it seems that newer versions of Logstash are able to parse it.

I am using mutate to remove the special characters from the message field:

filter {
  # mutate { gsub => [ "message", "(\\u[0-9]{4})|(u[0-9]{4})|(\\b)", "" ] }
  mutate { gsub => [ "message", "\\u[0-9]{4}", "" ] }
  mutate { gsub => [ "message", "u[0-9]{4}", "" ] }
  mutate { gsub => [ "message", "\\b", "" ] }
  mutate { gsub => [ "message", '\"', "" ] }
  mutate { gsub => [ "message", "<", "" ] }
  mutate { gsub => [ "message", "&", "" ] }
  mutate { gsub => [ "message", "�", "" ] }
}
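mutate/gsub uses Ruby regular expressions under the hood, so the chain of substitutions above can be checked in plain Ruby. A minimal sketch (the sample string here is an illustrative assumption, not the actual event):

```ruby
# Strip literal \uNNNN escape sequences, literal \b escapes, and the
# Unicode replacement character, mirroring the mutate/gsub chain above.
msg = "\\u0000S1T0_job\\bdone\u{FFFD}end"

cleaned = msg
  .gsub(/\\u[0-9]{4}/, "")  # literal "\u0000"-style escape sequences
  .gsub(/\\b/, "")          # literal "\b" escapes
  .gsub(/\u{FFFD}/, "")     # the replacement character itself

puts cleaned  # => S1T0_jobdoneend
```

Note that the second pattern in the original filter, `u[0-9]{4}`, also matches plain text such as `u1234` inside legitimate field values, so it can eat more than intended.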

What is your question?


This is my original message

\u0000\u0000\u0000\u0017�\bS1T00S1T0_Snowflake_Ingestion\\S1T0_Snowflake_Ingestion_job_TACC_TYPE_BALANCEbS1T0_Snowflake_Ingestion_65a7c7b767404898408eb87b\u00004IDP successfully ran a job22024-01-17T12:27:35+00:00\u0002�\u0002/dev/02570/app/DQO0/data/jobprofile/tsz/S1T0_Snowflake_Ingestion/64ed110644ae0370f6a3778b/S1T0_Snowflake_Ingestion_job_TACC_TYPE_BALANCE/Final\u0000\u0000\u0002\u0001\u0000�\u0001https://api.idp-dev.devfg.rbc.com/jobs/history/by-id?id=65a7c7b767404898408eb87b

Here is my grok pattern, applied after the mutates above. I am getting a grok parse failure on Logstash 7.17.10:

S1T00%{GREEDYDATA:workspace_name}\\%{GREEDYDATA:job_name}IDP%{GREEDYDATA:job_status}2%{TIMESTAMP_ISO8601:Date_Time}/%{GREEDYDATA:job_path}https://%{GREEDYDATA:job_url}

Your timestamp is not followed by /, so the pattern does not match. Perhaps change that part to

%{TIMESTAMP_ISO8601:Date_Time}%{DATA}/%{GREEDYDATA:job_path}
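Substituting that fix into the original expression, the full pattern would read as follows (a sketch only; field names are kept as in the original):

```
S1T00%{GREEDYDATA:workspace_name}\\%{GREEDYDATA:job_name}IDP%{GREEDYDATA:job_status}2%{TIMESTAMP_ISO8601:Date_Time}%{DATA}/%{GREEDYDATA:job_path}https://%{GREEDYDATA:job_url}
```

The `%{DATA}` pattern consumes the bytes between the timestamp and the first `/` of the path non-greedily, which is what the original pattern was missing.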


Thank you for your reply.

I finally found the reason for the special characters in the message: the producer uses an Avro serializer, and since we are not using a matching deserializer in the Logstash config, the Avro framing bytes show up as special characters.

Here is my config for Avro; it didn't work:

kafka {
  bootstrap_servers => "***********"
  topics => ["***********"]
  group_id => "***********"
  sasl_jaas_config => "com.sun.security.auth.module.Krb5LoginModule required useKeyTab=true doNotPrompt=true storeKey=true refreshKrb5Config=true keyTab='***********' principal='***********' debug=true client=true;"
  kerberos_config => "krb5.conf"
  request_timeout_ms => 60000
  sasl_kerberos_service_name => "kafka"
  sasl_mechanism => "GSSAPI"
  security_protocol => "SASL_SSL"
  # schema_registry_validation => "auto"
  schema_registry_url => "***********"
  # value_deserializer_class => "io.confluent.kafka.serializers.KafkaAvroDeserializer"
  # key_deserializer_class => "org.apache.kafka.common.serialization.ByteArrayDeserializer"
  # codec => avro {
  #   schema_uri => "***********"
  #   schema_uri => "https://***********"
  #   schema_uri => "/Users/Downloads/schema.avsc"
  #   target => "[document]"
  # }
  ssl_truststore_location => "kafka-security.jks"
  ssl_truststore_password => "***********"
  ssl_truststore_type => "JKS"
}
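Once the avro codec plugin is actually installed, the commented-out section would be enabled along these lines (a sketch only; the schema path is a placeholder, and `codec` takes exactly one `schema_uri`):

```
kafka {
  # ... connection and security settings as above ...
  codec => avro {
    schema_uri => "/path/to/schema.avsc"
    target => "[document]"
  }
}
```

With the codec in place, decoded fields land under `[document]` instead of the raw Avro bytes appearing in `message`.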

In the organisation, the Kafka data is sent with an Avro schema, and I am loading it into Logstash using the Kafka input without the Avro codec, so I am getting the data in the following form:

\u0001\u000E\u0010qhost1����\t\fREMOVE\u0002�����[\u0000"

I am assuming that once I add the Avro codec, it will solve the problem. However, on the organisation's network I am unable to install the avro codec because we cannot connect to the internet. Is there a way to package the avro codec plugin so that I can send it via FTP to the network machine?

Regards,

Yes, you can create an offline plugin package.
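On a machine that has internet access and the same Logstash version installed, you can build the pack and then install it on the air-gapped machine roughly like this (paths are placeholders):

```shell
# On the internet-connected machine: build an offline pack for the codec
bin/logstash-plugin prepare-offline-pack --output /tmp/logstash-offline-plugins.zip logstash-codec-avro

# Transfer the zip (e.g. via FTP), then on the air-gapped machine:
bin/logstash-plugin install file:///path/to/logstash-offline-plugins.zip
```

The Logstash version used to build the pack should match the target installation, since the pack bundles version-specific gem dependencies.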