Received an event that has a different character encoding than you configured - Logstash CPU

Hi Guys,
I received an unknown event from the syslog server.


Received an event that has a different character encoding than you configured. {:text=>"\\xC0\\u0014\\u00009\\u00008\\u0000\\x88\\u0000\\x87\\xC0\\u0019\\u0000:\\u0000\\x89\\xC0\\t\\xC0\\u0013\\u00003\\u00002\\u0000\\x9A\\u0000\\x99\\u0000E\\u0000D\\xC0\\u0018\\u00004\\u0000\\x9B\\u0000F\\xC0\\a\\xC0\\u0011\\xC0\\u0016\\u0000\\u0018\\xC0\\b\\xC0\\u0012\\u0000\\u0016\\u0000\\u0013\\xC0\\u0017\\u0000\\e\\u00005\\u0000\\x84\\u0000/\\u0000\\x96\\u0000A\\u0000\\a\\u0000\\u0005\\u0000\\u0004\\u0000\\n", :expected_charset=>"UTF-8"}

At the same time, the Logstash CPU usage spiked to 99%. After getting an alert I restarted the Logstash service and the CPU went back to normal.


How can I handle these different encodings?
Below is my input config:

input {
 syslog {
  id => "idsyslog"
  host => "0.0.0.0"
  port => 10514
  type => "syslog"
 }
}

filter {
 if "syslog" in [type] {
  if [message] =~ "Palo Alto"  {
   mutate {
    id => "PAid01"
    add_tag => ["PA"]
   }
  } else if [log][syslog][facility][name] == "security/authorization" {
   mutate {
    id => "Linuxid01"
    add_tag => ["Linux"]
   }
  } else if [process][name] == "SymantecServer" {
   mutate {
    id => "SEPMid01"
    add_tag => ["SEPM"]
   }
  } else if [message] =~ "VPXEXT" {
   mutate {
    id => "CITRIXext01"
    add_tag => ["CITRIX"]
   }
  } else if [message] =~ /VPXINT\d+/ {
   mutate {
    id => "CITRIXint01"
    add_tag => ["CITRIX"]
   }
  } else if [message] =~ "Cyber-Ark" {
   mutate {
    id => "CyberArkid01"
     add_tag => ["CyberArk"]
   }
  } else if "TRAPpp" in [type] {
   mutate {
    id => "toPPTRAP01"
    add_tag => ["toPPTRAP"]
   }
  } else if [message] =~ "AgentDevice" {
   mutate {
    id => "MSid01"
    add_tag => ["MS"]
   }
  } else if "_grokparsefailure" in [tags] {
   mutate {
    add_tag => ["toTest"]
    remove_tag => ["_grokparsefailure"]
   }
  } else if "_grokparsefailure_sysloginput" in [tags] {
   mutate {
    add_tag => ["FAILED"]
   }
  }
 }
  mutate {
    add_field => { "log_node_name" => "log01" }
  }
}

output {
 if "PA" in [tags] {
  pipeline {
   id => "toPA01"
   send_to => toPA
  }
 } else if "SEPM" in [tags] {
  pipeline {
   id => "toSEPM01"
   send_to => toSEPM
  }
 } else if "TRAPpp" in [type] {
  pipeline {
   id => "toTrap01"
   send_to => toTest
   #codec => rubydebug
  }
 } else if "MS" in [tags] {
  pipeline {
   id => "toMS01"
   send_to => toMS
  }
 } else if "CITRIX" in [tags] {
  pipeline {
   id => "toCitrix01"
   send_to => toCitrix
  }
 } else if "CyberArk" in [tags] {
  pipeline {
   id => "toCyberArk01"
   send_to => toCyberArk
  }
 } else if "Linux" in [tags] {
  pipeline {
   id => "toLinux01"
   send_to => toLinux
  }
 } else {
  elasticsearch {
   id => "MaintoNewES"
   ssl => true
   ssl_certificate_verification => true
   cacert => "/etc/pki/tls/certs/ca-bundle.crt"
   hosts => ["**************"]
   data_stream => "true"
   action => "create"
   user => "${ES_USER}"
   password => "${ES_PWD}"
  }
 }
}


logstash log

This can only be fixed at the source; you need to check what on your syslog side is generating these events with a different encoding.

You cannot solve this in Logstash.

Hey Leandro, Thanks for replying.

Is it possible to use

codec => plain { charset => "ISO-8859-1" }

and will it work?

Is all the data being sent to this input using the ISO-8859-1 charset? This setting applies to every event received by the input, so if you change it you will probably start getting errors about data arriving in UTF-8 while the codec expects ISO-8859-1.

It seems that the majority of your data is correctly using UTF-8, but some arrives with a different encoding; you can have only one codec per input.

As mentioned, this needs to be solved at the source; you need to check your syslog servers/devices and configure them to use UTF-8.
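For reference, if you did change it, the charset is set on the input's codec. A sketch only (reusing the input from the post above; whether it helps depends on what your senders actually emit):

input {
 syslog {
  id => "idsyslog"
  host => "0.0.0.0"
  port => 10514
  type => "syslog"
  # applies to ALL events received on this input, not just the bad ones
  codec => plain { charset => "ISO-8859-1" }
 }
}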

Thanks for the reply :grinning:

The majority will be UTF-8.
I can't change anything at the source. There are 50+ servers from which we receive data (owned by many teams, not under my control). :slightly_frowning_face:

The consequence of this issue is the high CPU utilization.
Can I add anything in the filters/grok to handle these wrongly encoded events?
(You can have a look at the input config in my post above.)

Not sure, but maybe check whether the message contains the \\u characters, which would indicate the wrong encoding.

I think it would be something like this:

if "\\u" in [message] {
    drop {}
}

But you would need to test it to find the exact condition that you need to use.
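To make the condition more robust than matching a literal \\u, the check could also be done in a ruby filter that tests UTF-8 validity directly. The core logic in plain Ruby would be something like this (a sketch only, not tested in a real pipeline; `valid_utf8?` is a made-up helper name):

```ruby
# Returns true if the bytes of the message form valid UTF-8.
# A ruby filter could call this and tag or drop the event when it returns false.
def valid_utf8?(message)
  # dup so we don't mutate the original string's encoding flag
  message.dup.force_encoding(Encoding::UTF_8).valid_encoding?
end

valid_utf8?("ordinary syslog line")  # => true
valid_utf8?("\xC0\x14\x009\x008")    # => false, raw bytes like 0xC0 0x14 are not valid UTF-8
```

Dropping on invalid UTF-8 would catch any wrongly encoded event, not just the ones whose escaped form happens to show \\u.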

Thanks @leandrojmp ,

Another question I have: I am ingesting Logstash's own logs into Elasticsearch using Filebeat. In the Logstash logs I can see the IP addresses of my clients, as shown below.
logstash-plain.log

Is there any way I can capture these client connections (IPs) as a new field in my original data logs, e.g. through add_field in the Logstash config file?
Is there any metadata for incoming IPs?
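Not an authoritative answer, but as a sketch: the classic (non-ECS) syslog input records the sender's address in the event's `host` field, so it could be copied into a field of your own with mutate. Field names here are assumptions and differ under ECS compatibility mode:

filter {
 mutate {
  # copy the client address recorded by the syslog input into a custom field;
  # [host] is where the non-ECS syslog input puts the sender's address
  copy => { "host" => "client_ip" }
 }
}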