Received an event that has a different character encoding than you configured - Logstash CPU

Hi Guys,
I received an unknown event from the syslog server.


Received an event that has a different character encoding than you configured. {:text=>"\\xC0\\u0014\\u00009\\u00008\\u0000\\x88\\u0000\\x87\\xC0\\u0019\\u0000:\\u0000\\x89\\xC0\\t\\xC0\\u0013\\u00003\\u00002\\u0000\\x9A\\u0000\\x99\\u0000E\\u0000D\\xC0\\u0018\\u00004\\u0000\\x9B\\u0000F\\xC0\\a\\xC0\\u0011\\xC0\\u0016\\u0000\\u0018\\xC0\\b\\xC0\\u0012\\u0000\\u0016\\u0000\\u0013\\xC0\\u0017\\u0000\\e\\u00005\\u0000\\x84\\u0000/\\u0000\\x96\\u0000A\\u0000\\a\\u0000\\u0005\\u0000\\u0004\\u0000\\n", :expected_charset=>"UTF-8"}

At the same time, Logstash CPU usage climbed to 99%. After getting an alert I restarted the Logstash service, and CPU usage went back to normal.


How can I handle these different encodings?
Below is my pipeline config:

input {
 syslog {
  id => "idsyslog"
  host => "0.0.0.0"
  port => 10514
  type => "syslog"
 }
}

filter {
 if "syslog" in [type] {
  if [message] =~ "Palo Alto"  {
   mutate {
    id => "PAid01"
    add_tag => ["PA"]
   }
  } else if [log][syslog][facility][name] == "security/authorization" {
   mutate {
    id => "Linuxid01"
    add_tag => ["Linux"]
   }
  } else if [process][name] == "SymantecServer" {
   mutate {
    id => "SEPMid01"
    add_tag => ["SEPM"]
   }
  } else if [message] =~ "VPXEXT" {
   mutate {
    id => "CITRIXext01"
    add_tag => ["CITRIX"]
   }
  } else if [message] =~ /VPXINT\d+/ {
   mutate {
    id => "CITRIXint01"
    add_tag => ["CITRIX"]
   }
  } else if [message] =~ "Cyber-Ark" {
   mutate {
    id => "CyberArkid01"
    add_tag => "CyberArk"
   }
  } else if "TRAPpp" in [type] {
   mutate {
    id => "toPPTRAP01"
    add_tag => ["toPPTRAP"]
   }
  } else if [message] =~ "AgentDevice" {
   mutate {
    id => "MSid01"
    add_tag => ["MS"]
   }
  } else if "_grokparsefailure" in [tags] {
   mutate {
    add_tag => ["toTest"]
    remove_tag => ["_grokparsefailure"]
   }
  } else if "_grokparsefailure_sysloginput" in [tags] {
   mutate {
    add_tag => ["FAILED"]
   }
  }
 }
 mutate {
  add_field => { "log_node_name" => "log01" }
 }
}

output {
 if "PA" in [tags] {
  pipeline {
   id => "toPA01"
   send_to => toPA
  }
 } else if "SEPM" in [tags] {
  pipeline {
   id => "toSEPM01"
   send_to => toSEPM
  }
 } else if "TRAPpp" in [type] {
  pipeline {
   id => "toTrap01"
   send_to => toTest
   #codec => rubydebug
  }
 } else if "MS" in [tags] {
  pipeline {
   id => "toMS01"
   send_to => toMS
  }
 } else if "CITRIX" in [tags] {
  pipeline {
   id => "toCitrix01"
   send_to => toCitrix
  }
 } else if "CyberArk" in [tags] {
  pipeline {
   id => "toCyberArk01"
   send_to => toCyberArk
  }
 } else if "Linux" in [tags] {
  pipeline {
   id => "toLinux01"
   send_to => toLinux
  }
 } else {
  elasticsearch {
   id => "MaintoNewES"
   ssl => true
   ssl_certificate_verification => true
   cacert => "/etc/pki/tls/certs/ca-bundle.crt"
   hosts => ["**************"]
   data_stream => "true"
   action => "create"
   user => "${ES_USER}"
   password => "${ES_PWD}"
  }
 }
}


[Screenshot: logstash log]

This can only be fixed at the source: you need to check which of the devices sending to your syslog input is generating these events with a different encoding.

You cannot solve this in Logstash.

Hey Leandro, thanks for replying.

Is it possible to use
codec => plain { charset => "ISO-8859-1" }

and will it work?

Is all the data sent to this input encoded as ISO-8859-1? This setting applies to every event the input receives, so if you change it you will probably start having problems with data that arrives as UTF-8 while the codec expects ISO-8859-1.

It seems that the majority of your data is correctly using UTF-8 and only some of it arrives with a different encoding. You can have only one codec per input.

As mentioned, this needs to be solved at the source: you need to check your syslog server/device and configure it to use UTF-8.
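To illustrate the one-codec-per-input point: if some senders are known to use another encoding and can be pointed at a different port, a second input with its own charset is one possible layout. This is only a sketch. Port 10515 and the id `idsyslog_latin1` are assumptions made up for the example, not values from this thread.

```
input {
 # Existing input: left at the default charset (UTF-8)
 syslog {
  id => "idsyslog"
  host => "0.0.0.0"
  port => 10514
  type => "syslog"
 }
 # Hypothetical second input for senders known to emit ISO-8859-1.
 # The port and id are illustrative assumptions.
 syslog {
  id => "idsyslog_latin1"
  host => "0.0.0.0"
  port => 10515
  type => "syslog"
  codec => plain { charset => "ISO-8859-1" }
 }
}
```

This only helps when the affected senders can be identified and redirected; it does nothing for arbitrary binary data arriving on the main port.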

Thanks for the reply :grinning:

The majority will be UTF-8.
I can't change anything at the source. We receive data from 50+ servers, which belong to many teams and are not under my control. :slightly_frowning_face:

The consequence of this issue is the high CPU utilization.
Can I add anything in the filters/grok to handle these wrongly encoded events?
(You can have a look at the config in my first post.)

Not sure, but maybe you could check whether the message contains the \\u sequence, which would indicate the wrong encoding.

I think it would be something like this:

if "\\u" in [message] {
    drop {}
}

But you would need to test it to find the exact condition that you need to use.
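Another condition that might be worth testing keys on tags instead of message content. The syslog input tags events it cannot parse with `_grokparsefailure_sysloginput` (the tag already referenced in the filter config in the first post), and binary payloads like the one in the warning are unlikely to parse as syslog. A minimal sketch, assuming every unparsed event is unwanted; the id `dropUnparsed01` is a made-up name:

```
filter {
 # Assumption: anything the syslog input could not parse is noise.
 # This also drops legitimate messages that are simply not in the
 # format the input expects, so check what carries this tag first.
 if "_grokparsefailure_sysloginput" in [tags] {
  drop {
   id => "dropUnparsed01"
  }
 }
}
```

Because the drop happens in the filter stage, the input still has to receive and decode the data, so this may reduce downstream work without necessarily fixing the CPU spike.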

Thanks @leandrojmp,

I have another question. I am ingesting the Logstash logs into Elasticsearch using Filebeat, and those logs contain the IP addresses of my clients, as shown below.
[Screenshot: logstash-plain.log]

Is there any way to capture these client IPs as a new field in my original log events, for example with add_field in the Logstash config?
Is there any metadata for the incoming IPs?
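For reference, the syslog input records the sender's address on each event. With ECS compatibility enabled this should be `[host][ip]`, which is the field the final filter in this thread matches on; with ECS compatibility disabled it is `host`. A minimal sketch that copies it into a separate field, where `[client][ip]` and the id `copyClientIp01` are hypothetical names chosen for the example:

```
filter {
 if "syslog" in [type] {
  mutate {
   id => "copyClientIp01"
   # [client][ip] is an illustrative target field, not one defined in this thread
   copy => { "[host][ip]" => "[client][ip]" }
  }
 }
}
```

If a relay sits between the original device and Logstash, this field holds the relay's address, not the device's.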

Hi @leandrojmp ,

A vulnerability scan and penetration test is run against our Logstash server from a tenable.io / Nessus server.

This is my input plugin config in logstash.

input {
 syslog {
  id => "idsyslog"
  host => "0.0.0.0"
  port => 10514
  type => "syslog"
  codec => plain {
   charset => "ISO-8859-1"
  }
 }
}

While the vulnerability scan runs against port 10514, its traffic is read as log events by our Logstash. Below are the messages we receive, as captured by Logstash.

This is totally fine in itself. The only issue is that CPU utilization on the server becomes high when the vulnerability scan starts, and stays high.

Is there any way to tackle this issue? Will adding a grok pattern help here?
[Screenshot: grok_pattern]

Finally, this issue has been resolved. :heart_eyes:

Changes I have made:

  1. Listening on the Logstash server's IP rather than 0.0.0.0.
  2. Changed the pipeline config to drop the vulnerability scanner's messages.
  3. Changed the codec.
  4. In jvm.options, increased the initial and max heap size to 8GB (on a 32GB Linux server).

input {
 syslog {
  id => "idsyslog"
  host => "172.29.144.38"
  port => 10514
  type => "syslog"
  codec => plain {
   charset => "ISO-8859-1"
  }
 }
}

filter {
 if [host][ip] == "(vulnerability scanner IP)" {
  drop {}
 }
}

Thank you @leandrojmp @Badger :smiling_face_with_three_hearts:
