Logstash errors on imap input with multiple languages

I am working with the imap plugin, and everything works well until an email with non-English characters comes through. Logstash then fails and restarts.

The error I am seeing is:

[2018-01-22T10:00:40,212][ERROR][logstash.pipeline        ] A plugin had an unrecoverable error. Will restart this plugin.
Pipeline_id:main
Plugin: <LogStash::Inputs::IMAP host etc....
Error: 45
Exception: Java::JavaLang::ArrayIndexOutOfBoundsException
Stack: org.jcodings.transcode.TranscodeFunctions.funSioFromGB18030(TranscodeFunctions.java:423)

How can I resolve this when I don't know in advance which kinds of characters will be coming into the mailbox?

TIA

I found more details in the logs:

  [WARN ][logstash.codecs.plain ] Received an event that has a different character encoding than you configured. {:text=>"<html>\\n<head>\\n<meta http-equiv=\\\"Content-Type\\\" content=\\\"text/html; charset=Windows-1252\\\">\\n<meta content=\\\"text/html; charset=us-ascii\\\">

So I assume my emails are coming in using UTF-8, Windows-1252 and us-ascii.

How can I get all of these parsed?

I tried entering multiple charset types in my configuration, but I got errors as soon as I added more than one.
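
To illustrate why a single configured charset cannot cover a mixed mailbox, here is a minimal Python sketch (not Logstash code). As far as I can tell, the codec's charset setting takes one encoding name, so every message gets decoded the same way. The sample string and the `decode_as` helper are made up for the illustration.

```python
def decode_as(raw: bytes, charset: str) -> str:
    """Decode raw bytes strictly; raises UnicodeDecodeError on invalid input."""
    return raw.decode(charset)


sample = "café"  # hypothetical message text with one non-ASCII character
as_cp1252 = sample.encode("windows-1252")  # one byte for the accented letter
as_utf8 = sample.encode("utf-8")           # two bytes for the accented letter

# Decoding with the charset the sender actually used round-trips cleanly.
print(decode_as(as_cp1252, "windows-1252"))

# Decoding UTF-8 bytes as Windows-1252 does not fail, it silently garbles.
print(decode_as(as_utf8, "windows-1252"))

# Decoding Windows-1252 bytes as UTF-8 fails outright.
try:
    decode_as(as_cp1252, "utf-8")
except UnicodeDecodeError as exc:
    print("decode failed:", exc.reason)
```

So whichever single charset is configured, messages sent in one of the other charsets either come out garbled or fail to decode. The decoding has to follow the charset each message declares.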

Hi,

Which characters in the email bodies are being treated as non-English? Can you share some examples?
There are a variety of charsets, so I wonder which characters are actually coming in and whether setting the right charset on the codec would resolve this: https://www.elastic.co/guide/en/logstash/current/plugins-codecs-plain.html

Thanks for the reply. From what I can tell, the charsets are

"Windows-1258" and "UTF-8".

I will try and grab some samples.

Is it possible to have both charsets enabled for the imap plugin?

We have gone with an external Python program that parses the messages and sends them to Logstash.
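
For anyone landing here later, a rough sketch of what such an external parser could look like. This is not our actual program, just an illustration using only the Python standard library. The function name `email_to_json_line` and the `subject`/`body` field names are made up. The idea is to decode each text part with the charset that part declares, then emit one UTF-8 JSON line per message so Logstash only ever sees UTF-8.

```python
import json
from email import message_from_bytes, policy


def email_to_json_line(raw_message: bytes) -> bytes:
    """Parse one raw RFC 822 message and return a UTF-8 JSON line.

    Field names are illustrative assumptions, not a Logstash requirement.
    """
    msg = message_from_bytes(raw_message, policy=policy.default)
    bodies = []
    for part in msg.walk():
        # Only text parts (text/plain, text/html), skipping attachments.
        if part.get_content_maintype() != "text" or part.is_attachment():
            continue
        try:
            # get_content() decodes using the charset declared on the part.
            bodies.append(part.get_content())
        except LookupError:
            # Declared charset is unknown to Python: fall back to UTF-8 and
            # replace undecodable bytes instead of crashing.
            payload = part.get_payload(decode=True) or b""
            bodies.append(payload.decode("utf-8", errors="replace"))
    event = {
        "subject": str(msg["subject"] or ""),
        "body": "\n".join(bodies).strip(),
    }
    # ensure_ascii=False keeps the characters readable; output is always UTF-8.
    return json.dumps(event, ensure_ascii=False).encode("utf-8") + b"\n"
```

The raw message bytes could come from Python's `imaplib`, and the resulting lines could be sent to something like a tcp input with a JSON lines codec. That transport part is an assumption and is left out of the sketch.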
