Logstash encoding/charset problem with accented characters in logs

Hello ELK community :slight_smile:

I've got a problem: I read a "Vega" log file with Filebeat and then send these logs to Logstash to grok them, etc.

For example, a line in my log file looks like this:

24/05/2018 09:20:03.276 (EXTERNE) 27/02/2018/D T XX_ARCXXX_X_TN_ARC011_PASSAGE_BATCH(922)/SAU_XXXXXX_X_LL_OPE001_EFLUID DEB STATUS : EC La tâche est en exécution sur l'agent XX_XXXXXX_X_AG_LIN001_LVBATMLD2 (20920) : LANPRO N° 632415, ATTENTE DE 6 seconde(s)

According to Notepad++, this .vld log file is encoded in "ANSI".

That's why my Logstash pipeline looks like this:

### INPUT SECTION ###
input
{
  beats
  {
    port => 5044
    codec => plain {
      charset => 'ANSI_X3.4-1968'
    }
  }
}

### FILTER SECTION ###
filter
{
}

### OUTPUT SECTION ###
output
{
  elasticsearch
  {
    hosts => "http://localhost:9200"
    index => "vega"
    codec => plain {
      charset => 'ANSI_X3.4-1968'
    }
  }
}

So I decided to set the charset to 'ANSI_X3.4-1968' (see the Plain codec plugin documentation).

But in Kibana, my logs look like this:

24/05/2018 09:20:03.276 (EXTERNE) 27/02/2018/D T XX_ARCXXX_X_TN_ARC011_PASSAGE_BATCH(922)/SAU_XXXXXX_X_LL_OPE001_EFLUID DEB STATUS : EC La t�che est en ex�cution sur l'agent XX_XXXXXX_X_AG_LIN001_LVBATMLD2 (20920) : LANPRO N� 632415, ATTENTE DE 6 seconde(s)

The accented characters are broken.
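To see why this happens: `ANSI_X3.4-1968` is the formal name of US-ASCII, a 7-bit character set with no accented letters at all. The minimal Python sketch below reproduces the symptom on a fragment of the log line. It assumes the file is really Windows-1252 and uses Python's own decoders purely as an illustration; Logstash's decoder is not guaranteed to behave byte-for-byte the same way.

```python
import codecs

# A fragment of the log line, stored as single bytes the way a
# Windows-1252 ("ANSI") file would store it (assumption).
raw = "La tâche est en exécution : LANPRO N° 632415".encode("cp1252")

# Python resolves the charset name used in the pipeline to plain 7-bit ASCII.
print(codecs.lookup("ANSI_X3.4-1968").name)  # ascii

# ASCII has no mapping for bytes >= 0x80, so each accented byte
# is replaced by U+FFFD, the "�" seen in Kibana.
decoded = raw.decode("ANSI_X3.4-1968", errors="replace")
print(decoded)
```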

Does anyone know where this problem comes from?

Thank you all :slight_smile:

Can nobody help me? :frowning:

"ANSI encoding" is a misnomer. Try Windows-1252 instead. See https://stackoverflow.com/questions/701882/what-is-ansi-format.

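Notepad++ uses "ANSI" for the active Windows code page, which on Western European systems is typically Windows-1252 (`cp1252` in Python). As a minimal sketch, assuming that is the file's real encoding, decoding the same bytes with the matching charset restores every accent:

```python
# Bytes as they would sit in the .vld file, assuming it really is Windows-1252.
raw = "La tâche est en exécution : LANPRO N° 632415".encode("cp1252")

# Each accented character is a single byte in Windows-1252.
print(raw)  # b'La t\xe2che est en ex\xe9cution : LANPRO N\xb0 632415'

# Decoding with the matching charset round-trips cleanly.
text = raw.decode("windows-1252")
print(text)

# Re-encoding as UTF-8 turns each accented character into a two-byte sequence.
utf8 = text.encode("utf-8")
```

Each accent is one byte in Windows-1252 but two bytes in UTF-8, which is why the configured charset has to match the bytes actually on disk.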
Should I put "Windows-1252" in the input, the output, or both?

Thanks for your responses, @magnusbaeck.

Set the input's codec's charset to Windows-1252. The codec for the elasticsearch output should never be changed.

I put this in the input:

input
{
  beats
  {
    port => 5044
    codec => plain {
      charset => 'Windows-1252'
    }
  }
}

Result:
28/05/2018 08:53:45.373 (TACHE) 27/02/2018/D T XX_CACXXX_X_TN_CAC001MT_PORTEFEUILLE(945)/SAU_XXXXXX_X_LL_OPE001_EFLUID TER STATUS : TN Terminaison normale de la t�che (TN EXIT CODE 0)

It does not seem to work.

Oh, right. You're using Filebeat. It's probably better to apply the encoding on the Filebeat side. Remove the codec setting from the beats input and change the prospector's encoding option (https://www.elastic.co/guide/en/beats/filebeat/master/filebeat-input-log.html#_literal_encoding_literal).
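A general reason to decode where the file is read: once a byte has been decoded with the wrong charset and replaced by "�", its original value is lost and no later codec can restore it. The Python sketch below illustrates that point. The assumption that the wrong decode happens as UTF-8 with replacement is a simplification chosen for illustration, not a description of Filebeat's internals.

```python
raw = "La tâche est en exécution".encode("cp1252")

# Wrong charset where the bytes are read: invalid UTF-8 bytes collapse into U+FFFD.
lossy = raw.decode("utf-8", errors="replace")

# A later stage only ever sees U+FFFD; re-decoding it as Windows-1252
# produces different garbage instead of the original 'â' and 'é'.
downstream = lossy.encode("utf-8").decode("cp1252", errors="replace")

# Right charset where the bytes are read: decode once, while the raw bytes still exist.
correct = raw.decode("cp1252")

print(lossy)       # La t�che est en ex�cution
print(downstream)  # La tï¿½che est en exï¿½cution
print(correct)     # La tâche est en exécution
```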

@magnusbaeck,

I just added the line "encoding: Windows-1252" to my "filebeat.yml" file and removed the codec part from my input section.

So now I get:

28/05/2018 09:49:10.670 (TACHE) 11/01/2018/D T XX_ARCXXX_X_TN_ARC011_PASSAGE_BATCH(922)/CLO_XXXXXX_X_LL_OPE001_EFLUID TER STATUS : TA Terminaison anormale de la t�che (TA EXIT CODE 252)

:frowning:

Okay, so maybe the source data isn't Windows-1252. I don't have any great suggestions then.

Okay, so how can I find the correct encoding of my input data?
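There is no reliable way to detect a charset from bytes alone, but a quick heuristic is to try a few candidates in strict mode and see which one decodes. A rough Python sketch follows; `guess_encoding` is a hypothetical helper, and the candidate list is an assumption that fits a French log file.

```python
from typing import Iterable, Optional


def guess_encoding(
    data: bytes, candidates: Iterable[str] = ("utf-8", "cp1252", "latin-1")
) -> Optional[str]:
    """Return the first candidate that decodes `data` without errors.

    Order matters: latin-1 accepts every byte sequence and cp1252 almost
    every one, so they belong after UTF-8, and a hit still needs a visual check.
    """
    for name in candidates:
        try:
            data.decode(name)  # strict mode raises UnicodeDecodeError on bad bytes
        except UnicodeDecodeError:
            continue
        return name
    return None


sample = "La tâche est en exécution".encode("cp1252")
print(guess_encoding(sample))  # cp1252
```

To check the real file, read it in binary mode, for example `guess_encoding(open("file.vld", "rb").read())`, where the path is a placeholder. A `cp1252` or `latin-1` result only means the bytes are decodable, so compare a few accented words against what you expect before trusting it.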

Correction: it works!

I just forgot to restart the Filebeat service after adding the encoding line.

Thank you for everything, @magnusbaeck.
