Encoding charset logstash accents log problem


(bus) #1

Hello ELK community :slight_smile:

I have a problem. I read a "Vega" log file with Filebeat and then ship the logs to Logstash to grok them, etc.

For example, a line in my log file looks like this:

24/05/2018 09:20:03.276 (EXTERNE) 27/02/2018/D T XX_ARCXXX_X_TN_ARC011_PASSAGE_BATCH(922)/SAU_XXXXXX_X_LL_OPE001_EFLUID DEB STATUS : EC La tâche est en exécution sur l'agent XX_XXXXXX_X_AG_LIN001_LVBATMLD2 (20920) : LANPRO N° 632415, ATTENTE DE 6 seconde(s)

According to Notepad++, this .vld log file is encoded in "ANSI".

That is why my Logstash pipeline looks like this:

### INPUT SECTION ###
input
{
  beats
  {
    port => 5044
    codec => plain {
                charset => 'ANSI_X3.4-1968'
                   }
  }
}



### FILTER SECTION ###
filter
{
}

### OUTPUT SECTION ###
output
{
  elasticsearch
  {
    hosts => "http://localhost:9200"
    index => "vega"
    codec => plain {
                charset => 'ANSI_X3.4-1968'
                   }
  }
}

So I decided to set the charset to 'ANSI_X3.4-1968' (see Plain Codec Plugin).

But in Kibana, my logs look like this:

24/05/2018 09:20:03.276 (EXTERNE) 27/02/2018/D T XX_ARCXXX_X_TN_ARC011_PASSAGE_BATCH(922)/SAU_XXXXXX_X_LL_OPE001_EFLUID DEB STATUS : EC La t�che est en ex�cution sur l'agent XX_XXXXXX_X_AG_LIN001_LVBATMLD2 (20920) : LANPRO N� 632415, ATTENTE DE 6 seconde(s)

The accented characters are broken.

Does anyone know where this problem comes from?

Thank you all :slight_smile:


(bus) #2

Can nobody help me? :frowning:


(Magnus Bäck) #3

"ANSI encoding" is a misnomer. Try Windows-1252 instead. See https://stackoverflow.com/questions/701882/what-is-ansi-format.


(bus) #4

Do I have to put "Windows-1252" in the input, the output, or both?

Thanks for your responses, @magnusbaeck.


(Magnus Bäck) #5

Set the input's codec's charset to Windows-1252. The codec for the elasticsearch output should never be changed.
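To see why the charset name matters, here is a minimal Python sketch (Python is only used for illustration, it is not part of the pipeline). ANSI_X3.4-1968 is a name for plain US-ASCII, which has no accented letters at all, whereas Windows-1252 stores "â" as the single byte 0xE2. The word "tâche" is taken from the log line above; the variable names are made up for this example.

```python
# "tâche" as it would be stored in a Windows-1252 file: "â" is the byte 0xE2.
raw = "t\u00e2che".encode("cp1252")

# US-ASCII (the charset that ANSI_X3.4-1968 names) has no byte 0xE2,
# so the decoder substitutes the replacement character U+FFFD.
as_ascii = raw.decode("ascii", errors="replace")

# Windows-1252 maps 0xE2 back to "â".
as_cp1252 = raw.decode("cp1252")

print(raw)        # b't\xe2che'
print(as_ascii)   # t�che
print(as_cp1252)  # tâche
```

The ASCII result has the same shape as the "t�che" shown in Kibana, which is consistent with the file being read with the wrong charset.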


(bus) #6

I put this in the input:

input
{
  beats
  {
    port => 5044
    codec => plain {
                charset => 'Windows-1252'
                   }
  }
}

Result:
28/05/2018 08:53:45.373 (TACHE) 27/02/2018/D T XX_CACXXX_X_TN_CAC001MT_PORTEFEUILLE(945)/SAU_XXXXXX_X_LL_OPE001_EFLUID TER STATUS : TN Terminaison normale de la t�che (TN EXIT CODE 0)

It does not seem to work.


(Magnus Bäck) #7

Oh, right. You're using Filebeat. Probably better to apply the codec on the Filebeat side. Remove the codec setting for the beats input and change the prospector's encoding option (https://www.elastic.co/guide/en/beats/filebeat/master/filebeat-input-log.html#_literal_encoding_literal).
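A rough Python sketch of why decoding late may not help. It rests on an assumption that I have not verified against Filebeat's internals: that the bytes are treated as UTF-8 somewhere before they reach Logstash, and that invalid sequences are replaced with U+FFFD at that point. If that is what happens, the original byte is gone by the time any Logstash codec runs.

```python
# Bytes as they sit in a Windows-1252 file: "â" is the single byte 0xE2.
raw = "t\u00e2che".encode("cp1252")

# ASSUMPTION: the shipper reads the bytes as UTF-8 and replaces anything
# invalid with U+FFFD before sending. This mimics the symptom in the thread;
# it is not a description of Filebeat's actual code.
shipped = raw.decode("utf-8", errors="replace")

# What travels on to the next stage is the UTF-8 form of the damaged text.
wire = shipped.encode("utf-8")

print(ascii(shipped))     # 't\ufffdche'
print(b"\xe2" in raw)     # True
print(b"\xe2" in wire)    # False
```

Once 0xE2 has been swapped for U+FFFD, no charset setting further down the line can tell that it used to be "â", so the decoding has to happen where the file is read.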


(bus) #8

@magnusbaeck,

I just added the line "encoding: Windows-1252" to my "filebeat.yml" file and removed the codec part from my input section.

So now:

28/05/2018 09:49:10.670 (TACHE) 11/01/2018/D T XX_ARCXXX_X_TN_ARC011_PASSAGE_BATCH(922)/CLO_XXXXXX_X_LL_OPE001_EFLUID TER STATUS : TA Terminaison anormale de la t�che (TA EXIT CODE 252)

:frowning:


(Magnus Bäck) #9

Okay, so maybe the source data isn't Windows-1252. I don't have any great suggestions then.
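One rough way to test a guess, sketched in Python: decode a sample of the raw bytes with a few candidate encodings and look at which result reads correctly. The function name `try_encodings`, the candidate list, and the sample bytes (built from words in the log line above, assuming Windows-1252) are all hypothetical choices for this example.

```python
def try_encodings(sample, candidates=("utf-8", "cp1252", "latin-1", "cp850")):
    """Decode sample with each candidate; None means the bytes are invalid."""
    results = {}
    for name in candidates:
        try:
            results[name] = sample.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


# Hypothetical sample: "La tâche est en exécution" as Windows-1252 bytes.
sample = b"La t\xe2che est en ex\xe9cution"

results = try_encodings(sample)
for name, text in results.items():
    print(f"{name:8} -> {text!r}")
```

With a real file, the sample would come from reading the first few kilobytes in binary mode. Several single-byte encodings will decode without error, so the check is by eye: the right one is the one whose output shows the expected accented words.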


(bus) #10

Okay, so how can I find the correct encoding of my input data?


(bus) #11

Correction: it works!

I just forgot to restart the Filebeat service after adding the encoding line.

Thank you for everything, @magnusbaeck.

