Character encoding problems with Filebeat & Logstash

Hello,

I've read many topics about this problem, but it's still not resolved for me.

I have Filebeat 6.2.2 installed on Windows; it sends logs to Logstash 6.2.2 installed on Linux.

This is my Filebeat config:

- type: log
  scan_frequency: 10s
  paths:
    - C:\logs\example.log
  encoding: utf-8

Logstash config:

input {
  beats {
    port => 5044
    codec => json { charset => "UTF-8" }
  }  
}

output {
  http {
    automatic_retries => 10
    content_type => "application/json"
    format => "json"
    http_method => "post"
    ignorable_codes => 409
    keepalive => true
    url => "http://localhost:9090"
  }

  file {
    path => "D:\\result.txt"
    codec => json { charset => "UTF-8" }
  }
}

The logs are:

2018-09-26 16:05 - First: é, Second: è
2018-09-26 16:06 - Third: à, Fourth: â

Results in my web application (linked to a database) and in the file D:\result.txt:

2018-09-26 16:05 - First: ├®, Second: ├¿
2018-09-26 16:06 - Third: à, Fourth: â

Results from the console (Ruby debug output):

2018-09-26 16:05 - First: é, Second: è
2018-09-26 16:06 - Third: à, Fourth: â

I tried many charsets, following these links:
https://www.elastic.co/guide/en/logstash/current/plugins-codecs-json.html
https://www.elastic.co/guide/en/logstash/current/plugins-codecs-plain.html

But I always get the same issue.

I also changed the configuration:

  • removed encoding: utf-8 from the Filebeat configuration
  • removed codec => json { charset => "UTF-8" } from the Logstash config

And nothing worked.

Can anyone help with this? Thanks.

Maybe the Windows files are in a Microsoft encoding?

I can't speak for Filebeat, but the charset setting in a codec is a "from" setting: if, say, you have a file in CP1252 encoding (a Windows encoding) and Logstash/Elasticsearch expects UTF-8, then you set charset to "CP1252".

In other words, you are saying, "I know I have X encoding, so please convert it to UTF-8".

A few people have tried universal encoding detection on arbitrary pieces of text, but most have failed, because the confidence of the detection depends on the string length and the number of multi-byte sequences it contains.

So Logstash does not know what the source charset of the input data is. You can try ASCII-8BIT; Logstash will then force-encode to UTF-8 and replace any illegal UTF-8 sequences with a � character.
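
For illustration, here is a minimal sketch of the beats input from above with the charset pointed at a Windows source encoding (the CP1252 value is an assumption; use whatever encoding the source files are actually written in):

input {
  beats {
    port => 5044
    # "charset" names the encoding the incoming text is in;
    # Logstash transcodes from that encoding to UTF-8.
    codec => json { charset => "CP1252" }
    # If the source encoding is unknown, "ASCII-8BIT" forces the
    # re-encode and replaces illegal UTF-8 sequences with �:
    # codec => json { charset => "ASCII-8BIT" }
  }
}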

Thank you for the reply,

Yes, the Windows file is in ANSI encoding.
I had to set encoding to ANSI_X3.4-1968:

filebeat.prospectors:
- type: log
  enabled: true
  encoding: ANSI_X3.4-1968
  paths:
    - C:\logs\example.log

PS: for anyone using other components to enrich or store the data, you also need to configure the default message converter.

For example, in a Spring web application that needs to interact with Logstash, we usually use RestTemplate.

All we need to do is set UTF-8 as the default charset:

restTemplate.getMessageConverters().add(0, new StringHttpMessageConverter(Charset.forName("UTF-8")));
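
For context, a minimal sketch of how that line fits into a working class (the class name, endpoint URL, and payload handling are assumptions for illustration, not from the thread):

import java.nio.charset.StandardCharsets;

import org.springframework.http.converter.StringHttpMessageConverter;
import org.springframework.web.client.RestTemplate;

public class LogForwarder {

    private final RestTemplate restTemplate = new RestTemplate();

    public LogForwarder() {
        // Register a UTF-8 string converter ahead of the defaults
        // (StringHttpMessageConverter historically defaults to
        // ISO-8859-1), so outgoing request bodies are written as UTF-8.
        restTemplate.getMessageConverters()
                .add(0, new StringHttpMessageConverter(StandardCharsets.UTF_8));
    }

    public void send(String jsonPayload) {
        // Hypothetical endpoint matching the http output above.
        restTemplate.postForEntity("http://localhost:9090", jsonPayload, String.class);
    }
}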
