Filebeat - received an event - has different character encoding


#1

Hello,

I am using Filebeat (on a Linux box) to send logs to ELK via Logstash. It is not arriving as UTF-8 (Linux) encoding; it looks like hexadecimal. Help please.

error message in logstash.log:

{:timestamp=>"2016-01-29T11:47:40.722000-0500", :message=>"Received an event that has a different character encoding than you configured.", :text=>"\u0014\u0006X\u0010E\u001D@B\xCA\u0010\x9D\xED{_=z\xF8r2\x99\x9C\xD1\xD7\xEDaaU\xC4\u0016#B\xBD\x87\x8D\xC2\b5tڨ\xC3o", :expected_charset=>"UTF-8", :level=>:warn}

Message in elasticsearch:

message:]\u0013\xA7X\xF8#\xAA\u001D>T\xF1߹\u0014\u0006X\u0010E\u001D@B\xCA\u0010\x9D\xED{_=z\xF8r2\x99\x9C\xD1\xD7\xEDaaU\xC4\u0016#B\xBD\x87\x8D\xC2\b5tڨ\xC3o

--
config

filebeat ver: filebeat-1.0.1-1.x86_64 (on linux)

logstash.conf is configured for plain/UTF-8 (setting the codec/charset makes no difference):

input {
  tcp {
    type => "dev-mail4"
    port => 3570
    codec => plain {
      charset => "UTF-8"
    }
  }
}


(ruflin) #2

What is the encoding of the file you are reading?


(Steffen Siering) #3

filebeat -> logstash uses JSON and therefore requires content to be encoded as UTF-8 on the sender side. This hex-like dump results from reading raw content that is clearly not UTF-8. What's the encoding of the file you are trying to send?
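The mismatch can be sketched in Python (a toy illustration, not filebeat's actual Go code): bytes written in another encoding are not valid UTF-8 and cannot be decoded cleanly on the receiving side.

```python
# Toy illustration: a line written in Latin-1 is not valid UTF-8,
# so a UTF-8-only pipeline cannot decode it cleanly.
raw = "café".encode("latin-1")          # b'caf\xe9'
try:
    raw.decode("utf-8")
except UnicodeDecodeError as err:
    print("invalid UTF-8 at byte offset", err.start)
# Decoding with replacement keeps the line but mangles the character,
# which is how garbled hex-like output ends up downstream:
print(raw.decode("utf-8", errors="replace"))
```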


#4

Linux syslog and maillogs.

filebeat.yml:
encoding: plain (tried UTF-8 same result)
document_type: log


(Steffen Siering) #5

plain and utf-8 are basically the same when forwarding to logstash. The difference is that the utf-8 encoding applies hex-encoding of unknown characters at read time.


(Steffen Siering) #6

Can you share some content of the original log files with us for testing? An original file is required; copy-and-pasting into the forum will kind of 'fix' the encoding.


(Athreya Vc) #7

Hi,

I am facing the same issue with Filebeat + Logstash for the Postfix maillog.

In filebeat, I set the encoding to UTF-8. On the logstash side,

input {
  beats {
    port => 5044
    type => "maillog"
    codec => plain { charset => "UTF-8" }
  }
  tcp {
    port => 5544
  }
}

Regards,
A


(Steffen Siering) #8
  1. Set the encoding in filebeat, not in logstash.

  2. Both filebeat and logstash assume the input to be UTF-8, so setting charset to UTF-8 is mostly a plain copy of your input data (masking invalid characters). The encoding must match the encoding of the file you read from.

  3. This looks like some kind of misconfiguration, but without some original logs (a raw file) reproducing the error, I have a hard time seeing what the fault is.

Does this happen for every single log line? If so, can you send a test mail and share that single log line in a separate file for testing?


(Athreya Vc) #9

Hi,

I did enable the encoding in filebeat:

filebeat:
  # List of prospectors to fetch data.
  prospectors:
    # Each - is a prospector. Below are the prospector specific configurations
    -
      # Paths that should be crawled and fetched. Glob based paths.
      # To fetch all ".log" files from a specific level of subdirectories
      # /var/log/*/*.log can be used.
      # For each file found under this path, a harvester is started.
      # Make sure no file is defined twice as this can lead to unexpected behaviour.
      paths:
        - /var/log/maillog
        #- c:\programdata\elasticsearch\logs\*

      # Configure the file encoding for reading files with international characters
      # following the W3C recommendation for HTML5 (http://www.w3.org/TR/encoding).
      # Some sample encodings:
      #   plain, utf-8, utf-16be-bom, utf-16be, utf-16le, big5, gb18030, gbk,
      #    hz-gb-2312, euc-kr, euc-jp, iso-2022-jp, shift-jis, ...
      encoding: utf-8

I get some output in debug mode, but the message looks like this:

"message" => "3\xB2\xE1m\u000F\xE0\x92J,"

Regards,

A


(Athreya Vc) #10

This is my filebeat.yml

filebeat:
  prospectors:
    -
      paths:
        - /var/log/maillog
      input_type: log
      encoding: "utf-8"
      document_type: my_log
output:
  logstash:
    hosts: ["192.168.1.149:5544"]
  console:
    pretty: true
shipper:
logging:
  files:
    rotateeverybytes: 10485760 # = 10MB

This is my logstash.conf

input {
  beats {
    port => 5044
    codec => plain { charset => "UTF-8" }
    type => "maillog"
  }
  tcp {
    port => 5544
  }
}
filter {
  if [type] == "maillog" {
    grok {
      match => { "message" => ["%{PF}", "%{DOVECOT}" ] }
    }
    date {
      match => [ "timestamp", "MMM dd HH:mm:ss", "MMM d HH:mm:ss" ]
    }
  }
  # I wanted to monitor metrics and health of logstash
  metrics {
    meter => "events"
    add_tag => "metric"
  }
}

output {
  stdout { codec => rubydebug }
}

Regards,

A


(Steffen Siering) #11

The encoding must match the content of the log file. Is the file/line really UTF-8?

I really need a sample to reproduce this problem. PM me if you want to arrange sending me some data in private (GPG-encrypted).


(Athreya Vc) #12

Sure, let me check that.

file -bi /var/log/maillog
text/plain; charset=us-ascii

This is what I see.
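Note that file(1) only samples the beginning of the file, so a single non-ASCII line further down would not show up in that charset guess. A quick full-file scan for non-ASCII bytes (assuming GNU grep with -P support):

```shell
# Count lines containing any byte outside the 7-bit ASCII range.
# LC_ALL=C makes grep operate on raw bytes; grep exits non-zero
# when nothing matches, hence the "|| true".
LC_ALL=C grep -cP '[^\x00-\x7F]' /var/log/maillog || true
```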

Regards,

A


(Steffen Siering) #14

us-ascii should be fully covered by utf-8. Are all lines garbled?
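ASCII is a strict subset of UTF-8; a tiny sanity check (using a made-up log line) illustrates this:

```python
# Every ASCII byte (0x00-0x7F) is also valid UTF-8, so a genuinely
# us-ascii file cannot produce garbled output on a UTF-8 pipeline.
line = "Mar 10 09:23:03 mailserver postfix/qmgr[28149]: queue active"
data = line.encode("ascii")
assert data.decode("utf-8") == line    # round-trips cleanly
assert all(b < 0x80 for b in data)     # no byte sets the high bit
```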


(Athreya Vc) #15

Yes, all the lines in the "message" field are garbled.


(Steffen Siering) #16

I cannot reproduce the problem by copying the content from the given link.

Let's first try having filebeat print to the console only for testing:

filebeat:
  prospectors:
    -
      paths:
        - /var/log/maillog
      input_type: log
      document_type: my_log
output:
  console:
    pretty: true
shipper:

run filebeat in console with:

$ filebeat -e -c test.yml

(Athreya Vc) #17

filebeat -e -c test.yml

Works fine


(Athreya Vc) #18
  "@timestamp": "2016-03-10T16:51:50.404Z",
  "beat": {
    "hostname": mailserver.testlab.com,
    "name": mailserver.testlab.com
  },
  "count": 1,
  "fields": null,
  "input_type": "log",
  "message": "Mar 10 09:23:03 mailserver postfix/qmgr[28149]: 625EEC05FE: from=\u003cno-reply@testlab.com\u003e, size=1439, nrcpt=1 (queue active)",
  "offset": 3605648,
  "source": "/var/log/maillog",
  "type": "my_log"
}

(ruflin) #19

If the output to the console works, it looks more like a problem on the logstash side. Could you try sending the log files directly to elasticsearch and check whether the events arrive there in the right format?
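For reference, a minimal sketch of an elasticsearch output for filebeat 1.x (the host and port here are assumptions; adjust them to your cluster):

```yaml
# Hypothetical test config: ship directly to Elasticsearch,
# bypassing logstash, to see whether the events arrive intact.
output:
  elasticsearch:
    hosts: ["localhost:9200"]
```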


(Athreya Vc) #20

I will try and update.

Regards,

A


(Steffen Siering) #21

With the console output being fine, let's try the most minimal filebeat -> logstash setup:

test2.yml:

filebeat:
  prospectors:
    -
      paths:
        - /var/log/maillog
      input_type: log
      document_type: my_log
output:
  logstash:
    hosts: ["localhost:5044"]

logstash.test.conf:

input {
  beats {
    port => 5044
  }
}

output {
  stdout { codec => rubydebug }
}

run logstash with:

$ logstash agent -f logstash.test.conf

and filebeat with:

$ filebeat -e -c test2.yml