Message Parsing

Here is what my logstash output looks like:

output {
  elasticsearch {
    index => "%{[@metadata][beat]}"
    hosts => "192.168.0.103"
  }
}

Logstash errors are as follows:

[2020-11-18T13:08:12,824][WARN ][logstash.codecs.jsonlines][main][26da92079e525d4bfdac5a892ff28079c6695bd768a516e8a992f0d588033c05] Received an event that has a different character encoding than you configured. {:text=>"\\u000E\\x97P]...
[2020-11-18T13:08:12,826][WARN ][logstash.codecs.jsonlines][main][26da92079e525d4bfdac5a892ff28079c6695bd768a516e8a992f0d588033c05] JSON parse error, original data now in message field {:error=>#<LogStash::Json::ParserError: Unrecognized token 'z': was expecting ('true', 'false' or 'null')
 at [Source: (String)"z -9\x92\u0001~\u0000/\f\x960l...

I added

codec => plain {
      charset => "ISO-8859-1"
    }

But I am still getting similar error messages:

JSON parse error, original data now in message field {:error=>#<LogStash::Json::ParserError: Unexpected character...
Received an event that has a different character encoding than you configured. {:text=>"\\xB6\\xA6}e#\\x

Hi,

I don't think this has anything to do with the output - I suspect it is connected to the input:

JSON parse error, original data now in message field

Output plugins do not parse data - they serialize it. Can you show us the complete pipeline?

input {
  tcp {
    port => 5044
    codec => json
  }
}

filter {
  date {
    match => [ "timeMillis", "UNIX_MS" ]
  }
}

output {
  elasticsearch {
    index => "%{[@metadata][beat]}"
    hosts => "192.168.0.103"
    codec => plain {
      charset => "ISO-8859-1"
    }
  }
}

Where do you get your data from? According to the character tables:

\xB6

The pilcrow (¶), also called the paragraph mark, paragraph sign, paraph, alinea, or blind P, is a typographical character marking the start of a paragraph. (Pilcrow - Wikipedia)

\xA6

The vertical bar (|) is a glyph with various uses in mathematics, computing, and typography. It has many names, often related to particular meanings: Sheffer stroke (in logic), verti-bar, vbar, stick, vertical line, vertical slash, bar, pike, or pipe, and several variants on these names. It is occasionally considered an allograph of the broken bar (¦).

In what character encoding do you receive the data - have you tried setting the encoding on the input instead of the output?
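If the data really is JSON but arrives in a different encoding, the charset can be set on the input codec instead of on the output. A sketch (the json codec accepts a charset option; "ISO-8859-1" here is just an example - use whatever encoding your client actually sends):

input {
  tcp {
    port => 5044
    codec => json {
      charset => "ISO-8859-1"
    }
  }
}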

Best regards
Wolfram

I am sending logs from one of my Linux clients. Here is part of the filebeat.yml from that client:

- type: log
  # Change to true to enable this input configuration.
  enabled: true
  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /var/log/*.log

Are you sure that those logs are in JSON format? If they are plain text, that would explain the json codec parse errors...
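Also worth noting: if these events are coming from Filebeat's Logstash output, Filebeat speaks the Beats (lumberjack) protocol rather than plain JSON over TCP, which would explain the binary-looking garbage in the errors. In that case the usual setup is a beats input on port 5044 instead of a tcp input (a sketch):

input {
  beats {
    port => 5044
  }
}

With a beats input, the events arrive already decoded, so no json codec is needed on the input at all.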