[Logstash OSS] Invalid UTF-8 start byte issue

Describe the bug
It is not possible to index a document containing non-ASCII characters into OpenSearch.

To Reproduce
Steps to reproduce the behavior:

  1. Run OpenSearch in a Docker container:

docker run -d -p 9200:9200 -p 9600:9600 -e "discovery.type=single-node" -e "plugins.security.disabled=true" opensearchproject/opensearch:latest

  2. Set up logstash-oss-8-5-2 (Windows).
  3. Install the logstash-output-opensearch plugin:

<path/to/your/logstash/dir>/bin/logstash-plugin install --version 2.0.0 logstash-output-opensearch

  4. Save the following sample configuration as logstash-example.conf:

input {
    stdin { }
}

filter {
    # If you remove the letter 'ß', the error disappears
    mutate { add_field => { "name" => "Groß" } }
    prune { whitelist_names => [ "^name$" ] }
}

output {
    opensearch {
        hosts => ["localhost:9200"]
        auth_type => {
            type => 'basic'
            user => 'admin'
            password => 'admin'
        }
        index => "test_index"
        action => "index"
    }
}

  5. Run Logstash with that configuration:

<path/to/your/logstash/dir>/bin/logstash -f logstash-example.conf

  6. Type any text into standard input and press Enter.
  7. See the error:

"status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"failed to parse", "caused_by"=>{"type"=>"json_parse_exception", "reason"=>"Invalid UTF-8 start byte 0xa0\n at [Source: (byte)"{"event":{"original":"\r"},"message":"\r","@timestamp":"2023-01-19T11:07:14.447970Z","name":"Gro�","host":{"hostname":"DESKTOP-SP31NNN"},"@Version":"1"}"; line: 1, column: 98]

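For anyone reading the error above: in UTF-8, a byte of the form 10xxxxxx (0x80 to 0xBF) can only continue a multi-byte character and can never start one, which is why the JSON parser rejects 0xa0. Below is a minimal Python sketch of that rule. It is illustrative only and does not reproduce the Logstash behavior; the `is_valid_utf8` helper and the `corrupted` payload are made up for the example and are not taken from the actual request.

```python
# Illustrative sketch only: this does not reproduce the Logstash bug itself.

name = "Groß"

# Correct UTF-8: 'ß' (U+00DF) is encoded as the two bytes 0xC3 0x9F.
utf8_bytes = name.encode("utf-8")
print(utf8_bytes.hex(" "))  # 47 72 6f c3 9f


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def is_continuation_byte(value: int) -> bool:
    """UTF-8 continuation bytes have the bit pattern 10xxxxxx."""
    return (value & 0xC0) == 0x80


# Hypothetical payload (an assumption for illustration): the field value
# contains a lone 0xA0 byte where a valid character should be.
corrupted = b'{"name":"Gro\xa0"}'

print(is_continuation_byte(0xA0))  # True: 0xA0 cannot start a character
print(is_valid_utf8(utf8_bytes))   # True
print(is_valid_utf8(corrupted))    # False
```

If the bytes leaving Logstash looked like `utf8_bytes`, the document would be accepted, so the string appears to be re-encoded or damaged somewhere between the filter and the HTTP request body.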
OpenSearch/OpenDistro are AWS-run products and differ from the original Elasticsearch and Kibana products that Elastic builds and maintains. You may need to contact them directly for further assistance.

(This is an automated response from your friendly Elastic bot. Please report this post if you have any suggestions or concerns.)

I provided more details from my investigation in the GitHub issue: [BUG] Invalid UTF-8 start byte issue · Issue #187 · opensearch-project/logstash-output-opensearch

Hi @dstepanov25, you are in the wrong community. This forum is for Elasticsearch and does not support OpenSearch or the OpenSearch Logstash output plugin, so we cannot help with that. I suspect you need to visit the OpenSearch forum.

Can you replicate the issue using Logstash-OSS and the elasticsearch output pointing to an Elasticsearch cluster?

If you cannot replicate the issue with the above configuration, then the issue may be in logstash-output-opensearch, which is not developed by Elastic, and you will need to raise it in the OpenSearch forum.

It works normally on the standard (non-OSS) Logstash version 8.5.3:

filter {
    mutate { add_field => { "name1" =>  "Groß" } }     
    mutate { add_field => { "name2" =>  "Groß" } }     
    mutate { gsub => ["name2","ß","ößü" ] }   
}

Result:

{
    "name2" => "Groößü",
    "name1" => "Groß"
}

This might be related to Docker and locale settings.
Can you check your locale settings:
localectl status
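Note that `localectl` is only available on systemd-based Linux, while the original report is on Windows. As a rough cross-platform alternative, the Python sketch below prints the default encodings the operating system reports. This is an assumption-laden stand-in: it shows what Python sees, not what the JVM running Logstash uses, and the `describe_encodings` helper is a name invented for this example.

```python
import locale
import sys


def describe_encodings() -> dict:
    """Collect the default encodings reported by the OS and the interpreter."""
    return {
        # Encoding the OS locale prefers for text data (e.g. cp1252 or UTF-8).
        "preferred_locale_encoding": locale.getpreferredencoding(False),
        # Encoding used to convert file names to and from bytes.
        "filesystem_encoding": sys.getfilesystemencoding(),
        # Encoding of the console or pipe attached to stdout, if any.
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
    }


if __name__ == "__main__":
    for key, value in describe_encodings().items():
        print(f"{key}: {value}")
```

A non-UTF-8 value for the preferred locale encoding on the Logstash host would at least be consistent with an encoding mismatch, although it would not prove one.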

Hi, this issue is also reproducible with the Elasticsearch output plugin. I can change my example.

Yes, I can reproduce it for Elasticsearch as well.
I ran the same configuration against Elasticsearch 8.6.0 and received the same error.

docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" -e "xpack.security.enabled=false" docker.elastic.co/elasticsearch/elasticsearch:8.6.0

...
output {
    elasticsearch {
        hosts => ["localhost:9200"]
        index => "test_index"
        action => "index"
    }
}

I tried this scenario with Logstash-8-6-0 and with Logstash-OSS-8-6-0.
The issue reproduces only with the OSS version.