Warn : expected UTF-8


(marwa) #1

I wanted to load data from Logstash into Elasticsearch. This is a sample of my data:

product-name,price,number_of_customers,general_feddback,promotion_period,percentage_promotion,number_of_customers_after_promotion,categorie_produit
Taillefine aux fruits 0% Fraise - Danone - 500 g (4 x 125 g),556.26,481018,3,10-10-2016,22,516292.653333333,yaourt
Activia Fibre - Danone - 171,5 g (150 g+21,5 g),612.98,961714,2,4-8-2016,32,1064296.82666667,yaourt
Taillefine Yaourt Nature - Danone - 1,5 kg e (12 * 125 g),254.07,845922,4,11-4-2016,39,955891.86,yaourt
Activia Noix de Coco - Danone - 650 g,256.19,497515,1,1-14-2016,25,538974.583333333,yaourt
Danio Raspberry - danone - 160g,312.67,581307,5,3-17-2017,17,614247.73,yaourt
Fjørd nature - Danone - 500 g, 4 pots de 125 g,425.59,878459,4,8-26-2016,26,954592.113333333,yaourt

This is the config file:
input {
  file {
    path => "/opt/logstash-5.5.0/data/es_data_base.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}

filter {
  csv {
    separator => ","
    #product-name,price,number_of_customers,general_feddback,promotion_period,percentage_promotion,number_of_customers_after_promotion,categorie_produit
    columns => ["product-name","price","number_of_customers","general_feddback","promotion_period","percentage_promotion","number_of_customers_after_promotion","categorie_produit"]
  }
}

output {
  elasticsearch {
    hosts => "http://197.12.8.3:9200"
    index => "es_retails"
  }
  stdout {}
}

This warning appeared:

Received an event that has a different character encoding than you configured. {:text=>"Ketchup - Jardin Bio' - 560\xA0g,2784.03,18174,3,1-7-2016,26,3025.3126,salami_viande", :expected_charset=>"UTF-8"}

How can I fix it? Any help would be appreciated.


(Magnus Bäck) #2

Find out what character set your data is in and set the charset option like this:

input {
  file {
    ...
    codec => plain {
      charset => "name of charset here"
    }
  }
}

See https://www.elastic.co/guide/en/logstash/current/plugins-codecs-plain.html for a list of available character sets.


(marwa) #3

I used ISO-8859-1 and everything worked very well, but I still don't understand what it means. Can you explain it to me?


(Magnus Bäck) #4

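You configured Logstash to interpret the input file as ISO-8859-1 instead of UTF-8, and then the bits and bytes of the file make sense. For example, the byte \xA0 shown in the warning is a non-breaking space in ISO-8859-1, but on its own it is not a valid UTF-8 sequence.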
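
For reference, the resulting input section would presumably look something like the sketch below. It assumes the same file input as in the original post, with only the codec block added; the exact config that was used is not shown in this thread.

input {
  file {
    path => "/opt/logstash-5.5.0/data/es_data_base.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    # Assumption: decode the file's bytes as ISO-8859-1 (Latin-1) rather than the default UTF-8
    codec => plain {
      charset => "ISO-8859-1"
    }
  }
}

With this setting the plain codec reads each line as ISO-8859-1 and converts it to UTF-8 before the csv filter and the elasticsearch output see it.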

