Error parsing CSV - Illegal quoting with non-ascii (default)

CSV Line: "Mañe"###"email@gmail.com"###"3563454"###"255.255.255.255"

:source=>"\"Ma\xF1e\"###\"email@gmail.com\"###\"3563454\"###\"255.255.255.255\"", :exception=>#<CSV::MalformedCSVError: Illegal quoting in line 1.>
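For context, the failure can be reproduced outside Logstash with plain Ruby (the csv filter is built on Ruby's CSV library). The sample line below mirrors the one above; the point is that the raw bytes are only invalid if you assume they are UTF-8:

```ruby
require 'csv'

# Raw bytes as they arrive on stdin: 0xF1 is "ñ" in ISO-8859-1.
line = "\"Ma\xF1e\"###\"email@gmail.com\"###\"3563454\"###\"255.255.255.255\"".b

# Treating the bytes as UTF-8 leaves an invalid string: a lone 0xF1 is
# not valid UTF-8, so the parser can no longer match the quotes around it.
utf8_guess = line.dup.force_encoding("UTF-8")
puts utf8_guess.valid_encoding?   # => false

# Declaring the real charset and transcoding to UTF-8 makes the line parse:
fixed = line.dup.force_encoding("ISO-8859-1").encode("UTF-8")
row = CSV.parse_line(fixed, col_sep: "###")
puts row.inspect   # => ["Mañe", "email@gmail.com", "3563454", "255.255.255.255"]
```

This is the same mismatch the input codec fix below resolves: the bytes are fine, the declared encoding is not.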

Logstash config:
input {
  stdin {
    type => "csv"
  }
}

filter {
  csv {
    columns => ["username", "email", "entry", "ip"]
    separator => "###"
    remove_field => ["message", "@timestamp", "path", "type", "host"]
  }
}

output {
  elasticsearch {
    template => "template.json"
    action => "index"
    hosts => ["192.168.1.19:9200"]
    index => "data-2016"
    template_overwrite => true
    codec => plain { charset => "ISO-8859-1" }
  }
}

I added the codec as a way to change how the special characters are handled, but this seems to happen with every entry that contains a non-ASCII value.

Examples: \xA3, \xB8, \xCE, \xE4, \xBC, \xE8
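Those byte values are a hint in themselves: decoded as ISO-8859-1 they are all ordinary Latin-1 characters, which is consistent with the file's real charset. A quick Ruby check:

```ruby
# Each single byte is a valid ISO-8859-1 character but invalid as UTF-8 on its own.
bytes = ["\xA3", "\xB8", "\xCE", "\xE4", "\xBC", "\xE8"]
decoded = bytes.map { |b| b.b.force_encoding("ISO-8859-1").encode("UTF-8") }
puts decoded.inspect   # => ["£", "¸", "Î", "ä", "¼", "è"]
```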

I want to add that I'm absolutely sure all the fields having this issue are quoted correctly. Quote characters inside a field have been escaped by doubling them, e.g. "alpha"###"james@james.com"###"I am ""who I am"""###"255.255.255.255".
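As a sanity check, that doubled-quote escaping is exactly what Ruby's CSV parser expects once the separator is set, so the quoting itself is not the problem:

```ruby
require 'csv'

line = %q{"alpha"###"james@james.com"###"I am ""who I am"""###"255.255.255.255"}
row = CSV.parse_line(line, col_sep: "###")

# The doubled quotes collapse into a single literal quote inside the field:
puts row[2]   # => I am "who I am"
```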

Resolved -

Solution:

Running file -bi against the input file showed it was using charset iso-8859-1.
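For reference, that check looks like this (the filename is hypothetical; -b suppresses the filename in the output and -i prints the MIME type and charset):

```shell
$ file -bi data.csv
text/plain; charset=iso-8859-1
```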

Adding the codec to the input (instead of the output) solves this:

stdin {
  type => "csv"
  codec => plain { charset => "ISO-8859-1" }
}

This also means that the output codec can go back to the default UTF-8 and it will still work.
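If changing the pipeline weren't an option, an alternative (not what was done here) would be to transcode the file to UTF-8 once before feeding it to Logstash; the filenames below are hypothetical:

```shell
# Convert the ISO-8859-1 source file to UTF-8 up front,
# so Logstash's default UTF-8 handling can read it directly.
iconv -f ISO-8859-1 -t UTF-8 data.csv > data-utf8.csv
```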
