Superscript text in data

Hi,

I was trying to load data into Elasticsearch using Logstash and got the following error message:

Received an event that has a different character encoding than you configured. {:text=>"H\\tC\\tFCE5C9C1-CF1F-4593-9C23-B2D2F886DDC9\\tApex Matting & Foodservice Products\\tF3F7DFA6-BE0D-DD11-A23A-00304834A8C9\\t170S0035BD\\tFloor Mat, Carpet\\tOrientax\\tChicago\\tIL\\t60638\\t170 Orientrax\\x99 Nylon Mat, 3' x 5', twisted nylon fiber for moisture absorption, anti-slip backing, oriental design, burgundy\\t180.18\\t0.0\\t \\t0.0\\t \\t \\t1\\tea\\t \\t0.0\\t9.0\\t \\t0.0\\t60.0\\t36.0\\ttrue\\t0\\tfalse\\tfalse\\t29854E73-49D6-409A-8A48-A9264FAE5703\\t \\t \\t \\t \\t \\t \\t \\tfalse\\tlistPrice\\r", :expected_charset=>"UTF-8"}

Could the error be caused by the value Orientrax™, which sits in a field that I have mapped in Elasticsearch as "keyword" type? Does the keyword data type not accept superscript characters? If so, how can I fix this?

Thanks in advance.

This error message comes from Logstash, not Elasticsearch. You're sending data that isn't UTF-8. Specifically, the ™ in your data is represented as the single byte 0x99 (decimal 153), which indicates that your data isn't UTF-8 but CP-1252 (Windows-1252). Either reconfigure Logstash to expect CP-1252 (by setting the charset on your input's codec) or convert the data so it conforms to UTF-8.
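To illustrate the point about byte 0x99, here is a small Python sketch (Python is used only for demonstration; it is not part of Logstash). The sample bytes are a shortened, hypothetical fragment modelled on the line in the error message.

```python
# Hypothetical fragment of the offending line: 0x99 is the CP-1252 byte for the trademark sign.
raw = b"Orientrax\x99 Nylon Mat"

# As UTF-8 the byte 0x99 is invalid on its own (it is a continuation byte), so decoding fails.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# As CP-1252 the same byte decodes to the trademark sign.
decoded = raw.decode("cp1252")

# Re-encoding as UTF-8 turns the single byte into the three-byte sequence E2 84 A2.
reencoded = decoded.encode("utf-8")

print(utf8_ok)    # False
print(decoded)    # Orientrax™ Nylon Mat
print(reencoded)  # b'Orientrax\xe2\x84\xa2 Nylon Mat'
```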

Thanks for your response. I set the charset of the input's codec to "CP1252" and, as expected, it only pushed data that conformed to CP1252. However, I still need the rest of the data (which is plain UTF-8) to go through. Is it possible to allow data that conforms to either encoding?

No, I don't think you can set a fallback character set.
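If the data really is a mix of both encodings, one option in the spirit of "change the data so it conforms to UTF-8" is to normalize the file before Logstash reads it. The following is a minimal sketch in Python, not a Logstash feature; the function name `decode_line` is hypothetical, and it assumes each line is entirely in one encoding or the other.

```python
def decode_line(raw: bytes) -> str:
    """Decode a line as UTF-8 if it is valid UTF-8, otherwise fall back to CP-1252."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 is CP-1252.
        # errors="replace" covers the few byte values CP-1252 leaves undefined.
        return raw.decode("cp1252", errors="replace")


# The same text arriving in either encoding ends up as the same string.
from_cp1252 = decode_line(b"Orientrax\x99")
from_utf8 = decode_line("Orientrax\u2122".encode("utf-8"))
print(from_cp1252 == from_utf8)  # True
```

Writing the decoded lines back out as UTF-8 would give a file that the default UTF-8 charset can read without a per-input override.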

OK, I was wrong earlier. Setting the charset to "CP1252" allows both the UTF-8 data and the CP1252 data to get pushed, so it works perfectly now. Thank you! For anyone else's reference, my input config looks like this (Windows machine):

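input {
	file {
		path => ["<your file path>"]
		start_position => "beginning"
		# "NUL" is the Windows null device, so the read position is not persisted
		sincedb_path => "NUL"
		codec => plain {
			# Interpret incoming bytes as CP1252 (Windows-1252)
			charset => "CP1252"
		}
	}
}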
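A note of caution for anyone reusing this config: ASCII-only text is byte-for-byte identical in UTF-8 and CP-1252, which is probably why the remaining rows went through cleanly. Non-ASCII characters that are genuinely UTF-8 encoded would likely come out garbled when read as CP-1252. The Python sketch below only illustrates the byte-level behaviour; it makes no claim about Logstash internals.

```python
# ASCII bytes decode to the same string under both encodings.
ascii_line = b"Floor Mat, Carpet"
same_in_both = ascii_line.decode("utf-8") == ascii_line.decode("cp1252")

# A trademark sign that is genuinely UTF-8 encoded (bytes E2 84 A2)
# becomes three unrelated characters when misread as CP-1252.
utf8_tm = "\u2122".encode("utf-8")
misread = utf8_tm.decode("cp1252")

print(same_in_both)  # True
print(misread)       # â„¢
```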
