ES5.2 RestClient: Content type?

Hi,

I have been debugging the following server-side exception while indexing:

Caused by: com.fasterxml.jackson.core.JsonParseException: Invalid UTF-8 middle byte 0x3f
at [Source: org.elasticsearch.common.bytes.BytesReference$MarkSupportingStreamInputWrapper@323865da; line: 1, column: 873]
at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1702) ~[jackson-core-2.8.1.jar:2.8.1]
at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:558) ~[jackson-core-2.8.1.jar:2.8.1]
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._reportInvalidOther(UTF8StreamJsonParser.java:3548) ~[jackson-core-2.8.1.jar:2.8.1]
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._reportInvalidOther(UTF8StreamJsonParser.java:3555) ~[jackson-core-2.8.1.jar:2.8.1]
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._decodeUtf8_2(UTF8StreamJsonParser.java:3329) ~[jackson-core-2.8.1.jar:2.8.1]
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._finishString2(UTF8StreamJsonParser.java:2513) ~[jackson-core-2.8.1.jar:2.8.1]
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._finishAndReturnString(UTF8StreamJsonParser.java:2465) ~[jackson-core-2.8.1.jar:2.8.1]
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.getText(UTF8StreamJsonParser.java:315) ~[jackson-core-2.8.1.jar:2.8.1]
at org.elasticsearch.common.xcontent.json.JsonXContentParser.text(JsonXContentParser.java:83) ~[elasticsearch-5.0.1.jar:5.0.1]
at org.elasticsearch.common.xcontent.support.AbstractXContentParser.textOrNull(AbstractXContentParser.java:199) ~[elasticsearch-5.0.1.jar:5.0.1]
at org.elasticsearch.index.mapper.TextFieldMapper.parseCreateField(TextFieldMapper.java:379) ~[elasticsearch-5.0.1.jar:5.0.1]
at org.elasticsearch.index.mapper.FieldMapper.parse(FieldMapper.java:286) ~[elasticsearch-5.0.1.jar:5.0.1]
... 39 more

I have looked at a few other posts, where the obvious response is that you should not be sending invalid UTF-8 characters. The trouble is that I am not sure where these invalid UTF-8 characters are coming from. The DB character set is UTF-8, and I am reading a field from the database and sending it.

It may be that there are invalid UTF-8 characters (I'm still debugging), but I was wondering whether the ES5.2 RestClient should be setting the content type. I'm not sure whether the Content-Type header is set automatically when I call RestClient>>performRequest.

So my question is: when using the RestClient, should the Content-Type header always be set when sending requests to the server? For example:

new BasicHeader(HTTP.CONTENT_TYPE, "application/json; charset=UTF-8");

I have set it, and it does not seem to do any harm, but it has not helped my encoding problem either. Either way, should it be set?

Cheers,

Stuart

Does Elasticsearch assume a certain character set for index and search requests? I read that it assumes JSON content, but I cannot find any information about the character set.

As far as I can tell, the text I am reading from the database is valid UTF-8, so something is happening on either the sending side (RestClient) or the receiving side (Elasticsearch).

Ok, so I have solved this as follows (for RestClient):

// Give the entity an explicit UTF-8 charset so the request body is encoded as UTF-8.
ContentType ct = ContentType.APPLICATION_JSON.withCharset(StandardCharsets.UTF_8);
HttpEntity requestBody = new StringEntity(jsonMapping, ct);
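
For anyone hitting the same error, here is a rough, stdlib-only sketch of why the entity's charset matters. It assumes the failing code built the body with a StringEntity that had no explicit charset (the original code is not shown in this thread, so that is a guess); as far as I know, Apache HttpCore 4.x falls back to ISO-8859-1 in that case. The class and method names below are made up for illustration. The sketch encodes the same text both ways and checks whether the resulting bytes are valid UTF-8, which is what the server-side JSON parser expects.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the RestClient or Elasticsearch APIs.
public class CharsetMismatchDemo {

    // Encode as ISO-8859-1. String.getBytes replaces characters that
    // ISO-8859-1 cannot represent with '?' (byte 0x3f).
    static byte[] encodeLatin1(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Encode as UTF-8, which is what an entity with an explicit UTF-8 charset sends.
    static byte[] encodeUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Strict validation: report malformed input instead of silently substituting.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "e with acute accent" followed by the euro sign, written as escapes
        // so the example does not depend on the source file encoding.
        String text = "\u00e9\u20ac";
        System.out.println("ISO-8859-1 bytes valid as UTF-8: " + isValidUtf8(encodeLatin1(text)));
        System.out.println("UTF-8 bytes valid as UTF-8: " + isValidUtf8(encodeUtf8(text)));
    }
}
```

In ISO-8859-1 the accented character becomes the single byte 0xE9, which a UTF-8 decoder reads as the start of a multi-byte sequence, and the euro sign becomes 0x3f because it cannot be represented. A 0x3f in the position of a continuation byte would be consistent with the "Invalid UTF-8 middle byte 0x3f" message above, though I have not confirmed that this is exactly what happened here. Setting only the Content-Type header does not change how the body bytes are encoded, which may be why adding the header alone did not help.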
