ES5.2 RestClient: Content type?


I have been debugging the following server-side exception while indexing:

Caused by: com.fasterxml.jackson.core.JsonParseException: Invalid UTF-8 middle byte 0x3f
at [Source: org.elasticsearch.common.bytes.BytesReference$MarkSupportingStreamInputWrapper@323865da; line: 1, column: 873]
at com.fasterxml.jackson.core.JsonParser._constructError( ~[jackson-core-2.8.1.jar:2.8.1]
at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError( ~[jackson-core-2.8.1.jar:2.8.1]
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._reportInvalidOther( ~[jackson-core-2.8.1.jar:2.8.1]
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._reportInvalidOther( ~[jackson-core-2.8.1.jar:2.8.1]
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._decodeUtf8_2( ~[jackson-core-2.8.1.jar:2.8.1]
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._finishString2( ~[jackson-core-2.8.1.jar:2.8.1]
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._finishAndReturnString( ~[jackson-core-2.8.1.jar:2.8.1]
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.getText( ~[jackson-core-2.8.1.jar:2.8.1]
at org.elasticsearch.common.xcontent.json.JsonXContentParser.text( ~[elasticsearch-5.0.1.jar:5.0.1]
at ~[elasticsearch-5.0.1.jar:5.0.1]
at org.elasticsearch.index.mapper.TextFieldMapper.parseCreateField( ~[elasticsearch-5.0.1.jar:5.0.1]
at org.elasticsearch.index.mapper.FieldMapper.parse( ~[elasticsearch-5.0.1.jar:5.0.1]
... 39 more

I have looked at a few other posts, where the obvious response is that you should not be sending invalid UTF-8 characters. The trouble is that I am not sure how I am getting these invalid UTF-8 characters. The DB character set is UTF-8, and I am reading a field from the database and sending it as is.

It may be that there are invalid UTF-8 characters (I'm still debugging), but I was wondering whether the ES5.2 RestClient should be setting the content type. I'm not sure whether the Content-Type header is set automatically when I call RestClient>>performRequest.

So my question is: when using the RestClient, should the Content-Type header always be set when sending requests to the server? For example:

new BasicHeader(HTTP.CONTENT_TYPE, "application/json; charset=UTF-8");

I have set it and it does not seem to do any harm, but it has not helped my encoding problem either. In any case, should it be set?



Does Elasticsearch assume a certain character set for index and search requests? I read that it assumes JSON content, but I cannot find any information about the character set.

As far as I can tell, the text I am reading from the database is valid UTF-8, so something is happening on the sending side (RestClient) or the receiving side (Elasticsearch).
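One way to test the "valid UTF-8" assumption on raw bytes is a strict decode that reports malformed input instead of silently replacing it. This is only a minimal sketch using the JDK; the class and method names (Utf8Check, isValidUtf8) are made up for illustration and are not part of the RestClient or Elasticsearch.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library mentioned in this post.
final class Utf8Check {

    /** Returns true if the bytes decode as UTF-8 without any malformed sequence. */
    static boolean isValidUtf8(byte[] bytes) {
        // REPORT makes the decoder throw instead of substituting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "Öl – test" written with escapes so the source file encoding does not matter.
        byte[] good = "\u00D6l \u2013 test".getBytes(StandardCharsets.UTF_8);
        // A two-byte lead byte (0xD6) followed by '?' (0x3F), as in the exception above.
        byte[] bad = {(byte) 0xD6, 0x3F};
        System.out.println(isValidUtf8(good)); // prints true
        System.out.println(isValidUtf8(bad));  // prints false
    }
}
```

Running the check on the bytes that actually go over the wire, rather than on the Java String, is what tells you whether the problem is introduced before or during sending.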

Ok, so I have solved this as follows (for RestClient):

ContentType ct = ContentType.APPLICATION_JSON.withCharset(Charset.forName("UTF-8"));
HttpEntity requestBody = new StringEntity(jsonMapping, ct);
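A plausible explanation for why this helps, assuming the body was previously built with a StringEntity constructor that takes no charset (in Apache HttpCore 4.x, as far as I know, that falls back to ISO-8859-1): characters outside Latin-1 get replaced by '?' (0x3F), and the remaining Latin-1 bytes are not valid UTF-8. The sketch below reproduces that effect with the JDK only; CharsetMismatchDemo and its methods are hypothetical names for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it does not use the RestClient or HttpClient at all.
final class CharsetMismatchDemo {

    /** Encodes text the way an entity declared as ISO-8859-1 would. */
    static byte[] encodeLatin1(String text) {
        // Characters that Latin-1 cannot represent become '?' (0x3F).
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    /** Encodes text as UTF-8, which is what the JSON parser in the trace expects. */
    static byte[] encodeUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    /** Formats bytes as lowercase hex pairs separated by spaces. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+00D6 (O with diaeresis) followed by U+2013 (en dash).
        String text = "\u00D6\u2013";
        System.out.println(hex(encodeLatin1(text))); // prints d6 3f
        System.out.println(hex(encodeUtf8(text)));   // prints c3 96 e2 80 93
    }
}
```

In the Latin-1 output, 0xD6 looks like the start of a two-byte UTF-8 sequence and 0x3F is not a valid continuation byte, which matches the "Invalid UTF-8 middle byte 0x3f" message and the _decodeUtf8_2 frame in the trace.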

