Hi,
I have been debugging the following server-side exception while indexing:
Caused by: com.fasterxml.jackson.core.JsonParseException: Invalid UTF-8 middle byte 0x3f
at [Source: org.elasticsearch.common.bytes.BytesReference$MarkSupportingStreamInputWrapper@323865da; line: 1, column: 873]
at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1702) ~[jackson-core-2.8.1.jar:2.8.1]
at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:558) ~[jackson-core-2.8.1.jar:2.8.1]
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._reportInvalidOther(UTF8StreamJsonParser.java:3548) ~[jackson-core-2.8.1.jar:2.8.1]
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._reportInvalidOther(UTF8StreamJsonParser.java:3555) ~[jackson-core-2.8.1.jar:2.8.1]
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._decodeUtf8_2(UTF8StreamJsonParser.java:3329) ~[jackson-core-2.8.1.jar:2.8.1]
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._finishString2(UTF8StreamJsonParser.java:2513) ~[jackson-core-2.8.1.jar:2.8.1]
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._finishAndReturnString(UTF8StreamJsonParser.java:2465) ~[jackson-core-2.8.1.jar:2.8.1]
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.getText(UTF8StreamJsonParser.java:315) ~[jackson-core-2.8.1.jar:2.8.1]
at org.elasticsearch.common.xcontent.json.JsonXContentParser.text(JsonXContentParser.java:83) ~[elasticsearch-5.0.1.jar:5.0.1]
at org.elasticsearch.common.xcontent.support.AbstractXContentParser.textOrNull(AbstractXContentParser.java:199) ~[elasticsearch-5.0.1.jar:5.0.1]
at org.elasticsearch.index.mapper.TextFieldMapper.parseCreateField(TextFieldMapper.java:379) ~[elasticsearch-5.0.1.jar:5.0.1]
at org.elasticsearch.index.mapper.FieldMapper.parse(FieldMapper.java:286) ~[elasticsearch-5.0.1.jar:5.0.1]
... 39 more
I have looked at a few other posts, and the obvious response is that you should not be sending invalid UTF-8 characters. The trouble is that I am not sure where these invalid UTF-8 characters are coming from. The DB character set is UTF-8, and I am reading a field from the database and sending it as is.
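For reference, here is a minimal sketch of one way a 0x3f byte could end up in that position. It assumes (and this is only an assumption, I have not confirmed it in my code) that somewhere between the database and the request body the string is converted to bytes with a non-UTF-8 charset such as ISO-8859-1. The class name Utf8Check and the sample value are hypothetical and use only the JDK:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class Utf8Check {

    // Returns true only if the bytes decode as well-formed UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Formats bytes as space-separated hex so they can be compared with the error message.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical field value: U+00C9 followed by U+20AC, which ISO-8859-1 cannot represent.
        String value = "\u00C9\u20AC";

        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        // getBytes replaces unmappable characters with the charset's replacement byte.
        byte[] latin1 = value.getBytes(StandardCharsets.ISO_8859_1);

        System.out.println("UTF-8 bytes:      " + toHex(utf8) + " valid=" + isValidUtf8(utf8));
        System.out.println("ISO-8859-1 bytes: " + toHex(latin1) + " valid=" + isValidUtf8(latin1));
    }
}
```

Running isValidUtf8 on the exact bytes that go into the request body (rather than on the String) should show whether the damage happens before the request leaves the client.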
It may be that there are invalid UTF-8 characters in the data (I'm still debugging), but I was wondering whether the ES 5.2 RestClient should be setting the content type. I'm not sure whether the Content-Type header is set automatically when I call RestClient.performRequest.
So my question is: when using the RestClient, should the Content-Type header always be set explicitly when sending requests to the server? For example:
new BasicHeader(HTTP.CONTENT_TYPE, "application/json; charset=UTF-8");
I have set it, and it does not seem to do any harm, but it has not fixed my encoding problem either. Should it be set regardless?
Cheers,
Stuart