ES5.2 RestClient: Content type?


#1

Hi,

I have been debugging the following server-side exception while indexing:

Caused by: com.fasterxml.jackson.core.JsonParseException: Invalid UTF-8 middle byte 0x3f
at [Source: org.elasticsearch.common.bytes.BytesReference$MarkSupportingStreamInputWrapper@323865da; line: 1, column: 873]
at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1702) ~[jackson-core-2.8.1.jar:2.8.1]
at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:558) ~[jackson-core-2.8.1.jar:2.8.1]
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._reportInvalidOther(UTF8StreamJsonParser.java:3548) ~[jackson-core-2.8.1.jar:2.8.1]
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._reportInvalidOther(UTF8StreamJsonParser.java:3555) ~[jackson-core-2.8.1.jar:2.8.1]
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._decodeUtf8_2(UTF8StreamJsonParser.java:3329) ~[jackson-core-2.8.1.jar:2.8.1]
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._finishString2(UTF8StreamJsonParser.java:2513) ~[jackson-core-2.8.1.jar:2.8.1]
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._finishAndReturnString(UTF8StreamJsonParser.java:2465) ~[jackson-core-2.8.1.jar:2.8.1]
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.getText(UTF8StreamJsonParser.java:315) ~[jackson-core-2.8.1.jar:2.8.1]
at org.elasticsearch.common.xcontent.json.JsonXContentParser.text(JsonXContentParser.java:83) ~[elasticsearch-5.0.1.jar:5.0.1]
at org.elasticsearch.common.xcontent.support.AbstractXContentParser.textOrNull(AbstractXContentParser.java:199) ~[elasticsearch-5.0.1.jar:5.0.1]
at org.elasticsearch.index.mapper.TextFieldMapper.parseCreateField(TextFieldMapper.java:379) ~[elasticsearch-5.0.1.jar:5.0.1]
at org.elasticsearch.index.mapper.FieldMapper.parse(FieldMapper.java:286) ~[elasticsearch-5.0.1.jar:5.0.1]
... 39 more

I have looked at a few other posts, and the obvious response is that you should not be sending invalid UTF-8 characters. The trouble is that I am not sure how I am getting these invalid UTF-8 characters. The DB character set is UTF-8, and I am reading a field from the database and sending it.

It may be that there are invalid UTF-8 characters (I'm still debugging), but I was wondering whether the ES5.2 RestClient should be setting the content type. I'm not sure whether the Content-Type header is set automatically when I call RestClient.performRequest.

So my question is: when using the RestClient, should the Content-Type header always be set when sending requests to the server? E.g.

new BasicHeader(HTTP.CONTENT_TYPE, "application/json; charset=UTF-8");

I have set it and it does not seem to do any harm, but it has not helped my encoding problem either. Anyway, should it be set?

Cheers,

Stuart


#2

Does Elasticsearch assume a certain character set for index and search requests? I read that it assumes JSON content, but I cannot find any information about the character set.

As far as I can tell, the text I am reading from the database is valid UTF-8, so something is happening on either the sending (RestClient) or the receiving (Elasticsearch) side.
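
One way to test the "valid UTF-8" assumption is to run the raw bytes through a strict decoder. This is only a sketch using the JDK: Utf8Check and isValidUtf8 are hypothetical names, not part of Elasticsearch or the RestClient. It has to be given the bytes as they come off the wire or out of the driver, because a Java String has already been decoded.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for debugging; not an Elasticsearch or RestClient API.
class Utf8Check {

    /** Returns true only if the bytes decode as UTF-8 with no malformed sequence. */
    static boolean isValidUtf8(byte[] bytes) {
        // REPORT makes the decoder throw instead of silently substituting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Made-up sample data: "Über" encoded as UTF-8 is well formed.
        byte[] good = "\u00DCber".getBytes(StandardCharsets.UTF_8);
        // 0xDC starts a two-byte sequence, but 0x3F ('?') is not a continuation byte.
        byte[] bad = {(byte) 0xDC, (byte) 0x3F};

        System.out.println(isValidUtf8(good));
        System.out.println(isValidUtf8(bad));
    }
}
```

If the bytes from the database pass this check but the server still rejects the request, the re-encoding on the sending side is the next place to look.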


#3

Ok, so I have solved this as follows (for RestClient):

// Build the entity with an explicit UTF-8 charset so the JSON string is encoded as UTF-8 bytes.
ContentType ct = ContentType.APPLICATION_JSON.withCharset(Charset.forName("UTF-8"));
HttpEntity requestBody = new StringEntity(jsonMapping, ct);
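
A possible explanation for why the explicit charset matters (an assumption on my part, not something confirmed in this thread): if the body is encoded as ISO-8859-1 somewhere on the client side, any character outside Latin-1 becomes '?' (0x3f), and a Latin-1 character such as Ü becomes the single byte 0xdc, which a UTF-8 parser reads as the start of a two-byte sequence. That combination would match the "Invalid UTF-8 middle byte 0x3f" message. The JDK-only sketch below prints the bytes for a made-up sample string; CharsetBytesDemo, hex and encodeHex are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it only shows how one string maps to different bytes.
class CharsetBytesDemo {

    /** Formats bytes as space-separated lowercase hex, e.g. "dc 3f". */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    /** Encodes the text with the given charset and returns the bytes as hex. */
    static String encodeHex(String text, Charset charset) {
        return hex(text.getBytes(charset));
    }

    public static void main(String[] args) {
        // Made-up sample: U+00DC (Ü) followed by U+20AC (euro sign).
        String text = "\u00DC\u20AC";

        // ISO-8859-1 cannot represent the euro sign, so it is replaced by '?'.
        System.out.println(encodeHex(text, StandardCharsets.ISO_8859_1)); // dc 3f
        // UTF-8 represents both characters as multi-byte sequences.
        System.out.println(encodeHex(text, StandardCharsets.UTF_8));      // c3 9c e2 82 ac
    }
}
```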
