I'm using Kibana/ES 6.5 to index around 470 documents crawled from a Hebrew website with:
POST /index_name/_doc/_bulk
I can post and get a few individual documents, but some of them probably contain illegal characters, because when I post all of them at once I get the following error message:
{
  "error": {
    "root_cause": [
      {
        "type": "json_parse_exception",
        "reason": "Unexpected character ('×' (code 215)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')\n at [Source: org.elasticsearch.transport.netty4.ByteBufStreamInput@138e10d; line: 1, column: 2]"
      }
    ],
    "type": "json_parse_exception",
    "reason": "Unexpected character ('×' (code 215)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')\n at [Source: org.elasticsearch.transport.netty4.ByteBufStreamInput@138e10d; line: 1, column: 2]"
  },
  "status": 500
}
Is there a way to find out which of the documents contain the characters that have led to this error?
The response to a bulk action is a large JSON structure with the individual results of each action that was performed in the same order as the actions that appeared in the request. The failure of a single action does not affect the remaining actions.
Thus, if this error appears in the 4th entry of the items array in the response, then it's the 4th document that has a problem.
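To illustrate reading the items array, here is a minimal Python sketch. It assumes the bulk response has already been parsed into a dict; the function name failed_items and the sample response are made up for the example, not taken from the thread. Positions are 0-based, so the 4th document is position 3.

```python
def failed_items(bulk_response):
    """Return (position, action, doc_id, error) for every failed bulk item."""
    failures = []
    for position, item in enumerate(bulk_response.get("items", [])):
        # Each item is a one-key object; the key is the action name
        # (index, create, update or delete).
        for action, result in item.items():
            if "error" in result:
                failures.append((position, action, result.get("_id"), result["error"]))
    return failures


# Hypothetical response, trimmed to the fields the function reads.
sample_response = {
    "took": 30,
    "errors": True,
    "items": [
        {"index": {"_id": "1", "status": 201}},
        {"index": {"_id": "2", "status": 400,
                   "error": {"type": "mapper_parsing_exception",
                             "reason": "failed to parse"}}},
    ],
}

for position, action, doc_id, error in failed_items(sample_response):
    print(position, action, doc_id, error["type"])
```

This only helps when the request as a whole was accepted. If the response has no items array at all, the request was rejected before any action ran.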
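Sorry, I just saw this. I think this indicates that the whole request was malformed, rather than any individual document. The bulk body must alternate lines: for each document, an action line followed by the document source on the next line, with every JSON object on a single line and a newline after the last one.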
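A minimal sketch of that alternating layout, written in Python. The function name build_bulk_body and the sample documents are assumptions for illustration. It relies on the index and type being given in the URL (POST /index_name/_doc/_bulk), so the action line can be an empty index action. json.dumps escapes non-ASCII characters by default, which keeps the Hebrew text safe regardless of how the body is later encoded.

```python
import json


def build_bulk_body(documents):
    """Build a newline-delimited bulk body: one action line, then one source line."""
    lines = []
    for doc in documents:
        lines.append(json.dumps({"index": {}}))  # action line
        lines.append(json.dumps(doc))            # document source, always one line
    # The bulk API requires the last line to end with a newline as well.
    return "\n".join(lines) + "\n"


# Hypothetical documents standing in for the crawled pages.
docs = [{"title": "שלום", "body": "first\nsecond"}, {"title": "עולם"}]
body = build_bulk_body(docs)
print(body)
```

Newlines inside a field value come out as the two-character escape \n, so they cannot break the line structure.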
Thanks. I've stripped the newlines from the documents, so the only option left is to search for the bad character manually, which is what I wanted to avoid. Regarding your first answer, I don't see any items entry in the response, only a line/column reference.
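One way to avoid the manual search is to check the request body locally before sending it. The following Python sketch is only an assumption about how that could be done, not something from the thread: find_bad_lines is a made-up helper that tries to parse every line of the body as JSON and reports the 1-based line numbers that fail.

```python
import json


def find_bad_lines(body):
    """Return (line_number, message) for each line that is not a JSON object."""
    lines = body.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # drop the empty piece after the required trailing newline
    problems = []
    for line_no, line in enumerate(lines, start=1):
        try:
            parsed = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((line_no, "column %d: %s" % (exc.colno, exc.msg)))
            continue
        if not isinstance(parsed, dict):
            problems.append((line_no, "not a JSON object"))
    return problems


# Hypothetical body whose fourth line is not valid JSON.
sample_body = '{"index": {}}\n{"title": "ok"}\n{"index": {}}\noops {"title": "bad"}\n'
for line_no, message in find_bad_lines(sample_body):
    print(line_no, message)
```

With the body read from the file that is being posted, the reported line numbers point directly at the entries to inspect.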