List of characters that must be escaped?

Matthew_Allan · December 5, 2018, 10:27pm

Hello,

While indexing some user-provided content, I have come across some Unicode characters that cannot be indexed without escaping them first. For example:

DELETE /my_index

PUT /my_index
{
  "mappings": {
    "thing": {
      "properties": {
        "content": {
          "type": "text"
        }
      }
    }
  }
}

POST /my_index/thing/1
{
  "content": "hello 	  	 "
}

This results in the following error on 5.6:

{
  "error": {
    "root_cause": [
      {
        "type": "mapper_parsing_exception",
        "reason": "failed to parse [content]"
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "failed to parse [content]",
    "caused_by": {
      "type": "json_parse_exception",
      "reason": "Illegal unquoted character ((CTRL-CHAR, code 9)): has to be escaped using backslash to be included in string value\n at [Source: org.elasticsearch.common.bytes.BytesReference$MarkSupportingStreamInputWrapper@6a67109; line: 2, column: 22]"
    }
  },
  "status": 400
}

The example above is using the Unicode character Information Separator Three. Is there a list anywhere of which characters must be escaped before indexing? It doesn't seem to include the entire Unicode "Other, control" (Cc) category, as I can index U+0080 without issue.
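For anyone hitting the same error: the exception comes from the JSON parser, not from the mapping, so the relevant list is most likely the one in the JSON grammar (RFC 8259). That grammar requires escaping the quotation mark, the backslash, and the control characters U+0000 through U+001F inside a string. U+0080 falls outside that range, which would explain why it indexes without complaint. Below is a minimal sketch, assuming the request body is built in Python; `MUST_ESCAPE` and `needs_escaping` are hypothetical names for illustration and not part of any Elasticsearch client.

```python
import json

# Characters RFC 8259 requires to be escaped inside a JSON string:
# the quotation mark, the backslash, and U+0000 through U+001F.
MUST_ESCAPE = {'"', '\\'} | {chr(cp) for cp in range(0x20)}


def needs_escaping(text):
    """Return the sorted distinct characters in text that JSON requires escaping."""
    return sorted({ch for ch in text if ch in MUST_ESCAPE})


# Same shape as the failing document: raw tabs (code 9) inside the value,
# plus U+001D (Information Separator Three) for comparison.
content = "hello \t \u001d \t "

# json.dumps escapes every required character, so the serialized
# body no longer contains raw control characters.
body = json.dumps({"content": content})

print(needs_escaping(content))
print(body)
```

Serializing the document with a JSON library rather than concatenating strings by hand means the escaping does not have to be maintained manually, and the original characters are restored when the JSON is parsed again.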

system · January 2, 2019, 10:27pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
