We're getting some unknown character returned on queries


(James Peppercorn) #1

When we search one of our indices, the results contain an odd character that breaks JSON parsing. In Dev Tools it shows as a red dot between "_source": and {"id. Also, the Dev Tools results are not JSON-formatted, just plain text. Reindexing does not fix this. Has anyone else encountered this, and does anyone know what it is and how to fix or prevent it?


Elasticsearch 6.3.2
RHEL 7


(David Turner) #2

Elasticsearch preserves the _source it was given completely verbatim, including any leading and trailing whitespace. I don't know which whitespace character this might be, but I guess it was included when the document was indexed. It looks like Elasticsearch accepts leading tabs, newlines, carriage returns and spaces (the whitespace characters the JSON spec allows), so I am going to guess it's one of those.
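Those four characters have fixed byte values, so a hex dump identifies them unambiguously. Here is a minimal POSIX sh sketch that names a byte written as two hex digits (the way xxd prints it). The helper name json_ws_name is hypothetical, and it only recognises the four whitespace bytes defined by the JSON spec (RFC 8259); anything else, including any multi-byte character, is reported as not JSON whitespace.

```shell
# json_ws_name HEX  (hypothetical helper)
# Prints the name of the byte if it is one of the four whitespace
# characters JSON allows between tokens, otherwise "not JSON whitespace".
json_ws_name() {
  # Lower-case the input so both "0A" and "0a" are accepted.
  case $(printf '%s' "$1" | tr 'A-F' 'a-f') in
    09) echo "tab" ;;
    0a) echo "newline (line feed)" ;;
    0d) echo "carriage return" ;;
    20) echo "space" ;;
    *)  echo "not JSON whitespace" ;;
  esac
}
```

For example, json_ws_name 09 prints "tab".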

I would try to find out which character it is by running the search with curl and piping the output to xxd. For instance, I indexed a doc with a leading tab (0x09) and a trailing newline (0x0a), and this was the result:

$ curl -s 'http://localhost:9200/_search' | xxd
00000000: 7b22 746f 6f6b 223a 322c 2274 696d 6564  {"took":2,"timed
00000010: 5f6f 7574 223a 6661 6c73 652c 225f 7368  _out":false,"_sh
00000020: 6172 6473 223a 7b22 746f 7461 6c22 3a31  ards":{"total":1
00000030: 2c22 7375 6363 6573 7366 756c 223a 312c  ,"successful":1,
00000040: 2273 6b69 7070 6564 223a 302c 2266 6169  "skipped":0,"fai
00000050: 6c65 6422 3a30 7d2c 2268 6974 7322 3a7b  led":0},"hits":{
00000060: 2274 6f74 616c 223a 312c 226d 6178 5f73  "total":1,"max_s
00000070: 636f 7265 223a 312e 302c 2268 6974 7322  core":1.0,"hits"
00000080: 3a5b 7b22 5f69 6e64 6578 223a 2269 222c  :[{"_index":"i",
00000090: 225f 7479 7065 223a 225f 646f 6322 2c22  "_type":"_doc","
000000a0: 5f69 6422 3a22 6c34 3147 5847 6342 6836  _id":"l41GXGcBh6
000000b0: 4175 6e35 426b 724d 736d 222c 225f 7363  Aun5BkrMsm","_sc
000000c0: 6f72 6522 3a31 2e30 2c22 5f73 6f75 7263  ore":1.0,"_sourc
000000d0: 6522 3a09 7b7d 0a7d 5d7d 7d              e":.{}.}]}}
                 ^^      ^^ NB tab and newline characters in output
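On a large response it can be tedious to find the relevant spot in the full dump. A minimal sketch, assuming the response was saved to a file first (e.g. curl -s 'http://localhost:9200/_search' > resp.json) and GNU grep and coreutils as shipped with RHEL 7; show_source_prefix and resp.json are hypothetical names. It prints, in hex, the first four bytes that follow each "_source": in the file, one line per hit.

```shell
# show_source_prefix FILE  (hypothetical helper)
# For every occurrence of "_source": in FILE, print the next 4 bytes in hex.
show_source_prefix() {
  resp=$1
  # -a: treat the file as text even if it has odd bytes
  # -b -o: print the 0-based byte offset of each match, as "offset:match"
  grep -abo '"_source":' "$resp" | while IFS=: read -r off _; do
    # '"_source":' is 10 bytes long and tail -c +N counts from 1,
    # so the byte right after the match is at position off + 11.
    tail -c +"$((off + 11))" "$resp" | head -c 4 | od -An -tx1
  done
}
```

For the document above this would print 09 7b 7d 0a, i.e. a tab, then {}, then a newline. Unlike grep on its own, this also catches a leading newline, because the bytes are read by offset rather than line by line.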

Then I would check whether your JSON parser can cope with this. It should, since the spec allows it, but if it doesn't, you may have to overwrite this document in Elasticsearch (index it again under the same _id with the extra character removed).
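As one way to run that check against a saved response, here is a sketch that uses Python's standard json module as a stand-in parser. That is an assumption: it requires python3 (3.6 or later, so that json.load accepts a file opened in binary mode), and what really matters is the parser your own client uses. parses_as_json is a hypothetical helper name.

```shell
# parses_as_json FILE  (hypothetical helper)
# Exits 0 if Python's json module accepts FILE, non-zero otherwise.
# The file is opened in binary mode so the result does not depend on the
# shell's locale; json then detects the encoding from the bytes themselves.
parses_as_json() {
  python3 -c 'import json, sys; json.load(open(sys.argv[1], "rb"))' "$1" 2>/dev/null
}
```

Usage: parses_as_json resp.json && echo "parser copes" || echo "parser rejects it". Python's json module only skips the four spec whitespace characters between tokens, so a leading tab or newline parses fine, while any other invisible character in that position makes it fail.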