Prefix query for an id that starts with a special character

I have inadvertently loaded some documents without sanitising the field that was meant to be the id.

So some records now have an id that begins with ASCII 09, i.e. the tab character.

I can retrieve an individual record with

GET /nyy_uprn/_doc/%09100040171995

which returns the following:

{
  "_index" : "xxxxxxx",
  "_type" : "_doc",
  "_id" : """	100040171995""",
  "_version" : 1,
  "_seq_no" : 56576,
  "_primary_term" : 3,
  "found" : true,
  "_source" : {
    "id" : """	100040171995""",
    ...
  }
}

You can see that both _id and id contain the tab character (ASCII 09), and that the console displays them wrapped in triple quotes (""") rather than as ordinary JSON strings.
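The %09 in the GET path is simply the percent-encoded tab. A minimal Python sketch of building that path for an arbitrary id; the helper name `doc_path` is hypothetical and not part of any Elasticsearch client:

```python
from urllib.parse import quote


def doc_path(index: str, doc_id: str) -> str:
    """Build a get-document path, percent-encoding the id so that control
    characters such as a leading tab survive inside the URL."""
    # safe="" also encodes "/", so an id can never change the path structure
    return f"/{index}/_doc/{quote(doc_id, safe='')}"


print(doc_path("nyy_uprn", "\t100040171995"))  # /nyy_uprn/_doc/%09100040171995
```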

However, I cannot retrieve these records using a prefix query. I have tried the following:

GET /nyy_uprn/_search
{
  "query": {
    "prefix": {
      "id": "\t"
    }
  }
}

I have also tried many variations of the prefix value, such as:

""" """
"\\t"
"//t"
"/t"

I always get 0 hits back. I'm not sure what the problem is: \t is a valid JSON escape for the tab character, so shouldn't this work?
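The JSON itself is indeed fine. The minimal Python sketch below confirms that "\t" in the request body decodes to a real tab (code point 9), so the zero hits more plausibly come from how the id field is indexed. This part is an assumption: if id was created by default dynamic mapping, it is an analysed text field, and the standard analyser does not keep whitespace in its tokens, so no indexed term starts with a tab. The id.keyword sub-field in the second query exists only under that assumed default mapping.

```python
import json

# The prefix query from the post, built as a Python dict.
query = {"query": {"prefix": {"id": "\t"}}}
body = json.dumps(query)
print(body)  # the tab is serialised as the two-character escape \t

# Round-tripping shows the server would receive a real tab character.
assert json.loads(body)["query"]["prefix"]["id"] == "\t"
assert ord("\t") == 9

# Hypothetical alternative, assuming dynamic mapping also created an
# unanalysed "id.keyword" sub-field next to the analysed "id" text field.
keyword_query = {"query": {"prefix": {"id.keyword": "\t"}}}
print(json.dumps(keyword_query))
```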

Hi, does anyone have any ideas here?

You would need to query the _id field instead of id. Maybe a script query that matches ids starting with a tab helps:

GET test/_search
{
  "query": {
    "script": {
      "script": "doc['_id'].value.startsWith('\t')"
    }
  }
}

Note: this may be very slow on large indices, since the script has to be evaluated against every document.
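Once the affected documents have been found, for example with the script query above, they still need fixing. A minimal sketch, assuming the source field is called id as in the post, that builds a _bulk request body which re-indexes a document under a trimmed id and deletes the tab-prefixed copy. The helper name `fix_actions` is hypothetical; the alternating action/source NDJSON layout is the standard _bulk format.

```python
import json


def fix_actions(index: str, bad_id: str, source: dict) -> str:
    """Return _bulk NDJSON that re-indexes a document under a sanitised id
    and deletes the copy stored under the whitespace-prefixed id."""
    clean_id = bad_id.strip()  # drops the leading tab and any other surrounding whitespace
    if clean_id == bad_id:
        return ""  # nothing to fix; avoid indexing and then deleting the same document
    # Hypothetical: assumes the duplicated id lives in a source field named "id".
    fixed_source = dict(source, id=clean_id)
    lines = [
        {"index": {"_index": index, "_id": clean_id}},
        fixed_source,
        {"delete": {"_index": index, "_id": bad_id}},
    ]
    # Bulk bodies are newline-delimited JSON and must end with a newline.
    return "\n".join(json.dumps(line) for line in lines) + "\n"


print(fix_actions("nyy_uprn", "\t100040171995", {"id": "\t100040171995"}))
```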
