Cannot delete documents with "!" in the _id

I have some records that I'm trying to either update or delete. All of them have an "!" in the _id field, something I'd never have done if I had been here when they were created :)

The _id looks like this: category-1047!09f483e6b26012

I can search for the document using curl and get a result like this:
curl -s es1:9200/versions-index/documents/_search?q=_id:category-1047\!09f483e6b26012

However, I CANNOT get the document when I try
curl -s es1:9200/versions-index/documents/category-1047!09f483e6b26012

I'd love to know how I can either 1) delete the document or 2) use the _bulk API to modify the document. I'm using Elastic 2.3.4 (yes, I know), and we're actually planning an upgrade soon. However, I have to modify these documents much sooner than the upgrade.

I suspect that the delete-by-query plugin might help me, as I can perform a working query. I'm trying to avoid that route if there's a core ES function that can get me to the result.

Can you share the full curl output of the statement that fails? It seems to me that the ! is being interpreted by your shell. Have you tried putting the full URL in single quotes to see if that works? Also, please share the full curl -v output for any call. Thanks!
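
To check what the shell actually hands to curl, independent of Elasticsearch, a small sketch like the one below can help. `show_arg` is a hypothetical helper, not part of curl or Elasticsearch; it only prints its first argument as received. Scripts and other non-interactive shells normally have history expansion turned off, so an unquoted, unescaped "!" tends to misbehave only at an interactive bash or zsh prompt.

```shell
# Hypothetical helper: print the first argument exactly as the program receives it.
show_arg() {
  printf '%s\n' "$1"
}

# Single quotes: every character, including "!", is passed through literally.
show_arg 'category-1047!09f483e6b26012'

# Backslash outside quotes: the shell removes the backslash and passes a plain "!".
show_arg category-1047\!09f483e6b26012
```

If both calls print the same id, the quoting style is probably not what differs between a working and a failing request.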

curl -v es1:9200/versions-index/documents/_search?pretty\&q=_id:jnsjir2019-fg-1455041\\!02fd15d1e2649c95027b8548f42c0e7d > es.out
% Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                             Dload  Upload   Total   Spent    Left  Speed
0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 10.15.19.174...
* TCP_NODELAY set
* Connected to es1 (10.15.19.174) port 9200 (#0)
> GET /versions-index/documents/_search?pretty&q=_id:jnsjir2019-fg-1455041\!02fd15d1e2649c95027b8548f42c0e7d HTTP/1.1
> Host: es1:9200
> User-Agent: curl/7.61.1
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: application/json; charset=UTF-8
< Content-Length: 16701
<
{ [16701 bytes data]
100 16701  100 16701    0     0  5436k      0 --:--:-- --:--:-- --:--:-- 5436k
* Connection #0 to host es1 left intact

The curl command above (using the query) works. Below are the results of that query.

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "versions-index-v2-20160313",
      "_type" : "documents",
      "_id" : "jnsjir2019-fg-1455041!02fd15d1e2649c95027b8548f42c0e7d",
      "_score" : 1.0,
      "_routing" : "JNSJIR2019-FG-1455041",
      "_source" : {
        "admin" : {
          "id" : "02fd15d1e2649c95027b8548f42c0e7d",
          "grouping_id" : "jnsjir2019-fg-1455041!02fd15d1e2649c95027b8548f42c0e7d",
          "an" : "JNSJIR2019-FG-1455041",
          "index_date" : "2019-01-09T18:18:17.074Z",
          "version" : 1546992000000,
          "recalled" : false
        },
        "date" : {
          "published" : "2019-01-09T00:00:00.000Z",
          "modified" : "2019-01-09T00:00:00.000Z"
        },
        "has_media" : {
          "abstract" : false,
          "audio" : false,
          "full_text" : true,
          "image" : true,
          "video" : false
        },
        "language" : {
          "iso6391" : [ "en" ],
          "iso6393" : [ "eng" ],
          "auto" : [ "eng" ],
          "freeform" : [ "English" ]
        },
        "publication" : {
          "value" : "John's Review",
          "volume" : "031/002"
        },
        "doctype" : {
          "value" : "Article"
        },
        "format" : [ "html" ]
      }
    } ]
  }
}

While that query works, going just by ID does not:

curl -v es1:9200/versions-index/documents/jnsjir2019-fg-1455041\!02fd15d1e2649c95027b8548f42c0e7d?pretty > tmp.out
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                             Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 10.15.19.174...
* TCP_NODELAY set
* Connected to es1 (10.15.19.174) port 9200 (#0)
> GET /versions-index/documents/jnsjir2019-fg-1455041!02fd15d1e2649c95027b8548f42c0e7d?pretty HTTP/1.1
> Host: es1:9200
> User-Agent: curl/7.61.1
> Accept: */*
>
< HTTP/1.1 404 Not Found
< Content-Type: application/json; charset=UTF-8
< Content-Length: 135
<

The response body is:

{
  "_index" : "versions-index-v2-20160313",
  "_type" : "documents",
  "_id" : "jnsjir2019-fg-1455041!02fd15d1e2649c95027b8548f42c0e7d",
  "found" : false
}
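
One detail that may be relevant, although nothing in this thread confirms it: the search hit above carries a `_routing` value (`JNSJIR2019-FG-1455041`). In Elasticsearch, a document indexed with a custom routing value generally has to be fetched with the same value in the `routing` query parameter; otherwise the GET can be sent to a shard that does not hold the document and come back with "found" : false. The search above reports only one shard, so this may well not be the cause here. The sketch below only builds a URL to try; `doc_url` is a hypothetical helper name.

```shell
# Hypothetical helper: build a get-by-id URL that also passes a routing value.
# Arguments: host:port, index, type, document id, routing value.
doc_url() {
  printf '%s/%s/%s/%s?routing=%s\n' "$1" "$2" "$3" "$4" "$5"
}

# Values copied from the search hit; single quotes keep the "!" literal.
url=$(doc_url es1:9200 versions-index documents \
  'jnsjir2019-fg-1455041!02fd15d1e2649c95027b8548f42c0e7d' \
  'JNSJIR2019-FG-1455041')
printf '%s\n' "$url"

# To try it against the cluster (not run here):
#   curl -s "$url&pretty"
```

If the routed GET returns the document while the unrouted one does not, routing rather than the "!" would be the likely explanation.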

Hey,

Can you try this (quoting with single ticks)? It works for me under zsh.

# curl -X DELETE localhost:9200/_all
{"acknowledged":true}

# curl 'localhost:9200/foo/_doc/with_!' -d '{ "key":"value" }' --header "Content-Type: application/json"
{"_index":"foo","_type":"_doc","_id":"with_!","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1}

# curl 'localhost:9200/foo/_doc/with_!'
{"_index":"foo","_type":"_doc","_id":"with_!","_version":1,"_seq_no":0,"_primary_term":1,"found":true,"_source":{ "key":"value" }}

Using either double quotes or the apostrophe, as you did, returns the same result: "found" : false. I'm using bash, for reference.

I have two clusters on two entirely separate sets of servers, holding similar data. On one of them I can use the apostrophe and it's fine. On the other it's not (hence this ticket). I apologize for not sharing this earlier, but when I first posted I wasn't aware this worked on the second cluster.

I doubt it's a shell escape issue (it could be, I just doubt it). I say this because the response to the second curl command I sent displays the _id it looked for, and that is exactly the _id value that's there (as shown in my first curl response). The _id appears to be fully understood by Elastic; it just handles it incorrectly.

I want to say this is a bug in 2.3.5, which is EOL anyway, but I'd hope there's a documented patch in a later version if this is the case.

Also, I found in the meantime that I can use the _bulk endpoint to perform update and delete operations without problems. With this knowledge, I'm able to do the job I needed to do in the first place. I'm just confused about why this was, and still is, a problem.
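
For anyone landing here with the same problem, the _bulk route mentioned above can be sketched roughly as follows. This assumes the 2.x action format with `_index`, `_type`, and `_id` in the metadata line. `bulk_delete_line` is a hypothetical helper name, and it does no JSON escaping, so it only suits ids without quotes or backslashes. The thread does not say whether a routing value was part of the working bulk requests, so none is shown.

```shell
# Hypothetical helper: emit one newline-terminated delete action for the _bulk API.
# Arguments: index, type, document id (no JSON escaping is performed).
bulk_delete_line() {
  printf '{"delete":{"_index":"%s","_type":"%s","_id":"%s"}}\n' "$1" "$2" "$3"
}

bulk_delete_line versions-index documents \
  'jnsjir2019-fg-1455041!02fd15d1e2649c95027b8548f42c0e7d'

# To send it (not run here), pipe the lines to the _bulk endpoint:
#   bulk_delete_line ... | curl -s -XPOST es1:9200/_bulk --data-binary @-
```

Because the id travels in the request body rather than in the URL path, shell quoting of the "!" only matters for the single-quoted argument.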
