Indexed documents with composite _id collide

Hi guys. I've found some interesting behaviour that I was hoping to get some clarification on. Let's say I perform a bulk index of two documents like so:

curl --request POST \
  --url <domain>/<index>/_bulk \
  --header 'Content-Type: application/json' \
  --data '{ "index" : { "_id": "aaa", "_type": "1" }}
{ "name": "Carlson Barnes", "age": 34}
{ "index" : { "_id": "aaa#bbb", "_type": "1" }}
{ "name": "Sheppard Stein","age": 39}
'

And now I go to retrieve the record aaa#bbb:

curl --request GET \
  --url '<domain>/<index>/1/aaa#bbb'

This replies with:

{
  "_index": "<index>",
  "_type": "1",
  "_id": "aaa",
  "_version": 2,
  "_seq_no": 2,
  "_primary_term": 1,
  "found": true,
  "_source": {
    "name": "Carlson Barnes",
    "age": 34
  }
}

So it's actually matching on and returning aaa, instead of aaa#bbb. I believe that this is a problem when indexing rather than when retrieving, since I originally noticed this behaviour when indexing multiple documents with composite _id fields and then doing an aggregation.

Can anyone explain why this happens, and if it's intended behaviour? My workaround at this stage is to perhaps create a hash to use for the _id field, and move the composite key into a separate field.
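
A minimal sketch of that workaround, in case it helps anyone reading later. It assumes sha256sum (GNU coreutils) is available; the helper name hash_id and the field name composite_key are made up for illustration and are not anything Elasticsearch requires:

```shell
# Hypothetical helper: derive a URL-safe _id from a composite key.
# Assumes sha256sum (GNU coreutils) is installed.
hash_id() {
  printf '%s' "$1" | sha256sum | cut -d' ' -f1
}

composite_key='aaa#bbb'
doc_id=$(hash_id "$composite_key")

# Print a bulk action line and its document, keeping the original key
# in a separate field (the field name "composite_key" is an assumption).
printf '{ "index" : { "_id": "%s", "_type": "1" }}\n' "$doc_id"
printf '{ "composite_key": "%s", "name": "Sheppard Stein", "age": 39 }\n' "$composite_key"
```

The hex digest contains only 0-9 and a-f, so it can go straight into a URL path without any escaping.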

Thanks heaps!

Update: This mainly seems to be a problem with using a hash character (#) as a delimiter. Using a colon or underscore seems to work as expected.

Welcome to our community! :smiley: We aren't all guys though :slight_smile:

Yes, that's shown by "_version": 2. AFAIK you can include these sorts of non-alphanumeric characters in the _id, but you need to escape them for it to work.
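
As a rough sketch of what escaping could look like here: in a URL, # starts the fragment, and clients such as curl do not send the fragment to the server, so the GET above most likely reaches Elasticsearch as a request for aaa. Percent-encoding the # as %23 keeps it in the path. The helper name encode_id is hypothetical, and it only handles #; other reserved characters would need the same treatment:

```shell
# Hypothetical helper: percent-encode '#' so it survives in a URL path.
# Only '#' is handled; this is not a general-purpose URL encoder.
encode_id() {
  printf '%s' "$1" | sed 's/#/%23/g'
}

encoded_id=$(encode_id 'aaa#bbb')
echo "$encoded_id"   # prints aaa%23bbb

# Same request shape as in the original post, with the encoded ID
# (the <domain> and <index> placeholders are left as-is):
# curl --request GET --url "<domain>/<index>/1/${encoded_id}"
```

The _id inside the _bulk body is plain JSON, so it does not need this encoding; only the ID in the URL path does.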
