Hi guys. I've found some interesting behaviour that I was hoping to get some clarification on. Let's say I perform a bulk index of two documents like so:
curl --request POST \
--url <domain>/<index>/_bulk \
--header 'Content-Type: application/json' \
--data '{ "index" : { "_id": "aaa", "_type": "1" }}
{ "name": "Carlson Barnes", "age": 34}
{ "index" : { "_id": "aaa#bbb", "_type": "1" }}
{ "name": "Sheppard Stein","age": 39}
'
And now I go to retrieve the record aaa#bbb:
curl --request GET \
--url '<domain>/<index>/1/aaa#bbb'
This replies with:
{
  "_index": "<index>",
  "_type": "1",
  "_id": "aaa",
  "_version": 2,
  "_seq_no": 2,
  "_primary_term": 1,
  "found": true,
  "_source": {
    "name": "Carlson Barnes",
    "age": 34
  }
}
So it's actually matching and returning aaa instead of aaa#bbb. I believe this is a problem with indexing rather than retrieval, since I originally noticed the behaviour when indexing multiple documents with composite _id fields and then running an aggregation.
Can anyone explain why this happens, and whether it's intended behaviour? The workaround I'm considering at this stage is to use a hash of the composite key as the _id field and move the composite key itself into a separate field.
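For reference, here is a minimal sketch of that workaround. It assumes SHA-1 is an acceptable hash and that sha1sum (GNU coreutils) is available. The field name composite_key is hypothetical and chosen only for illustration. The script only prints the two bulk lines for the second document and does not send anything:

```shell
#!/bin/sh
# Sketch only: derive a delimiter-free _id from the composite key and keep
# the original key in a separate field. "composite_key" is a made-up name.
composite_key='aaa#bbb'

# sha1sum prints "<hex digest>  -"; keep only the 40-character digest.
hashed_id=$(printf '%s' "$composite_key" | sha1sum | cut -d' ' -f1)

# Emit the action line and the document line for the _bulk request body.
printf '{ "index" : { "_id": "%s", "_type": "1" }}\n' "$hashed_id"
printf '{ "composite_key": "%s", "name": "Sheppard Stein", "age": 39}\n' "$composite_key"
```

The hashed _id contains only hexadecimal characters, so it can be placed in a URL path without any escaping, and the original key is still searchable through the separate field.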
Thanks heaps!
Update: This mainly seems to be a problem with using the hash character (#) as the delimiter. Using a colon or an underscore works as expected.
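One possible explanation, offered as an assumption rather than something confirmed in this post: in a URL, # marks the start of the fragment, and curl does not send the fragment to the server, so the GET request above would effectively ask for the document aaa. The bulk request carries the _id in the JSON body, so it would not be affected in the same way. The sketch below percent-encodes # as %23 before the id is placed in the path. encode_id is a hypothetical helper that handles only the # character, not a general URL encoder:

```shell
#!/bin/sh
# Sketch only: replace every '#' with its percent-encoded form '%23' so the
# character is sent as part of the path instead of starting a URL fragment.
encode_id() {
  printf '%s' "$1" | sed 's/#/%23/g'
}

doc_id='aaa#bbb'
encoded_id=$(encode_id "$doc_id")
echo "$encoded_id"   # prints aaa%23bbb

# The retrieval request would then look like this (same placeholders as above):
# curl --request GET --url "<domain>/<index>/1/${encoded_id}"
```

If the encoded request returns the aaa#bbb document, the issue is in how the URL is built on the client side rather than in indexing.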