Case_insensitive terms query on _id field - Is this behavior a bug?

I have an index with IDs that I manually set to product IDs from the database, and they get prefixed with AHC or CAT, depending on which catalog the products are from. In my general search, I include a terms query that looks for an exact _id match and was asked if I could make it case-insensitive. While investigating this, I encountered some weird behavior I wanted to run past the group here. The index is using Elastic v7.17.0

  1. This query works, and returns exactly 1 result, as expected. Same if I remove case_insensitive:

GET /unified_item_index_alias/_search
{ "query": { "term": { "_id": { "value": "AHC1664515", "case_insensitive": false } } } } }– Returns 1 result

  1. The same query, with the AHC part of the query string lower-cased, returns no matches. That’s, expected I guess:

GET /unified_item_index_alias/_search
{ "query": { "term": { "_id": { "value": "ahc1664515", "case_insensitive": false } } } } }– Returns 0 results

  1. If I change case_insensitive to true, the same query that was working before doesn’t work at all (0 results for ahc… and AHC…)

GET /unified_item_index_alias/_search
{ "query": { "term": { "_id": { "value": "AHC1664515", "case_insensitive": true } } } } }– Returns 0 results

There’s no error, it just doesn’t return any results. I feel like this is some kind of bug or documentation fault. I’m pretty sure the work around is to either force upper case on the search string, or make another analyzed field in the index to search against, but the _id seems a natural choice, and no error regarding using case sensitive = false indicates to me it should just work.

I did an explain on this one, and I don’t really know how to interpret the results I got here:

GET /unified_item_index_alias/_explain/AHC1664515
{
    "query": {
      "term": { "_id": { "value": "AHC1664515", "case_insensitive": true  } } }
  }
}
::::>
{
  "_index" : "unified_item_index1",
  "_type" : "_doc",
  "_id" : "AHC1664515",
  "matched" : false,
  "explanation" : {
    "value" : 0.0,
    "description" : "Failure to meet condition(s) of required/prohibited clause(s)",
    "details" : [
      {
        "value" : 0.0,
        "description" : """no match on required clause (_id:AutomatonQuery {
org.apache.lucene.util.automaton.Automaton@18849921})""",
        "details" : [
          {
            "value" : 0.0,
            "description" : """_id:AutomatonQuery {
org.apache.lucene.util.automaton.Automaton@18849921} doesn't match id 1968195""",
            "details" : [ ]
          }
        ]
      },
      {
        "value" : 0.0,
        "description" : "match on required clause, product of:",
        "details" : [
          {
            "value" : 0.0,
            "description" : "# clause",
            "details" : [ ]
          },
          {
            "value" : 1.0,
            "description" : "DocValuesFieldExistsQuery [field=_primary_term]",
            "details" : [ ]
          }
        ]
      }
    ]
  }
}

Hello @khatcher

Welcome to the Community!!

Yes, I see the same behavior and this appears to be expected.

Found below information related to _id via LLM :
The _id field only supports exact, case-sensitive lookups. Enabling case_insensitive causes the query to be rewritten internally in a way that _id does not support, so no results are returned.

On documentation i do not see more details :

You are right, we can store the ID in a separate keyword field (duplicate of _id) to fulfill this request.

For workaround, we can query _id using a terms query with both case variants:

GET unified_item_index1/_search
{
  "query": {
    "terms": {
      "_id": [
        "AHC1664515",
        "ahc1664515"
      ]
    }
  }
}

Thanks!!