Requesting help with Case-insensitive Analyzer

Hello,
I have enabled a "lowercase" analyzer across all my indices, but I have run into an error while using it.
My query parameter:

"query": {
    "query_string": {
      "query": "username.keyword:\"Test\"",
      "analyzer": "case_insensitive_analyzer"
    }
}

My expected result: This should return all documents matching "Test" regardless of case, e.g. "test" and "TEST".
Actual result: It returns all documents matching only "test", in lowercase.

Some troubleshooting suggests that the analyzer works correctly at search time: it searches in lowercase regardless of the case I type.
However, since I indexed my data before adding this analyzer, I assume Elasticsearch does not treat existing documents with the value "TEST" as equal to the lowercased "test", and that they perhaps need to be re-indexed?

The analysis section returned by GET _all/_settings is the same for all indices:

        "analysis": {
          "analyzer": {
            "case_insensitive_analyzer": {
              "filter": [
                "lowercase"
              ],
              "type": "custom",
              "tokenizer": "standard"
            },
            "default": {
              "filter": [
                "lowercase"
              ],
              "type": "custom",
              "tokenizer": "keyword"
            }
          }
        },

Please inform me if I've done something incorrect here - I am new to analyzers.
Otherwise, if correct, is there a way to "fix" my existing indices/documents to match this case-insensitive mapping?
I am importing large log files programmatically using an import script, so ideally I don't want to have to manually re-index everything.
I am running Elasticsearch 8.11.1

Edit: This analyzer also doesn't work at all when the query contains a space (it returns no results) or punctuation (it ignores the punctuation entirely). I haven't the foggiest idea why that would be the case.

Thank you
Matthias

Hi @mvkfg.

I see that you are querying the "username.keyword" field. Have you tried querying the "username" field instead? If you have not defined an analyzer for that field, it uses the "standard" analyzer by default, which lowercases terms, so you should get results that way. Keyword fields are not analyzed, so they only match exactly.
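
If exact, whole-value matching on a keyword field is still wanted, one option that may fit is a normalizer, which Elasticsearch applies to keyword values at index time and to query terms at search time. The payload below is a minimal sketch for creating a new index, not a drop-in fix: the normalizer name "lowercase_normalizer" is made up for illustration, and the field name is taken from the question.

```json
{
  "settings": {
    "analysis": {
      "normalizer": {
        "lowercase_normalizer": {
          "type": "custom",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "username": {
        "type": "keyword",
        "normalizer": "lowercase_normalizer"
      }
    }
  }
}
```

As with any mapping change of this kind, it only affects documents indexed after the index is created with it.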

Hello,
Your reply seems to suggest that analyzers do not apply to "keyword" fields, which I have confirmed. I tried changing the mappings but got HTTP 400 errors, and quickly learnt that the mapping of an existing field cannot be changed in place. I deleted my index and created a new one with the payload:

{
  "settings": {
    "analysis": {
      "analyzer": {
        "case_insensitive_analyzer": {
          "type": "custom",
          "filter": [
            "lowercase"
          ],
          "tokenizer": "keyword"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "username": {
        "type": "text",
        "analyzer": "case_insensitive_analyzer"
      }
    }
  }
}

Seeing "tokenizer": "keyword" is what threw me off course. However, it simply means that a single token is created for the entire value, instead of one token per word. In effect, the "text" field is treated much like the "keyword" field I was originally trying to use :)
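
For anyone wanting to check this, the difference between the two tokenizers can be inspected with the Analyze API. A small sketch, sent as the body of POST _analyze; the sample text "Test User" is only an illustration:

```json
{
  "tokenizer": "keyword",
  "filter": ["lowercase"],
  "text": "Test User"
}
```

This should return a single token, "test user". Swapping "keyword" for "standard" should instead return two tokens, "test" and "user", which would also explain why my earlier "standard"-based analyzer misbehaved on values containing spaces or punctuation.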

Once I noticed that "type": "keyword" in my payload was causing the errors, changing it to "text" solved it. New indices created with this payload can now be searched correctly, case-insensitively. Thanks for your assistance.
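
For existing data that cannot simply be deleted and re-imported, the Reindex API may be an alternative: it copies documents from an old index into a new one that already has the corrected mapping. A minimal sketch of the body for POST _reindex, where both index names are hypothetical placeholders:

```json
{
  "source": { "index": "logs-old" },
  "dest": { "index": "logs-new" }
}
```

The destination index should be created with the new settings and mappings first; otherwise the fields are mapped dynamically and the analyzer is not applied.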

Matthias
