Requesting help with Case-insensitive Analyzer

mvkfg · February 27, 2024, 8:38pm

Hello,
I have enabled a "lowercase" analyzer across all my indices, but I have run into an error while using it.
My query parameter:

"query": {
    "query_string": {
      "query": "username.keyword:\"Test\"",
      "analyzer": "case_insensitive_analyzer"
    }
}

Expected result: all documents matching "Test" are returned regardless of case, e.g. "test" and "TEST".
Actual result: only documents matching the lowercase "test" are returned.

Some troubleshooting suggests that the analyzer works correctly behind the scenes: it searches in lowercase regardless of the case I type.
However, since I indexed my data before adding this analyzer, I assume Elasticsearch does not treat existing values such as "TEST" as equal to the lowercased "test", and that they perhaps need to be re-indexed?

GET _all/_settings returns this analysis section for all indices:

        "analysis": {
          "analyzer": {
            "case_insensitive_analyzer": {
              "filter": [
                "lowercase"
              ],
              "type": "custom",
              "tokenizer": "standard"
            },
            "default": {
              "filter": [
                "lowercase"
              ],
              "type": "custom",
              "tokenizer": "keyword"
            }
          }
        },

Please let me know if I've done something wrong here; I am new to analyzers.
Otherwise, if this is correct, is there a way to "fix" my existing indices/documents to match this case-insensitive mapping?
I am importing large log files programmatically using an import script, so ideally I don't want to have to manually re-index everything.
I am running Elasticsearch 8.11.1.

Edit: This analyzer also doesn't work at all when the query contains a space (it returns no results) or punctuation (it ignores the punctuation entirely). I haven't the foggiest idea why that would be the case.
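
A toy model may help explain all three symptoms (case, spaces, punctuation). This is plain Python, not Elasticsearch code: `standard_like_analyze` is only a rough regex stand-in for the "standard" tokenizer plus "lowercase" filter, and the sample values are made up. The assumption is that the keyword sub-field stores each value verbatim as one term, while the query text is split and lowercased before lookup.

```python
import re

def standard_like_analyze(text):
    """Rough stand-in for tokenizer "standard" + filter "lowercase":
    keep runs of word characters (spaces and punctuation are dropped),
    then lowercase each token."""
    return [tok.lower() for tok in re.findall(r"\w+", text)]

# Assumption: a keyword field indexes every value verbatim as a single term.
indexed_terms = {"test", "TEST", "Test", "test!", "john doe"}

def search(query_text):
    """Return the indexed terms that exactly equal one of the query tokens."""
    tokens = standard_like_analyze(query_text)
    return sorted(term for term in indexed_terms if term in tokens)

print(search("Test"))      # ['test']  only the lowercase value matches
print(search("john doe"))  # []        tokens 'john' and 'doe' never equal 'john doe'
print(search("test!"))     # ['test']  the '!' is dropped from the query
```

In this model the query side is lowercased and split, but the stored terms are not, so only values that already happen to be a single lowercase word can match.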

Thank you
Matthias

RabBit_BR · February 28, 2024, 1:02am

Hi @mvkfg .

I see that you are querying the keyword sub-field. Have you tried querying the "username" field itself? If you have not defined an analyzer for that field, it will use "standard" by default, and that way you should get results. With keyword fields, you will only succeed in exact-match scenarios.

mvkfg · February 28, 2024, 3:06am

Hello,
Your reply seems to suggest that analyzers do not work on "keyword" types, which I have confirmed. I tried changing the mappings but got HTTP 400 errors, and quickly learnt that it is not possible to change the mapping of an existing field. I deleted my index and created a new one with the payload:

{
  "settings": {
    "analysis": {
      "analyzer": {
        "case_insensitive_analyzer": {
          "type": "custom",
          "filter": [
            "lowercase"
          ],
          "tokenizer": "keyword"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "username": {
        "type": "text",
        "analyzer": "case_insensitive_analyzer"
      }
    }
  }
}

Seeing "tokenizer": "keyword" seems to be what threw me off course. However, it only means that a single token is created for the entire value instead of one token per word. In other words, the "text" field is treated similarly to the "keyword" field that I was originally trying to use.
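
As a rough illustration of that point, here is a small plain-Python sketch (not Elasticsearch code) of what a "keyword" tokenizer followed by a "lowercase" filter does. The function name, the sample values, and the dictionary used as an index are all hypothetical; the sketch assumes the same analyzer runs both when a value is indexed and when the query is analyzed, which is the default for a "text" field that only sets "analyzer".

```python
def case_insensitive_analyze(text):
    """Toy stand-in for tokenizer "keyword" + filter "lowercase":
    the whole value becomes one token, which is then lowercased."""
    return [text.lower()]

# Index time: every stored value is analyzed before it lands in the index.
values = ["TEST", "test", "Test User", "other"]
index = {}
for value in values:
    for token in case_insensitive_analyze(value):
        index.setdefault(token, []).append(value)

def search(query_text):
    """Search time: the same analyzer runs on the query text."""
    hits = []
    for token in case_insensitive_analyze(query_text):
        hits.extend(index.get(token, []))
    return hits

print(search("Test"))       # ['TEST', 'test']
print(search("test user"))  # ['Test User']
print(search("user"))       # []  a single word does not match the whole-value token
```

Because the value is never split, spaces survive inside the token, and matching stays "whole value" just like a keyword field, only without regard to case.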

Once I noticed that "type": "keyword" in my payload was causing the errors, changing it to "text" solved it. New indices created with this payload can now be searched correctly, case-insensitively. Thanks for your assistance.
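
For anyone who cannot simply delete and re-import: the Reindex API (POST _reindex) can copy documents from an existing index into a new one, where they are analyzed with the new mapping. Below is a minimal sketch that only builds the request body, for example from an import script. The index names "logs-old" and "logs-new" and the helper name are hypothetical, and it assumes the destination index was already created with the payload above.

```python
import json

# Hypothetical index names; substitute your own.
SOURCE_INDEX = "logs-old"
DEST_INDEX = "logs-new"

def build_reindex_request(source, dest):
    """Return (method, path, body) for a Reindex API call.
    Nothing is sent; pass these to whatever HTTP client you use."""
    body = {"source": {"index": source}, "dest": {"index": dest}}
    return "POST", "/_reindex", body

method, path, body = build_reindex_request(SOURCE_INDEX, DEST_INDEX)
print(method, path)
print(json.dumps(body, indent=2))
```

The destination should exist with the desired mapping before the call; otherwise the fields would be mapped dynamically and the custom analyzer would not be applied.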

Matthias

system · March 27, 2024, 3:07am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
