Searching not analyzed term name and value in case insensitive manner

nrmohta · May 20, 2016, 7:51am

Searching term name and value in case insensitive manner
Present structure -
Field Mapping:
"NAME": {
"type": "string",
"index": "not_analyzed",
"store": true
}

Say Sample Data in Field : Neeraj

Problem : Search is now case sensitive, we need to have case insensitive behaviour
Search
NAME:Neeraj - we are getting result
name:neeraj - no result
NAME:NEERAJ - no result

We want behaviour such that all above queries should return result.

Please suggest options to do it. One option is to use standard analyzer with lowercase filter, but in that option I am not sure how to migrate old data.

danielmitterdorfer · May 24, 2016, 7:05am

Hi @nrmohta,

you can use the keyword tokenizer:

PUT /my_index
{
   "settings": {
      "analysis": {
         "analyzer": {
            "case_insensitive": {
               "tokenizer": "keyword",
               "filter": [
                  "lowercase"
               ]
            }
         }
      }
   },
   "mappings": {
      "my_type": {
         "properties": {
            "name": {
               "type": "string",
               "analyzer": "case_insensitive"
            }
         }
      }
   }
}

PUT /my_index/my_type/1
{
    "name": "Neeraj"
}

GET /my_index/_search?q=name:Neeraj
GET /my_index/_search?q=name:neeraj
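
As a rough illustration of why both queries match: the keyword tokenizer emits the whole field value as a single token, the lowercase filter lowercases it, and the query text passes through the same analyzer at search time. The Python sketch below only models that behaviour; it is not how Elasticsearch is implemented, and the function names are made up for illustration.

```python
def case_insensitive_analyze(text):
    """Model of keyword tokenizer + lowercase filter: one lowercased token."""
    return [text.lower()]


def matches(indexed_value, query_text):
    # Both sides pass through the same analyzer, so only the
    # lowercased forms are ever compared.
    indexed_terms = set(case_insensitive_analyze(indexed_value))
    return any(t in indexed_terms for t in case_insensitive_analyze(query_text))


print(matches("Neeraj", "Neeraj"))  # True
print(matches("Neeraj", "neeraj"))  # True
print(matches("Neeraj", "NEERAJ"))  # True
print(matches("Neeraj", "Neer"))    # False: the whole value is one token
```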

For further details see Elasticsearch: The Definitive Guide.

To your question for the migration path: When you change the analyzer, you have to reindex your old data.

Daniel

nrmohta · May 27, 2016, 12:16pm

Thanks a lot, Daniel, for your suggestions. This answered the question about the field value. However, we also have a requirement that the field name can be searched case insensitively. Please guide us.

example:
GET /my_index/_search?q=name:Neeraj
GET /my_index/_search?q=NAME:Neeraj
Requirement - Both of the above queries should return results. The only difference is the field name: 'name' vs. 'NAME'.

anjith.p · June 15, 2016, 8:19am

Is it possible to achieve case insensitive searching using scripting? I looked at the docs but they were not clear.

danielmitterdorfer · June 15, 2016, 1:25pm

Hi @nrmohta

for some reason I missed that you asked again.

/cc: @anjith.p

Did you try this? This example works for me on Elasticsearch 2.3 (i.e. I get search results):

PUT /company/employee/1
{
    "first_name" : "John",
    "last_name" :  "Smith"
}

GET /company/employee/_search?q:FiRsT_NaME=John&explain=true

Daniel

anjith.p · June 15, 2016, 2:00pm

Daniel, your test is working because your strings are analyzed. But in our case we have non-analyzed strings, and converting them to analyzed strings is not an option.

danielmitterdorfer · June 15, 2016, 2:22pm

Hi @anjith.p,

sorry for the confusion. In my example I was just demonstrating that the field name casing does not matter. The other issue has hopefully been resolved by my original answer, hence I did not include it in this example again.

Daniel

anjith.p · June 15, 2016, 2:27pm

Thanks Daniel. Your original answer suggests that we should indeed use an analyzer. But in our case, none of our strings are analyzed and we can't lose the original case. So I'm wondering if there is any way of achieving case insensitive functionality for non-analyzed strings. Maybe through scripting? Thanks for your help.

danielmitterdorfer · June 15, 2016, 2:33pm

Hi @anjith.p,

you can use multi-fields and apply the analyzer just there. The reference documentation contains an example for multi-fields but note that it's the other way around in the example (i.e. in your case you want the "main" field not_analyzed but the subfield analyzed).
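
A minimal sketch of such a mapping, assuming the Elasticsearch 2.x `string` type and the `case_insensitive` analyzer defined earlier in this thread. The subfield name `lowercase` and the helper function are hypothetical; the Python code only builds and prints the JSON that would go under the type's entry in `mappings`.

```python
import json


def multi_field_mapping(field="name", subfield="lowercase",
                        analyzer="case_insensitive"):
    """Build a type mapping: main field not_analyzed, subfield analyzed."""
    return {
        "properties": {
            field: {
                "type": "string",
                # The main field keeps the exact, case-sensitive value.
                "index": "not_analyzed",
                "fields": {
                    # The subfield indexes the same value through the
                    # lowercasing analyzer for case-insensitive search.
                    subfield: {"type": "string", "analyzer": analyzer}
                },
            }
        }
    }


print(json.dumps(multi_field_mapping(), indent=2))
```

With this mapping, `q=name:Neeraj` would stay case sensitive, while `q=name.lowercase:neeraj` would search the lowercased subfield. The analyzer still has to be declared in the index settings.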

Does that work for you?

Daniel

anjith.p · June 15, 2016, 3:03pm

What I understand from your answer is that I need to save my field as both analyzed and non-analyzed, and the analyzed one will be used for case insensitive searching. Is my assumption correct? If yes, won't that double the storage? In our case, most of our fields are not_analyzed and we need to support case insensitive searching for them. Is using scripts (though I'm not clear about scripting at the moment) an option?

danielmitterdorfer · June 16, 2016, 6:19am

Hi @anjith.p,

that's correct, you need two separate fields, but this does not necessarily mean that your index size doubles (although I guess it will be close to that if you have a lot of not_analyzed fields). I suggest that you run a couple of experiments to determine the impact on index size (I also suggest you run a force_merge at the end of your test to keep sizes comparable).

Daniel

anjith.p · June 16, 2016, 6:42am

Thanks Daniel. We will go with your suggestions. But I also wanted to know whether this functionality could be done through ES scripting. Any ideas? I appreciate your help.

danielmitterdorfer · June 16, 2016, 11:48am

Hi @anjith.p,

a scripting solution is possible but this will be painfully slow compared to a native search.

Here is a working example (tested with Elasticsearch 2.3.3):

Enable file based scripting in config/elasticsearch.yml:

script.file: true

Add a file "match.groovy" in config/scripts/:

doc['name'].value.toLowerCase() == 'bob'

Note: You can use parameters in scripts (see docs) but this is a minimalistic example so I've hardcoded the value but I hope you get the idea.
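
As a hedged sketch of the parameterized variant: assume the file script were changed to `doc['name'].value.toLowerCase() == expected`, where `expected` is a hypothetical parameter name. The Python helper below (also hypothetical) only builds a request body of the same shape as the script query shown further down, with a `params` entry added.

```python
def script_filter(expected):
    """Build a search body that calls the file script 'match' with one parameter.

    Assumes config/scripts/match.groovy contains:
        doc['name'].value.toLowerCase() == expected
    """
    return {
        "query": {
            "bool": {
                "must": [{"match_all": {}}],
                "filter": {
                    "script": {
                        "script": {
                            "file": "match",
                            # Lowercase on the client so the script only has
                            # to lowercase the stored value.
                            "params": {"expected": expected.lower()},
                        }
                    }
                },
            }
        }
    }


body = script_filter("BOB")
print(body["query"]["bool"]["filter"]["script"]["script"])
```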

Create an index:

PUT cases
{
   "mappings": {
      "people": {
         "properties": {
            "name": {
               "type": "string",
               "index": "not_analyzed"
            }
         }
      }
   }
}

Now add Bob:

POST /cases/people/1
{
    "name": "Bob"
}

And search via a script query:

GET /cases/people/_search
{
   "query": {
      "bool": {
         "must": [
            {
               "match_all": {}
            }
         ],
         "filter": {
            "script": {
               "script": {
                   "file": "match"
               }
            }
         }
      }
   }
}

You get:

{
   "took": 202,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 1,
      "hits": [
         {
            "_index": "cases",
            "_type": "people",
            "_id": "1",
            "_score": 1,
            "_source": {
               "name": "Bob"
            }
         }
      ]
   }
}

Now the same with a term query:

GET /cases/people/_search
{
    "query": {
        "term": {
           "name": {
              "value": "Bob"
           }
        }
    }
}

{
   "took": 9,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 0.30685282,
      "hits": [
         {
            "_index": "cases",
            "_type": "people",
            "_id": "1",
            "_score": 0.30685282,
            "_source": {
               "name": "Bob"
            }
         }
      ]
   }
}

I had just one document stored and you already see the performance difference: 9 ms vs 202 ms. Please keep this in mind and test the performance early because it will be much slower with this approach.

Daniel

anjith.p · June 16, 2016, 11:51am

Thanks Daniel. Yes, I have just figured out that scripts actually work. Fortunately, slow performance is OK in our use case. I really appreciate your response. I'll indeed write an article about ES scripting: it is really powerful.

danielmitterdorfer · June 16, 2016, 11:57am

Hi @anjith.p,

glad that I could help! Can you share the link to the article here once you've published it?

Daniel

anjith.p · June 17, 2016, 4:43am

I will, for sure. We have actually made the script "native" by using a Java plug-in and found that its performance is as good as regular search (without script) performance.

danielmitterdorfer · June 17, 2016, 5:52am

Good to hear that the performance is sufficient (although I'm a bit surprised that it's really comparable to a non-script solution).

anjith.p · June 17, 2016, 10:01am

I suppose the query syntax you used is incorrect here. Correct query looks like the below:

GET /company/employee/_search?q=FiRsT_NaME:John&explain=true

And the above query doesn't return any results.

danielmitterdorfer · June 17, 2016, 11:23am

Not sure what I copy & pasted here because I try out all examples in Sense before posting. But your syntax is correct and the field name in the search must match exactly the field name in the mapping (that's the reason why you get no results).

nrmohta · June 20, 2016, 12:32pm

To enable case insensitive search on field name -

The option we are considering is to keep a cache of all field names in memory and compare the field name from the search input against this list case insensitively.
Fields that match case insensitively would be added to the 'fields' list of the query, so that the search is performed on all of them.
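
A minimal sketch of this idea in Python, assuming the real field names have already been read from the index mapping. The class and function names are hypothetical, and a `multi_match` query is used here as one possible way to search several fields at once.

```python
from collections import defaultdict


class FieldNameCache:
    """In-memory lookup from lowercased field name to the real field names."""

    def __init__(self, field_names):
        self._by_lower = defaultdict(list)
        for name in field_names:
            self._by_lower[name.lower()].append(name)

    def resolve(self, requested):
        # Return every real field whose name equals the input, ignoring case.
        return list(self._by_lower.get(requested.lower(), []))


def build_query(cache, requested_field, value):
    """Build a multi_match body over all fields matching the requested name."""
    fields = cache.resolve(requested_field)
    if not fields:
        return None  # no such field under any casing
    return {"query": {"multi_match": {"query": value, "fields": fields}}}


cache = FieldNameCache(["NAME", "name", "Age"])
print(build_query(cache, "Name", "Neeraj"))
```

The cache would have to be refreshed whenever new fields are added to the mapping.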

Another option could be lower casing all fields while indexing, but as per our requirement we do want to support case sensitivity on field names.

In case you have pointers to achieve our goal in a different way, please suggest.
