Searching not_analyzed field names and values in a case-insensitive manner

We need to search field names and values in a case-insensitive manner.
Present structure:
Field mapping:
"NAME": {
   "type": "string",
   "index": "not_analyzed",
   "store": true
}

Sample data in the field: Neeraj

Problem: search is currently case-sensitive, but we need case-insensitive behaviour.
Searches:
NAME:Neeraj - returns the result
name:neeraj - no result
NAME:NEERAJ - no result

We want all of the above queries to return the result.

Please suggest options. One option is to use the standard analyzer with a lowercase filter, but with that option I am not sure how to migrate the old data.

Hi @nrmohta,

you can define a custom analyzer that combines the keyword tokenizer with the lowercase token filter:

PUT /my_index
{
   "settings": {
      "analysis": {
         "analyzer": {
            "case_insensitive": {
               "tokenizer": "keyword",
               "filter": [
                  "lowercase"
               ]
            }
         }
      }
   },
   "mappings": {
      "my_type": {
         "properties": {
            "name": {
               "type": "string",
               "analyzer": "case_insensitive"
            }
         }
      }
   }
}
PUT /my_index/my_type/1
{
    "name": "Neeraj"
}
GET /my_index/_search?q=name:Neeraj
GET /my_index/_search?q=name:neeraj 
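To see why both searches hit, here is a toy Python model of the "case_insensitive" analyzer. This is not Elasticsearch code. The keyword tokenizer keeps the whole value as a single token, and the lowercase filter normalizes it. Because the same analyzer runs on the indexed value and on the query text, differently cased queries end up as the same term. The dict-based inverted index is a simplification.

```python
# Toy model (not Elasticsearch code) of the "case_insensitive" analyzer:
# keyword tokenizer -> whole value is one token; lowercase filter -> lowercased.

def keyword_tokenizer(text: str) -> list[str]:
    # The whole field value becomes a single token (no splitting on spaces).
    return [text]

def lowercase_filter(tokens: list[str]) -> list[str]:
    return [t.lower() for t in tokens]

def case_insensitive(text: str) -> list[str]:
    return lowercase_filter(keyword_tokenizer(text))

# Index time: build a tiny inverted index (term -> doc ids) for field "name".
docs = {1: "Neeraj"}
inverted: dict[str, set[int]] = {}
for doc_id, value in docs.items():
    for term in case_insensitive(value):
        inverted.setdefault(term, set()).add(doc_id)

def search(query_value: str) -> set[int]:
    # Query time: the query text goes through the same analyzer.
    hits: set[int] = set()
    for term in case_insensitive(query_value):
        hits |= inverted.get(term, set())
    return hits

for q in ["Neeraj", "neeraj", "NEERAJ"]:
    print(q, sorted(search(q)))
```

Unlike the standard analyzer, the keyword tokenizer does not split on whitespace, so a value such as "John Smith" stays one term ("john smith"). That keeps exact-value semantics.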

For further details see Elasticsearch: The Definitive Guide.

Regarding your question about the migration path: when you change the analyzer, you have to reindex your old data.
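Why is a reindex needed? Terms are computed when a document is indexed, so changing the analyzer does not rewrite the terms that already exist. The toy model below sketches this. Reindexing simply means re-running the new analysis over every stored _source. In practice you create a new index with the new mapping and copy the documents into it. As far as I know, Elasticsearch 2.3 added a `_reindex` API for this; on older versions a scroll + bulk loop does the same job. The sample values are made up.

```python
# Toy model, not the Elasticsearch API: terms are produced at index time,
# so an index built with the old mapping keeps its original-case terms.
# "Reindexing" = re-analyzing every stored _source with the new analyzer.

def not_analyzed(value):
    return [value]            # old mapping: the exact value is the term

def case_insensitive(value):
    return [value.lower()]    # new mapping: keyword tokenizer + lowercase

def build_index(sources, analyzer):
    index = {}
    for doc_id, value in sources.items():
        for term in analyzer(value):
            index.setdefault(term, set()).add(doc_id)
    return index

sources = {1: "Neeraj", 2: "NEERAJ"}                 # hypothetical stored _source values
old_index = build_index(sources, not_analyzed)
new_index = build_index(sources, case_insensitive)  # the "reindex" step

print(sorted(old_index.get("neeraj", set())))   # [] : old terms were never lowercased
print(sorted(new_index.get("neeraj", set())))   # [1, 2]
```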

Daniel

Thanks a lot, Daniel, for your suggestions. This answers the question about the field value. However, we also have a requirement that field names can be searched case-insensitively. Please guide us.

Example:
GET /my_index/_search?q=name:Neeraj
GET /my_index/_search?q=NAME:Neeraj
Requirement: both queries above should return results. The only difference is the field name, 'name' vs. 'NAME'.

Is it possible to achieve case-insensitive searching using scripting? I looked at the docs, but they were not clear.

Hi @nrmohta

for some reason I missed that you asked again.

/cc: @anjith.p

Did you try this? This example works for me on Elasticsearch 2.3 (i.e. I get search results):

PUT /company/employee/1
{
    "first_name" : "John",
    "last_name" :  "Smith"
}

GET /company/employee/_search?q:FiRsT_NaME=John&explain=true

Daniel

Daniel, your test is working because your strings are analyzed. But in our case we have not_analyzed strings, and converting them to analyzed strings is not an option.

Hi @anjith.p,

sorry for the confusion. In my example I was just demonstrating that the casing of the field name does not matter. The other issue has hopefully been resolved by my original answer, so I did not include it in this example again.

Daniel

Thanks Daniel. Your original answer suggests that we should indeed use an analyzer. But in our case all our strings are not_analyzed, and we can't lose the original case. So I'm wondering if there is any way of achieving case-insensitive functionality for not_analyzed strings. Maybe through scripting? Thanks for your help.

Hi @anjith.p,

you can use multi-fields and apply the analyzer only to a sub-field. The reference documentation contains an example for multi-fields, but note that it's the other way around in the example (i.e. in your case you want the "main" field not_analyzed but the sub-field analyzed).
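Here is a minimal sketch of what such an index-creation body could look like with ES 2.x mapping syntax, built as a Python dict. The main "name" field stays not_analyzed, so the original case is kept for exact matches. A sub-field uses the case_insensitive analyzer from my first answer. The sub-field name "ci" is a hypothetical choice. The index and type names reuse "my_type" from the earlier example.

```python
import json

# Analyzer from the first answer: keyword tokenizer + lowercase filter.
settings = {
    "analysis": {
        "analyzer": {
            "case_insensitive": {"tokenizer": "keyword", "filter": ["lowercase"]}
        }
    }
}

def multi_field(sub="ci", analyzer="case_insensitive"):
    # Main field: not_analyzed (original case preserved).
    # Sub-field (hypothetical name "ci"): analyzed for case-insensitive search.
    return {
        "type": "string",
        "index": "not_analyzed",
        "fields": {sub: {"type": "string", "analyzer": analyzer}},
    }

body = {
    "settings": settings,
    "mappings": {"my_type": {"properties": {"name": multi_field()}}},
}
print(json.dumps(body, indent=2))

# Exact, case-sensitive lookup targets "name";
# case-insensitive search targets the sub-field "name.ci".
exact_query = {"query": {"term": {"name": "Neeraj"}}}
ci_query = {"query": {"match": {"name.ci": "NEERAJ"}}}
```

The match query analyzes "NEERAJ" with the sub-field's analyzer before looking it up, which is why it finds "Neeraj". The term query on the main field does not analyze its input.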

Does that work for you?

Daniel

What I understand from your answer is that I need to store my field as both analyzed and not_analyzed, and the analyzed one will be used for case-insensitive searching. Is my assumption correct? If so, won't that double the storage? In our case, most of our fields are not_analyzed and we need to support case-insensitive searching for them. Is using scripts (though I'm not clear about scripting at the moment) an option?

Hi @anjith.p,

that's correct, you need two separate fields, but this does not necessarily mean that your index size doubles (although I guess it will come close if you have a lot of not_analyzed fields). I suggest that you run a couple of experiments to determine the impact on index size (and run a force_merge at the end of each test to keep sizes comparable).

Daniel

Thanks Daniel. We will go with your suggestions. But I also wanted to know whether this functionality could be implemented with ES scripting. Any ideas? Appreciate your help.

Hi @anjith.p,

a scripting solution is possible, but it will be painfully slow compared to a native search.

Here is a working example (tested with Elasticsearch 2.3.3):

Enable file-based scripting in config/elasticsearch.yml:

script.file: true

Add a file "match.groovy" in config/scripts/:

doc['name'].value.toLowerCase() == 'bob'

Note: you can pass parameters to scripts (see the docs), but this is a minimal example, so I've hard-coded the value. I hope you get the idea.
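The script filter boils down to a predicate that is evaluated for each candidate document against its indexed value for "name". Below is a toy Python model of that, not the Elasticsearch request syntax, with the hard-coded 'bob' turned into a parameter. Lowercasing the parameter as well is an extra choice made here so callers can pass any casing. The None check models a document without a "name" value.

```python
# Toy model of the script filter: a per-document predicate.
# Equivalent in spirit to: doc['name'].value.toLowerCase() == 'bob',
# but with the value supplied as a parameter (hypothetical "value" key).

def match_script(doc_value, params):
    return doc_value is not None and doc_value.lower() == params["value"].lower()

docs = {1: "Bob", 2: "BOB", 3: "Alice", 4: None}   # 4: document without a name

def script_filter(docs, params):
    # Keep only documents for which the predicate returns True.
    return sorted(doc_id for doc_id, v in docs.items() if match_script(v, params))

print(script_filter(docs, {"value": "bob"}))
```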

Create an index:

PUT cases
{
   "mappings": {
      "people": {
         "properties": {
            "name": {
               "type": "string",
               "index": "not_analyzed"
            }
         }
      }
   }
}

Now add Bob:

POST /cases/people/1
{
    "name": "Bob"
}

And search via a script query:

GET /cases/people/_search
{
   "query": {
      "bool": {
         "must": [
            {
               "match_all": {}
            }
         ],
         "filter": {
            "script": {
               "script": {
                   "file": "match"
               }
            }
         }
      }
   }
}

You get:

{
   "took": 202,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 1,
      "hits": [
         {
            "_index": "cases",
            "_type": "people",
            "_id": "1",
            "_score": 1,
            "_source": {
               "name": "Bob"
            }
         }
      ]
   }
}

Now the same with a term query:

GET /cases/people/_search
{
    "query": {
        "term": {
           "name": {
              "value": "Bob"
           }
        }
    }
}

You get:

{
   "took": 9,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 0.30685282,
      "hits": [
         {
            "_index": "cases",
            "_type": "people",
            "_id": "1",
            "_score": 0.30685282,
            "_source": {
               "name": "Bob"
            }
         }
      ]
   }
}

With just one document stored, you can already see the performance difference: 9 ms vs. 202 ms. Please keep this in mind and test performance early, because this approach will be much slower.
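A rough way to picture why the gap grows with the data: a term query is a single lookup in the inverted index. The script filter, by contrast, runs its predicate for every document that the rest of the query matches (here match_all, so all of them). The Python toy model below counts those evaluations. It illustrates scaling only and is not a benchmark. The generated document values are made up.

```python
# Term query: one lookup in the inverted index (term -> doc ids).
# Script filter: predicate evaluated once per candidate document.

def build_inverted(docs):
    index = {}
    for doc_id, value in docs.items():
        index.setdefault(value, set()).add(doc_id)
    return index

def term_query(index, value):
    return index.get(value, set())

def script_query(docs, predicate):
    evaluations = 0
    hits = set()
    for doc_id, value in docs.items():
        evaluations += 1              # the script body runs for every document
        if predicate(value):
            hits.add(doc_id)
    return hits, evaluations

docs = {i: f"user{i}" for i in range(100_000)}   # hypothetical filler documents
docs[100_000] = "Bob"

index = build_inverted(docs)
print(term_query(index, "Bob"))
hits, n = script_query(docs, lambda v: v.lower() == "bob")
print(hits, n)
```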

Daniel

Thanks Daniel. Yes, I have just figured out that scripts actually work :slight_smile: Fortunately, slow performance is OK in our use case. Really appreciate your response. I'll definitely write an article about ES scripting; it is really powerful :slight_smile:

Hi @anjith.p,

glad that I could help! :slight_smile: Can you share the link to the article here once you've published it?

Daniel

I certainly will. We have actually made the script "native" by using a Java plugin and found that its performance is as good as a regular search (without a script).

Good to hear that the performance is sufficient (although I'm a bit surprised that it's really comparable to a non-script solution).

I think the query syntax you used is incorrect here. The correct query looks like this:

GET /company/employee/_search?q=FiRsT_NaME:John&explain=true

And the above query doesn't return any results.

I'm not sure what I copied and pasted here, because I try out all examples in Sense before posting. But your syntax is correct, and the field name in the search must exactly match the field name in the mapping (that's the reason why you get no results).

To enable case-insensitive search on field names:

One option we are considering is to maintain an in-memory cache of all field names and compare the field name in the search input against this list case-insensitively. All fields that match case-insensitively would be added to the query's 'fields' list, so that the search is performed on all of them.
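A minimal sketch of that field-name resolution step follows. It assumes the cached names were loaded once from the index mapping (e.g. via GET /<index>/_mapping); the cache contents and helper names here are hypothetical. The user-supplied field name is compared case-insensitively against the cache, and every match goes into the "fields" parameter of a query_string query. This only handles field-name casing. Case-insensitive matching of the values still needs an analyzed (sub-)field or a script, as discussed above.

```python
import json

def resolve_fields(user_field, known_fields):
    # Case-insensitive comparison against the cached field names.
    wanted = user_field.casefold()
    return sorted(f for f in known_fields if f.casefold() == wanted)

def build_query(user_field, value, known_fields):
    fields = resolve_fields(user_field, known_fields)
    if not fields:
        return None   # unknown field: the caller decides how to report it
    # Note: "value" is interpreted as query_string syntax, so reserved
    # characters in user input would need escaping.
    return {"query": {"query_string": {"query": value, "fields": fields}}}

field_cache = {"NAME", "name", "first_name"}   # hypothetical cached field names
print(json.dumps(build_query("Name", "Neeraj", field_cache)))
```

Because both "NAME" and "name" match "Name", the search runs against both fields. That keeps case-sensitive field names in the index while making the lookup case-insensitive.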

Another option would be to lowercase all field names while indexing, but our requirements say we must keep supporting case-sensitive field names.

If you have pointers for achieving our goal in a different way, please let us know.