Searching a not_analyzed term name and value in a case-insensitive manner

Hi @anjith.p,

sorry for the confusion. In my example I was just demonstrating that the casing of the field name does not matter. The other issue has hopefully been resolved by my original answer, which is why I did not include it in this example again.

Daniel

Thanks Daniel. Your original answer suggests that we should indeed use an analyzer. But in our case none of our strings are analyzed and we can't lose the original case. So I'm wondering if there is any way of achieving case-insensitive matching for not_analyzed strings. Maybe through scripting? Thanks for your help.

Hi @anjith.p,

you can use multi-fields and apply the analyzer just there. The reference documentation contains an example for multi-fields but note that it's the other way around in the example (i.e. in your case you want the "main" field not_analyzed but the subfield analyzed).
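A minimal sketch of such a mapping, assuming the pre-5.x `string` mapping syntax used in this thread and sent as the body of `PUT /my_index`. The index, type, and subfield names (`my_index`, `my_type`, `lowercase`) and the analyzer name `case_insensitive` are placeholders chosen for illustration:

```json
{
   "settings": {
      "analysis": {
         "analyzer": {
            "case_insensitive": {
               "tokenizer": "keyword",
               "filter": [
                  "lowercase"
               ]
            }
         }
      }
   },
   "mappings": {
      "my_type": {
         "properties": {
            "name": {
               "type": "string",
               "index": "not_analyzed",
               "fields": {
                  "lowercase": {
                     "type": "string",
                     "analyzer": "case_insensitive"
                  }
               }
            }
         }
      }
   }
}
```

With a mapping like this, a term query on `name` stays case sensitive and sees the original value, while a match query on `name.lowercase` compares the whole value case-insensitively, because the keyword tokenizer keeps the value as a single token and the lowercase filter is applied at both index and search time.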

Does that work for you?

Daniel

What I understand from your answer is that I need to store my field both as analyzed and as not_analyzed, and that the analyzed one will be used for case-insensitive searching. Is my assumption correct? If yes, won't that double the storage? In our case most of our fields are not_analyzed and we need to support case-insensitive searching for them. Is using scripts an option (though I'm not clear about scripting at the moment)?

Hi @anjith.p,

that's correct, you need two separate fields, but this does not necessarily mean that your index size doubles (although I guess it will be close to that if you have a lot of not_analyzed fields). I suggest that you run a couple of experiments to determine the impact on index size (I also suggest you run a force_merge at the end of your test to keep sizes comparable).

Daniel

Thanks Daniel. We will go with your suggestions. But I also wanted to know if this could be done through ES scripting. Any ideas? Appreciate your help.

Hi @anjith.p,

a scripting solution is possible but this will be painfully slow compared to a native search.

Here is a working example (tested with Elasticsearch 2.3.3):

Enable file based scripting in config/elasticsearch.yml:

script.file: true

Add a file "match.groovy" in config/scripts/:

doc['name'].value.toLowerCase() == 'bob'

Note: You can use parameters in scripts (see docs), but to keep this example minimal I've hardcoded the value. I hope you get the idea.

Create an index:

PUT cases
{
   "mappings": {
      "people": {
         "properties": {
            "name": {
               "type": "string",
               "index": "not_analyzed"
            }
         }
      }
   }
}

Now add Bob:

POST /cases/people/1
{
    "name": "Bob"
}

And search via a script query:

GET /cases/people/_search
{
   "query": {
      "bool": {
         "must": [
            {
               "match_all": {}
            }
         ],
         "filter": {
            "script": {
               "script": {
                   "file": "match"
               }
            }
         }
      }
   }
}

You get:

{
   "took": 202,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 1,
      "hits": [
         {
            "_index": "cases",
            "_type": "people",
            "_id": "1",
            "_score": 1,
            "_source": {
               "name": "Bob"
            }
         }
      ]
   }
}
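The hardcoded value can also be passed in as a script parameter. A sketch, assuming `match.groovy` is changed to `doc['name'].value.toLowerCase() == expected.toLowerCase()`, where `expected` is a parameter name made up for this example. The body below would be sent to `GET /cases/people/_search`:

```json
{
   "query": {
      "bool": {
         "must": [
            {
               "match_all": {}
            }
         ],
         "filter": {
            "script": {
               "script": {
                  "file": "match",
                  "params": {
                     "expected": "BOB"
                  }
               }
            }
         }
      }
   }
}
```

Since both sides are lowercased in the script, this should match the "Bob" document regardless of the casing of the parameter value.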

Now the same with a term query:

GET /cases/people/_search
{
    "query": {
        "term": {
           "name": {
              "value": "Bob"
           }
        }
    }
}

You get:

{
   "took": 9,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 0.30685282,
      "hits": [
         {
            "_index": "cases",
            "_type": "people",
            "_id": "1",
            "_score": 0.30685282,
            "_source": {
               "name": "Bob"
            }
         }
      ]
   }
}

I had just one document stored and you can already see the performance difference: 9 ms for the term query vs 202 ms for the script query. Please keep this in mind and test the performance early, because the scripting approach will be much slower.

Daniel

Thanks Daniel. Yes, I have just figured out that scripts actually work :slight_smile: Fortunately, slow performance is OK in our use case. Really appreciate your response. I'll indeed write an article about ES scripting: it is really powerful :slight_smile:

Hi @anjith.p,

glad that I could help! :slight_smile: Can you share the link to the article here once you've published it?

Daniel

I will, for sure. We have actually made the script "native" by using a Java plugin and found that its performance is as good as that of a regular search (without a script).

Good to hear that the performance is sufficient (although I'm a bit surprised that it's really comparable to a non-script solution).

I think the query syntax you used here is incorrect. The correct query looks like this:

GET /company/employee/_search?q=FiRsT_NaME:John&explain=true

And the above query doesn't return any results.

Not sure what I copied and pasted here, because I try out all examples in Sense before posting. But your syntax is correct, and the field name in the search must match the field name in the mapping exactly (that's the reason why you get no results).

To enable case-insensitive search on field names:

One option we are considering is to keep a cache of all field names in memory and to compare the field name from the search input against this list case-insensitively.
All fields that match case-insensitively would then be added to the 'fields' of the query, so that the search is performed on all of them.
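As a rough illustration of what the resulting query could look like: suppose the mapping happened to contain both `first_name` and `First_Name` (hypothetical field names, not taken from this thread) and the user searched for `FiRsT_NaME:John`. After the case-insensitive lookup against the cached field names, the application could send a multi_match body like the following to `GET /company/employee/_search`:

```json
{
   "query": {
      "multi_match": {
         "query": "John",
         "fields": [
            "first_name",
            "First_Name"
         ]
      }
   }
}
```

The lookup itself happens in the application. Elasticsearch only sees the exact field names that ended up in the list.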

Another option would be to lowercase all field names at index time, but our requirements say that we do want to support case-sensitive field names.

If you have pointers for achieving our goal in a different way, please share them.

@danielmitterdorfer By applying the keyword tokenizer, are we still guaranteed an exact match in the search result?

Hi @charyorde,

I am not sure what you mean by that. Can you please provide a concrete example?

Daniel

@danielmitterdorfer Thanks for writing. I want to achieve an exact match in search while also keeping it case insensitive. So I'm wondering whether this makes sense:
{
   "settings": {
      "analysis": {
         "analyzer": {
            "case_insensitive": {
               "tokenizer": "keyword",
               "filter": [
                  "lowercase"
               ]
            }
         }
      }
   },
   "mappings": {
      "pages": {
         "properties": {
            "firstname": {
               "type": "string",
               "index": "not_analyzed",
               "analyzer": "case_insensitive"
            }
         }
      }
   }
}

If I remove "index": "not_analyzed", am I still guaranteed an exact match in the search result?

Hi @charyorde,

you are adding an analyzer to a not_analyzed field. This cannot work. I checked the behavior of Elasticsearch 2.4.4 and it happily accepts these settings in the create index request. However, it simply ignores your analyzer and treats the field as not_analyzed. Just try this:

PUT /my_index
{
   "settings": {
      "analysis": {
         "analyzer": {
            "case_insensitive": {
               "tokenizer": "keyword",
               "filter": [
                  "lowercase"
               ]
            }
         }
      }
   },
   "mappings": {
      "my_type": {
         "properties": {
            "name": {
               "type": "string",
               "index": "not_analyzed", 
               "analyzer": "case_insensitive"
            }
         }
      }
   }
}

Then do GET /my_index/_mapping and you'll see that Elasticsearch simply ignores your analyzer:

{
   "my_index": {
      "mappings": {
         "my_type": {
            "properties": {
               "name": {
                  "type": "string",
                  "index": "not_analyzed"
               }
            }
         }
      }
   }
}
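On the exact-match part of the question, the analyze API is one way to check what the `case_insensitive` analyzer would produce on a field where it is actually applied. A small sketch, using the request-body form that I believe is available in 2.x and sent to `GET /my_index/_analyze` on the index created above (the sample text is made up):

```json
{
   "analyzer": "case_insensitive",
   "text": "John SMITH"
}
```

The response should contain a single token, `john smith`: the keyword tokenizer keeps the whole value together and the lowercase filter folds the case. A field analyzed this way would therefore still only match on the complete value, just case-insensitively.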

I suggest you play around with the example that I've provided in my initial response, go through the thread and decide what fits your use-case best (I have a feeling that the mapping that I've provided in my initial response fits your use-case).

Daniel

Got it. Thanks