Case-insensitive search with a term query


(Sergey Panov) #1

Hi There,

I have already designed an application that uses term queries on keyword fields, and I found out that the search is case-sensitive. The problem is that my keyword fields are also used to display the data. That means I want to search them case-insensitively, but on the other hand I do not want to store them explicitly as lowercase, because I do not want to lose the case information needed to display the data properly.

Is this possible with ES 5.1? If not, which is better for performance: duplicating my keywords (one lowercased copy for search and one in the original case for display), or switching to a "match" query? Note that a match query for, say, "777" also returns "111777111" (in filter context), which I want to avoid.

Thanks in advance for your help

Regards,
Sergey


(Makoto Nozawa) #2

Hi Sergey,

I think that even if you use a lowercase filter, the _source field still keeps the original case information.
So you can display the value in its original form.

https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-source-field.html

Below is my test case on Elasticsearch 5.1.1.
In 5.2.0, a "normalizer" option was added to the keyword datatype, so this becomes simpler.

// create an index whose "tags" field uses the keyword tokenizer with a lowercase filter

PUT test_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "sample_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "char_filter": [],
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "tags": {
          "type": "text",
          "analyzer": "sample_analyzer"
        }
      }
    }
  }
}

// index a document with an uppercase character in the "tags" field

PUT test_index/doc/1
{
  "tags": "Sample"
}

// search by lowercased term (a term query is not analyzed, so the value must already be lowercase)

GET test_index/_search
{
  "query": {
    "term": {
      "tags": {
        "value": "sample"
      }
    }
  }
}

// result ("tags" in the _source of each hit still has the original case)

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "test_index",
        "_type": "doc",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "tags": "Sample"
        }
      }
    ]
  }
}

mnozawa


(Sergey Panov) #3

Thank you for the reply. I have upgraded ES to 5.2 and used a normalizer, which is really easy to configure. I still do not see a way to run both case-sensitive and case-insensitive searches on the same field: once you set up a normalizer with a lowercase filter, you can no longer search that field without the normalizer. I do not need that scenario at the moment, though. Perhaps later versions will let us apply a normalizer dynamically inside the search query.
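
For reference, the 5.2 normalizer setup might look roughly like the sketch below. The index name "test_index_52" and the normalizer name "lowercase_normalizer" are made up for illustration, and this is not necessarily the exact configuration used here. The search uses a match query because the 5.2 documentation describes the normalizer as being applied at search time for queries such as match; on a keyword field the match query compares the whole value, so "777" should not match "111777111".

// create an index whose "tags" keyword field uses a lowercase normalizer (names are hypothetical)

PUT test_index_52
{
  "settings": {
    "analysis": {
      "normalizer": {
        "lowercase_normalizer": {
          "type": "custom",
          "char_filter": [],
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "tags": {
          "type": "keyword",
          "normalizer": "lowercase_normalizer"
        }
      }
    }
  }
}

// index a value with mixed case; _source keeps it as "Sample"

PUT test_index_52/doc/1
{
  "tags": "Sample"
}

// search with any casing; the query value should be normalized the same way as the indexed value

GET test_index_52/_search
{
  "query": {
    "match": {
      "tags": "SAMPLE"
    }
  }
}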


(Makoto Nozawa) #4

FYI

If you need one field to work both ways (case-sensitive and case-insensitive), multi-fields (the "fields" mapping parameter) are useful:
https://www.elastic.co/guide/en/elasticsearch/reference/master/multi-fields.html
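
A minimal sketch of that idea, assuming ES 5.2 or later for the normalizer. The index name "test_index_multi", the sub-field name "lower" and the normalizer name "lowercase_normalizer" are only examples. The main "tags" field keeps the exact value for case-sensitive search, and "tags.lower" holds a lowercased copy for case-insensitive search.

// create an index with a plain keyword field plus a normalized sub-field (names are hypothetical)

PUT test_index_multi
{
  "settings": {
    "analysis": {
      "normalizer": {
        "lowercase_normalizer": {
          "type": "custom",
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "tags": {
          "type": "keyword",
          "fields": {
            "lower": {
              "type": "keyword",
              "normalizer": "lowercase_normalizer"
            }
          }
        }
      }
    }
  }
}

// case-sensitive: only the exact value "Sample" should match

GET test_index_multi/_search
{
  "query": {
    "term": {
      "tags": {
        "value": "Sample"
      }
    }
  }
}

// case-insensitive: query the sub-field, any casing of "sample" should match

GET test_index_multi/_search
{
  "query": {
    "match": {
      "tags.lower": "sAMPLE"
    }
  }
}

The trade-off is that each value is indexed twice, which costs some extra disk space, but _source is stored only once and still returns the original case.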

