Case-Sensitive Query for Text Fields in Elasticsearch

I have a requirement for a case-sensitive search: I need to match a particular word (case-sensitive) in the message field, which is of type text. I am using the Java REST High Level Client.

My search word is ERROR, but the query matches lower-case variants as well, i.e. error, Error, etc., whereas I want only documents matching exactly ERROR. I understand this has something to do with analyzers, but I couldn't quite work out how to go about it.

Below are the mapping and the query:

	    "message": {
    					"type": "text",
    					"fields": {
    						"keyword": {
    							"type": "keyword",
    							"ignore_above": 256
    						}
    					}
    				}
    GET filebeat-7.9.1-2021.02.11*/_search
    {
      "query": {
        "bool": {
          "filter": [
            {
              "range": {
                "@timestamp": {
                  "gte": "now-50m"
                }
              }
            },
            {
              "bool": {
                "must": [
                  { "match": { "host.name": "ocp1110231" } },
                  { "match": { "input.type": "log" } },
                  { "match": { "log.file.path": "/home/app/platform.log" } },
                  {
                    "query_string": {
                      "default_field": "message",
                      "query": "ERROR",
                      "default_operator": "AND"
                    }
                  }
                ]
              }
            }
          ]
        }
      }
    }

Any help is much appreciated.

Here's an example based on using a custom case-sensitive synonym.
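First, a quick note on why your current query is case-insensitive: with the default mapping, message is analyzed by the standard analyzer, which lowercases every token at both index time and search time, so ERROR, Error and error all end up as the same token. You can confirm this with _analyze against one of your concrete daily indices (the index name below is illustrative):

```json
POST filebeat-7.9.1-2021.02.11/_analyze
{
  "field": "message",
  "text": "ERROR"
}
```

The only token produced is error, which is why every case variant of the word matches.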

DELETE test

PUT /test
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "synonym": {
            "tokenizer": "standard",
            "filter": [ "synonym", "lowercase" ]
          }
        },
        "filter": {
          "synonym": {
            "type": "synonym",
            "lenient": true,
            "ignore_case":false,
            "synonyms": [ "ERROR => syn_error" ]
          }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "text":{
        "type": "text",
        "analyzer": "synonym"
      }
    }
  }
}
POST test/_analyze
{
  "field": "text",
  "text": ["ERROR"]
}
POST test/_analyze
{
  "field": "text",
  "text": ["syn_error"]
}

Note that we pick a replacement token, syn_error, that we don't expect to see in the original text.
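The filter order in the analyzer is what makes this case-sensitive: the synonym filter runs before lowercase and has ignore_case set to false, so only the exact token ERROR is rewritten to syn_error, while every other variant passes through untouched and is lowercased afterwards. Analyzing a mixed-case variant shows the difference:

```json
POST test/_analyze
{
  "field": "text",
  "text": ["Error"]
}
```

This yields the single token error rather than syn_error, so Error is indexed and searched like any ordinary word.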
Let's add example docs:

PUT test/_doc/1
{
  "text":"this is not a real error"
}
PUT test/_doc/2
{
  "text":"ERROR this is real"
}

Now this search will only match the all-uppercase ERROR:

GET test/_search
{
  "query": {
    "match": {
      "text": "ERROR"
    }
  }
}

While this query will match documents with any case variation of error apart from all-uppercase:

GET test/_search
{
  "query": {
    "match": {
      "text": "Error"
    }
  }
}

Case-insensitive matches on all other text still work as expected:

GET test/_search
{
  "query": {
    "match": {
      "text": "REAL"
    }
  }
}

@Mark_Harwood Thank you so much for the response.

So it's all about creating the right mapping.

We are using Filebeat to crawl some log files and send them to Logstash, which creates a new index daily with the date appended to the index name.

I'm not using any explicit mapping; it's all default (the default Filebeat template). How do I deal with this scenario?

For every new index, how can I update the mapping with the analyzer?

See index templates.
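A sketch of such a template using the composable index template API, reusing the analysis settings from the example above; the template name, pattern and priority are illustrative, and in practice you would base it on the full Filebeat template rather than this minimal mapping:

```json
PUT _index_template/filebeat-synonym
{
  "index_patterns": ["filebeat-*"],
  "priority": 200,
  "template": {
    "settings": {
      "index": {
        "analysis": {
          "analyzer": {
            "synonym": {
              "tokenizer": "standard",
              "filter": [ "synonym", "lowercase" ]
            }
          },
          "filter": {
            "synonym": {
              "type": "synonym",
              "lenient": true,
              "ignore_case": false,
              "synonyms": [ "ERROR => syn_error" ]
            }
          }
        }
      }
    },
    "mappings": {
      "properties": {
        "message": {
          "type": "text",
          "analyzer": "synonym"
        }
      }
    }
  }
}
```

Every new daily index whose name matches the pattern then picks up these settings automatically.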

@Mark_Harwood

Thanks a ton :slight_smile: Solved my issue.
I created a custom index template and changed filebeat.yml to use the template and it works great.

I will test this more and explore best practices. If you can suggest some, that would be great.

Glad to know you got it working.
One suggestion is to try to extract structured keyword fields from the text using regex patterns, either in custom code, Logstash configurations, or ingest pipelines. That lets you do things like aggregations on your data. The new runtime fields allow you to define similar expressions that are evaluated at query time for queries or aggregations, but they will not be as fast as an index with the fields pre-extracted.