Case Sensitive Query for Text fields in Elasticsearch

I have a requirement for a case-sensitive search: I need to search for a particular word (case sensitive) in the message field, which is of type text. I am using the Java High Level REST Client.

My search word is ERROR, but the query matches lowercase variants as well, i.e. error, Error, etc., whereas I want only the documents that match exactly ERROR. I understand this has something to do with analyzers, but I couldn't quite work out how to go about it. Your help is much appreciated.

Below are the mapping and the query:

	    "message": {
    					"type": "text",
    					"fields": {
    						"keyword": {
    							"type": "keyword",
    							"ignore_above": 256
    						}
    					}
    				}
GET filebeat-7.9.1-2021.02.11*/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "@timestamp": {
              "gte": "now-50m"
            }
          }
        },
        {
          "bool": {
            "must": [
              {
                "match": {
                  "host.name": "ocp1110231"
                }
              },
              {
                "match": {
                  "input.type": "log"
                }
              },
              {
                "match": {
                  "log.file.path": "/home/app/platform.log"
                }
              },
              {
                "query_string": {
                  "default_field": "message",
                  "query": "ERROR",
                  "default_operator": "AND"
                }
              }
            ]
          }
        }
      ]
    }
  }
}

Any help is much appreciated.

Here's an example based on a custom case-sensitive synonym.

DELETE test

PUT /test
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "synonym": {
            "tokenizer": "standard",
            "filter": [ "synonym", "lowercase" ]
          }
        },
        "filter": {
          "synonym": {
            "type": "synonym",
            "lenient": true,
            "ignore_case":false,
            "synonyms": [ "ERROR => syn_error" ]
          }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "text":{
        "type": "text",
        "analyzer": "synonym"
      }
    }
  }
}
POST test/_analyze
{
  "field": "text",
  "text": ["ERROR"]
}
POST test/_analyze
{
  "field": "text",
  "text": ["syn_error"]
}
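
As a quick sanity check, analyzing a lowercase variant should produce a plain error token, because the case-sensitive synonym filter runs before the lowercase filter and only rewrites the exact uppercase form:

POST test/_analyze
{
  "field": "text",
  "text": ["error"]
}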

Note that we pick a replacement token, syn_error, that we don't expect to see in the original text.
Let's add example docs:

PUT test/_doc/1
{
  "text":"this is not a real error"
}
PUT test/_doc/2
{
  "text":"ERROR this is real"
}

Now this search will only match the all-uppercase ERROR:

GET test/_search
{
  "query": {
    "match": {
      "text": "ERROR"
    }
  }
}

This query, on the other hand, will match documents containing any case variation of error apart from the all-uppercase form:

GET test/_search
{
  "query": {
    "match": {
      "text": "Error"
    }
  }
}

Case-insensitive matching on all other text still works as usual:

GET test/_search
{
  "query": {
    "match": {
      "text": "REAL"
    }
  }
}
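
The same behaviour carries over to a query_string clause like the one in the original query, because query_string analyzes the query text with the field's analyzer, so ERROR is rewritten to syn_error at query time as well. A rough equivalent against this test index (with text standing in for message) would be:

GET test/_search
{
  "query": {
    "query_string": {
      "default_field": "text",
      "query": "ERROR",
      "default_operator": "AND"
    }
  }
}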

@Mark_Harwood Thank you so much for the response.

So it's all about creating the right mapping.

We are using Filebeat to crawl some log files and send them to Logstash, which creates a new index daily with the date appended to the index name.

I'm not using any explicit mapping; it's all default (the default Filebeat template). How do I deal with this scenario?

So, for every new index, how can I add the analyzer to the mapping?

See index templates
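
As a minimal sketch (assuming Elasticsearch 7.8+ and daily indices matching filebeat-7.9.1-*), a composable index template could carry the analysis settings and the message mapping. The template name, index pattern, and priority below are only illustrative, and in practice you would merge these settings into the full Filebeat template rather than map message alone:

PUT _index_template/filebeat-case-sensitive
{
  "index_patterns": ["filebeat-7.9.1-*"],
  "priority": 200,
  "template": {
    "settings": {
      "analysis": {
        "analyzer": {
          "synonym": {
            "tokenizer": "standard",
            "filter": [ "synonym", "lowercase" ]
          }
        },
        "filter": {
          "synonym": {
            "type": "synonym",
            "lenient": true,
            "ignore_case": false,
            "synonyms": [ "ERROR => syn_error" ]
          }
        }
      }
    },
    "mappings": {
      "properties": {
        "message": {
          "type": "text",
          "analyzer": "synonym",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    }
  }
}

Note that a template only affects indices created after it is installed; existing indices keep their current mapping.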


@Mark_Harwood

Thanks a ton 🙂 That solved my issue.
I created a custom index template, changed filebeat.yml to use it, and it works great.

I will test this more and explore best practices. If you can suggest some, that would be great.

Glad to know you got it working.
One suggestion is to try to extract structured keyword fields from the text using regex patterns, either in custom code, Logstash configuration, or ingest pipelines. They allow you to do things like aggregations on your data. The new runtime fields let you define similar expressions that are evaluated at query time for queries or aggregations, but they will not be as fast as an index with the fields pre-extracted.
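
As an illustration of the ingest pipeline route, a hypothetical pipeline with a grok processor could pull the log level out of message into its own field (the pipeline and target field names here are just examples, not something Filebeat ships with):

PUT _ingest/pipeline/extract-log-level
{
  "description": "Example: copy the log level (ERROR, WARN, ...) from message into its own field",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": [ "%{LOGLEVEL:log_level}" ],
        "ignore_failure": true
      }
    }
  ]
}

You could then reference the pipeline via the index's default_pipeline setting or from your Logstash output, and map log_level as a keyword for fast filters and aggregations.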

