Search for special characters


(R01K) #1

What is the best combination of token filters + tokenizer + analyzer to search for special characters present in my document?


(Val Crettaz) #2

It would help if you provided some sample data and explained how you want to search it.


(R01K) #3

Hi @val

I have a field in my document which normally contains special characters, for example:

my_field = "$file123#.txt"
my_field = "$office@location&home.txt"

The requirement is that if a user searches for special characters like $, #, @, &, etc., they should get the matching my_field values back in the results.

I tried it with the ngram tokenizer, but it gives some irrelevant results when I search for normal text.

So the requirement here is: whether the user searches for special characters, numbers or plain text, they should get relevant results.

Kindly help me solve this.


(Val Crettaz) #4

Can you show what you tried with ngrams?


(R01K) #5

I have used the custom analyzer below:

PUT my_index
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "custom_analyzer": {
            "type": "custom",
            "tokenizer": "my_tokenizer"
          }
        },
        "tokenizer": {
          "my_tokenizer": {
            "type": "ngram",
            "min_gram": 1,
            "max_gram": 10
          }
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "my_field": {
          "type": "text",
          "analyzer": "custom_analyzer",
          "fields": {
            "keyword": {
              "ignore_above": 256,
              "type": "keyword"
            }
          }
        }
      }
    }
  }
}

PUT my_index/doc/2 
{
  "my_field":"$title123.txt"
}

PUT my_index/doc/1
{
  "my_field":"$titan@123#.txt"
}

Here the special character search works fine, but if I search for titan I get both documents back (including the one with title in my_field), which is an irrelevant result.
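
For example, a simple match query along these lines returns both documents (a sketch of the kind of search I mean, not necessarily the exact query):

GET my_index/_search
{
  "query": {
    "match": {
      "my_field": "titan"
    }
  }
}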


(Val Crettaz) #6

Great start! Try not to use your custom analyzer at search time, otherwise the search terms get analyzed as well, i.e. titan gets tokenized into n-grams such as t, ti, tit, tita and titan (among others), and of course tokens like t, ti and tit will match title too.
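
You can see this for yourself with the _analyze API against the index you created:

GET my_index/_analyze
{
  "tokenizer": "my_tokenizer",
  "text": "titan"
}

This returns every 1- to 10-character substring of titan (t, ti, tit, tita, titan, i, it, ita, itan, ta, tan, a, an, n), not just the prefixes, so there are plenty of tokens for title to collide with.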

Add "search_analyzer" => "standard" to your field and it should already work better

    "my_field": {
      "type": "text",
      "analyzer": "custom_analyzer",
      "search_analyzer": "standard",           <--- add this
      "fields": {
        "keyword": {
          "ignore_above": 256,
          "type": "keyword"
        }
      }
    }

Note that you need to re-create the index and re-index your data first.
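
To spell it out, a sketch of the full sequence: delete the index, re-create it with the same settings plus the search_analyzer line, then index the two documents again:

DELETE my_index

PUT my_index
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "custom_analyzer": {
            "type": "custom",
            "tokenizer": "my_tokenizer"
          }
        },
        "tokenizer": {
          "my_tokenizer": {
            "type": "ngram",
            "min_gram": 1,
            "max_gram": 10
          }
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "my_field": {
          "type": "text",
          "analyzer": "custom_analyzer",
          "search_analyzer": "standard",
          "fields": {
            "keyword": {
              "ignore_above": 256,
              "type": "keyword"
            }
          }
        }
      }
    }
  }
}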


(Val Crettaz) #7

Also note that the standard analyzer I'm suggesting might not be a good fit for searching special characters (it strips them out at tokenization time). You might want to create another custom analyzer, but without the ngram tokenizer; the key point is not to use ngrams at search time.
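
For instance, a whitespace-based search analyzer would keep characters like $, #, @ and & inside the tokens. Something along these lines, defined next to custom_analyzer under analysis.analyzer (the name special_search is just an example, not a built-in):

    "special_search": {
      "type": "custom",
      "tokenizer": "whitespace"
    }

Then set "search_analyzer": "special_search" on my_field instead of "standard".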


(R01K) #8

Sure, thanks @val!!


(system) #9

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.