Performance issue when using edge ngram

I am implementing autocomplete for the agency name, civil service title, and fiscal year fields. I have no issue with civil_service_title, but I see very slow performance with agency and fiscal year.
I also plan to have multiple sub-fields on agency, as I use it for different purposes (autocomplete, keyword, and text).
I want to know if my mapping needs to be changed. Also, how can I make sure my autocomplete functionality works for special characters like #, ~, and &? It does not give me results when I type any name that includes special characters, for example: (Department of education #10)

PUT indexName
{
  "settings": {
    "analysis": {
      "filter": {
        "gramFilter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 30,
          "token_chars": [
            "letter",
            "digit"
          ]
        }
      },
      "analyzer": {
        "index_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [
            "lowercase",
            "trim"
          ]
        },
        "search_string_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [
            "lowercase"
          ]
        },
        "autocomplete": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "trim",
            "gramFilter",
            "asciifolding"
          ]
        }
      }
    }
  },
  "mappings": {
    "Type": {
      "properties": {
        "civil_service_title": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          },
          "analyzer": "autocomplete",
          "search_analyzer": "standard"
        },
        "agency_name": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          },
          "analyzer": "autocomplete",
          "search_analyzer": "standard"
        },
        "fiscal_year": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          },
          "analyzer": "autocomplete",
          "search_analyzer": "standard"
        }
      }
    }
  }
}

Isn't that a bit excessive?

"min_gram" : 1,
"max_gram" : 30,

Use case wise, I'd expect the user to type at least 2 or 3 characters before proposing anything. Going up to 30 letters means that after typing the first 10 letters you still need to "complete"? That sounds like too many terms to me.

BTW, do you really want to use a keyword tokenizer to propose the completion?
I.e., for Department of education #10 you can only type depar, and nothing is suggested for educ?
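You can check this with the _analyze API (a sketch; indexName is whatever you called your index). The keyword tokenizer emits the whole input as a single token, so edge ngrams can only grow from the start of the full string:

POST indexName/_analyze
{
  "tokenizer": "keyword",
  "text": "Department of education #10"
}

This returns one token, "Department of education #10". With "tokenizer": "standard" instead, each word becomes its own token, so ngrams of education are also produced and educ can match.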

About

special characters like #,~,&

You defined:

"token_chars": [
  "letter",
  "digit"
]

Take a look at the list of the options there: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-edgengram-tokenizer.html
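Note that token_chars is documented as a parameter of the edge_ngram tokenizer rather than the token filter. A sketch of keeping # and & in the tokens, assuming a hypothetical gram_tokenizer name and that "punctuation" and "symbol" cover the characters you care about:

"tokenizer": {
  "gram_tokenizer": {
    "type": "edge_ngram",
    "min_gram": 2,
    "max_gram": 10,
    "token_chars": [
      "letter",
      "digit",
      "punctuation",
      "symbol"
    ]
  }
}

With only "letter" and "digit" listed, characters like #, ~, and & act as token separators and are dropped from the tokens.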


My use case is something like this: I have terms like department of education, department of sanitation, department of social services, and so on. If I don't want to stop at 3 typed characters, will I have to raise max_gram up to 10? Earlier, in Solr, we implemented this using a regular expression, and the tokenizer was keyword.
Also, I should be able to type education to find the result department of education. Can you please let me know the best analyzer that suits this scenario?
Also, my question is: would adding multiple fields with analyzers reduce the speed?
https://www.elastic.co/guide/en/elasticsearch/reference/current/multi-fields.html
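With multi-fields, each sub-field is indexed separately and you query whichever one fits; the extra fields mainly cost index size and indexing time. A sketch of querying the two variants defined in the mapping above (field names taken from that mapping):

GET indexName/_search
{
  "query": {
    "match": {
      "agency_name": "educ"
    }
  }
}

GET indexName/_search
{
  "query": {
    "term": {
      "agency_name.keyword": "Department of education #10"
    }
  }
}

The first hits the ngram-analyzed text field for autocomplete; the second does an exact match against the untouched keyword sub-field.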

Give it a try. And see how it fits your use case.

The _analyze API will help a lot to see how your text will be indexed.
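For example, to see exactly which tokens your custom analyzer produces at index time (run against your own index, since the analyzer is defined there):

POST indexName/_analyze
{
  "analyzer": "autocomplete",
  "text": "Department of education #10"
}

The response lists every generated token, which makes it easy to spot whether the special characters survive and how many ngrams each value expands into.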

@dadoonet thank you.

@dadoonet I am planning to use a regexp query with index_analyzer for my use case. Please let me know if this is optimal.

Regexp, wildcard, and prefix queries are not optimal IMHO.

If you are looking for the optimal autocomplete solution, look at this API: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-suggesters-completion.html
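A minimal sketch of the completion suggester, assuming a hypothetical agency_name_suggest field (the mapping level mirrors the "Type" used earlier in this thread):

PUT indexName
{
  "mappings": {
    "Type": {
      "properties": {
        "agency_name_suggest": {
          "type": "completion"
        }
      }
    }
  }
}

POST indexName/_search
{
  "suggest": {
    "agency-suggest": {
      "prefix": "depar",
      "completion": {
        "field": "agency_name_suggest"
      }
    }
  }
}

Completion fields are backed by an in-memory FST, which is why this API is typically much faster for autocomplete than ngram-based text queries.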

Otherwise edge ngram seems to me a good thing to do.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.