Alternative to the shingle token filter

I would like to implement auto-suggest functionality similar to Google search. The shingle token filter seems to be the best fit for my requirement. Here are my index details:

1. The index size is 16 GB with 20 million documents; the average document size is 0.53 KB.
2. The shingle filter is applied to the work description field (`workDesc`), whose average size is 0.45 KB.
3. The following filter, analyzer, and mapping were used at index creation time.

   a. Filter

   ```json
   "shingle-filter": {
     "max_shingle_size": "3",
     "min_shingle_size": "2",
     "output_unigrams": "false",
     "type": "shingle"
   }
   ```

   b. Analyzer

   ```json
   "ana_autocomplete": {
     "filter": [
       "lowercase",
       "shingle-filter"
     ],
     "tokenizer": "standard"
   }
   ```

   c. Mapping of the work description field

   ```json
   "workDesc": {
     "type": "text",
     "fields": {
       "suggestions": {
         "type": "text",
         "analyzer": "ana_autocomplete",
         "fielddata": true,
         "fielddata_frequency_filter": {
           "min": 0.001,
           "max": 0.1,
           "min_segment_size": 500
         }
       },
       "workDesc": {
         "type": "text",
         "analyzer": "ana_tenderinfo"
       }
     }
   }
   ```
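To make the filter, analyzer, and mapping above concrete, here is a minimal Python sketch of (a) the terms that `ana_autocomplete` would roughly emit for a sample description and (b) the document-frequency window that `fielddata_frequency_filter` describes. It is an approximation, not Elasticsearch's implementation: the regex tokenizer only loosely stands in for the `standard` tokenizer, the helper names `tokenize`, `shingles`, and `frequency_window` and the sample text are hypothetical, and `min_segment_size` is deliberately left out.

```python
from __future__ import annotations

import re


def tokenize(text: str) -> list[str]:
    # Rough stand-in for the "standard" tokenizer followed by "lowercase":
    # grab runs of word characters and lowercase them. The real standard
    # tokenizer uses Unicode UAX #29 word boundaries, so edge cases differ.
    return [t.lower() for t in re.findall(r"\w+", text)]


def shingles(tokens: list[str], min_size: int = 2, max_size: int = 3) -> list[str]:
    # Mirrors min_shingle_size=2, max_shingle_size=3, output_unigrams=false:
    # at every start position, emit each window of min_size..max_size tokens.
    out = []
    for i in range(len(tokens)):
        for size in range(min_size, max_size + 1):
            if i + size <= len(tokens):
                out.append(" ".join(tokens[i:i + size]))
    return out


def frequency_window(docs_with_field: int, min_value: float, max_value: float) -> tuple[float, float]:
    # Approximates fielddata_frequency_filter (hypothetical helper): values
    # above 1.0 are absolute document counts, otherwise they are fractions of
    # the documents in the segment that have a value for the field.
    # min_segment_size is not modelled here.
    lo = min_value if min_value > 1.0 else min_value * docs_with_field
    hi = max_value if max_value > 1.0 else max_value * docs_with_field
    return lo, hi


tokens = tokenize("Civil Work for Road Construction")
print(shingles(tokens))
# ['civil work', 'civil work for', 'work for', 'work for road',
#  'for road', 'for road construction', 'road construction']

# With min=0.001 and max=0.1, a segment with 1,000,000 documents that have the
# field would only load shingles whose document frequency lies in this window:
print(frequency_window(1_000_000, 0.001, 0.1))
# (1000.0, 100000.0)
```

In this model a description with n ≥ 3 tokens contributes (n - 1) + (n - 2) = 2n - 3 shingle terms, so the number of terms in `workDesc.suggestions` grows roughly twice as fast as the word count of each description.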
4. I used a terms aggregation query to get the suggestions:

   ```
   GET tenderinfo_version_9/_search
   {
     "aggs": {
       "workDesc_111": {
         "terms": {
           "field": "workDesc.suggestions",
           "include": "civil.*",
           "order": [
             { "_count": "desc" }
           ],
           "size": 10
         }
       }
     },
     "size": 0,
     "_source": false
   }
   ```
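The terms aggregation above can be modelled roughly as follows. This is a Python sketch, assuming each document has already been reduced to its list of shingles (for example, the output of `ana_autocomplete`); `suggest` and the sample data are hypothetical, Python's `re` stands in for Lucene regular expressions (equivalent for a simple pattern like `civil.*`), and breaking ties on the term is an assumption made only to keep the output deterministic.

```python
from __future__ import annotations

import re
from collections import Counter


def suggest(docs_shingles: list[list[str]], include: str, size: int = 10) -> list[tuple[str, int]]:
    # "include" must match the whole term, hence fullmatch rather than search.
    pattern = re.compile(include)
    counts = Counter()
    # With no "query" clause every document is visited; each document adds 1
    # to the doc count of every distinct matching shingle it contains.
    for doc in docs_shingles:
        for term in set(doc):
            if pattern.fullmatch(term):
                counts[term] += 1
    # "order": [{"_count": "desc"}]; ties broken by term ascending here
    # (an assumption for deterministic output).
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:size]


docs = [
    ["civil work", "civil work for", "work for"],
    ["civil work", "civil engineering", "engineering tender"],
    ["road civil work", "civil work"],
]
print(suggest(docs, "civil.*"))
# [('civil work', 3), ('civil engineering', 1), ('civil work for', 1)]
```

In this simplified model the work grows with the number of documents visited rather than with the number of suggestions returned, since every document is scanned when there is no `query` clause. That is consistent with the query being fast on a small index and slow on a large one.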
5. The autocomplete works like a charm (results come back in milliseconds) with 100,000 documents, but with 20 million documents the query fails with a timeout error.

6. Elasticsearch server configuration:

   - Operating system: Ubuntu 18.04.6 LTS
   - CPU: Intel(R) Core(TM) i5-7400 CPU @ 3.00 GHz (1 physical processor, 4 cores, 4 threads)
   - RAM: 16 GB

Are there any other options to get the same result? Please advise.
