Alternative option to the shingle token filter

ankurpatel · October 4, 2021, 7:18am

I would like to implement autosuggest functionality similar to Google Search. The shingle token filter seems to be the most suitable option for my requirement. Here are the details of my index:

- Index size: 16 GB (20 million documents); average document size: 0.53 KB.
- The shingle filter is applied to the work description field (workDesc), whose average size is 0.45 KB.
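To put these figures in perspective, the short sketch below works through the arithmetic for the whole index. The 20 million documents and the 0.53 KB / 0.45 KB averages come from the post; the ~6 bytes per token and 1 KB = 1,000 bytes are assumptions made only for this estimate. It counts shingle occurrences, not distinct shingles; the memory fielddata needs also depends on the number of distinct shingles, which cannot be derived from these figures.

```python
# Back-of-the-envelope sizing from the figures stated in the post.
# ASSUMPTIONS (not from the post): 1 KB = 1,000 bytes, and an average of
# ~6 bytes per token (a ~5-character word plus one separator).
DOCS = 20_000_000
AVG_DOC_BYTES = 530      # 0.53 KB per document
AVG_DESC_BYTES = 450     # 0.45 KB per work description
BYTES_PER_TOKEN = 6      # hypothetical figure


def shingle_count(n_tokens: int, min_size: int = 2, max_size: int = 3) -> int:
    """Shingles emitted for one value of n_tokens tokens when output_unigrams is false."""
    return sum(max(n_tokens - size + 1, 0) for size in range(min_size, max_size + 1))


tokens_per_desc = AVG_DESC_BYTES // BYTES_PER_TOKEN   # 75 tokens
shingles_per_desc = shingle_count(tokens_per_desc)    # 74 two-word + 73 three-word = 147

print(f"raw source        ~ {DOCS * AVG_DOC_BYTES / 1e9:.1f} GB")           # ~ 10.6 GB
print(f"work descriptions ~ {DOCS * AVG_DESC_BYTES / 1e9:.1f} GB")          # ~ 9.0 GB
print(f"shingles per doc  ~ {shingles_per_desc}")                           # ~ 147
print(f"shingles in total ~ {DOCS * shingles_per_desc / 1e9:.2f} billion")  # ~ 2.94 billion
```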
The following filter, analyzer and mapping were used at index creation time:

a. Filter

    "shingle-filter": {
      "max_shingle_size": "3",
      "min_shingle_size": "2",
      "output_unigrams": "false",
      "type": "shingle"
    }

b. Analyzer

    "ana_autocomplete": {
      "filter": [
        "lowercase",
        "shingle-filter"
      ],
      "tokenizer": "standard"
    }

c. Mapping for the work description field

    "workDesc": {
      "type": "text",
      "fields": {
        "suggestions": {
          "type": "text",
          "analyzer": "ana_autocomplete",
          "fielddata": true,
          "fielddata_frequency_filter": {
            "min": 0.001,
            "max": 0.1,
            "min_segment_size": 500
          }
        },
        "workDesc": {
          "type": "text",
          "analyzer": "ana_tenderinfo"
        }
      }
    }
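The sketch below approximates, in Python, what ends up in the workDesc.suggestions subfield: the ana_autocomplete chain (tokenize, lowercase, then 2- and 3-word shingles without unigrams) and the min/max part of fielddata_frequency_filter, which is evaluated per segment. It is an illustration under assumptions, not Elasticsearch's implementation: a simple \w+ regex replaces the standard tokenizer, the min/max bounds are treated as inclusive, and min_segment_size is not modelled. The function names are hypothetical.

```python
import re
from collections import Counter

# ASSUMPTION: a plain \w+ regex stands in for the "standard" tokenizer,
# which follows Unicode word-boundary rules and can split differently.
WORD_RE = re.compile(r"\w+")


def autocomplete_shingles(text: str, min_size: int = 2, max_size: int = 3) -> list[str]:
    """Approximate "ana_autocomplete": tokenize, lowercase, emit 2- and 3-word shingles (no unigrams)."""
    words = [w.lower() for w in WORD_RE.findall(text)]
    shingles = []
    for start in range(len(words)):
        for size in range(min_size, max_size + 1):
            if start + size <= len(words):
                shingles.append(" ".join(words[start:start + size]))
    return shingles


def frequency_filter(segment: list[list[str]], min_freq: float = 0.001, max_freq: float = 0.1) -> set[str]:
    """Sketch of the min/max part of fielddata_frequency_filter for ONE segment.

    Document frequency is counted per segment. Values <= 1.0 are fractions of the
    documents in the segment that have a value for the field; larger values are
    absolute counts. Bounds are treated as inclusive (assumption).
    """
    docs_with_value = [terms for terms in segment if terms]
    n = len(docs_with_value)
    doc_freq = Counter(term for terms in docs_with_value for term in set(terms))
    low = min_freq * n if min_freq <= 1.0 else min_freq
    high = max_freq * n if max_freq <= 1.0 else max_freq
    return {term for term, df in doc_freq.items() if low <= df <= high}


print(autocomplete_shingles("Civil work for road construction"))
# ['civil work', 'civil work for', 'work for', 'work for road',
#  'for road', 'for road construction', 'road construction']
```

A single five-word description already yields seven shingles, and with fielddata enabled, the shingles that pass the frequency filter are loaded into the JVM heap.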

I use the following terms aggregation query to get the suggestions:

    GET tenderinfo_version_9/_search
    {
      "aggs": {
        "workDesc_111": {
          "terms": {
            "field": "workDesc.suggestions",
            "include": "civil.*",
            "order": [
              { "_count": "desc" }
            ],
            "size": 10
          }
        }
      },
      "size": 0,
      "_source": false
    }
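Semantically, the aggregation above does roughly what the following Python sketch does: since the request has no "query", every document matches; each document's shingles are checked against the include pattern, each matching shingle is counted once per document, and the ten most frequent are returned. This models the result only. Elasticsearch works on fielddata ordinals rather than strings, and the tie-breaking on equal counts (term ascending) is an assumption of the sketch. terms_agg is a hypothetical helper name.

```python
import re
from collections import Counter


def terms_agg(docs: list[list[str]], include: str, size: int = 10) -> list[tuple[str, int]]:
    """Sketch of a terms aggregation with an "include" regex, ordered by _count desc.

    - Each document contributes at most 1 to a term's count (doc_count).
    - Lucene regular expressions must match the WHOLE term, hence re.fullmatch.
      (Python and Lucene regex syntax differ; "civil.*" means the same in both.)
    - Equal counts are ordered by term ascending (assumption for this sketch).
    """
    pattern = re.compile(include)
    counts = Counter()
    for terms in docs:              # with no "query", every document matches
        for term in set(terms):     # count each term once per document
            if pattern.fullmatch(term):
                counts[term] += 1
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:size]


docs = [
    ["civil work", "civil work for", "work for"],
    ["civil work", "civil road"],
    ["road work"],
]
print(terms_agg(docs, "civil.*"))
# [('civil work', 2), ('civil road', 1), ('civil work for', 1)]
```

What the sketch illustrates is that the work grows with the number of matching documents and the shingles per document, which is consistent with the query being fast at 0.1 million documents and timing out at 20 million.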

The autocomplete works like a charm (results in milliseconds) with 0.1 million documents, but with 20 million documents I get a timeout error.

Elasticsearch server configuration:

- Operating system: Ubuntu 18.04.6 LTS
- CPU: Intel(R) Core(TM) i5-7400 CPU @ 3.00 GHz (1 physical processor, 4 cores, 4 threads)
- RAM: 16 GB
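With 16 GB of RAM, a quick budget estimate shows how much heap fielddata may use on this server. The post does not state the heap settings, so the sketch assumes the usual guidance of giving the JVM heap no more than 50% of RAM and the default fielddata circuit breaker limit (indices.breaker.fielddata.limit) of 40% of the heap. For comparison, the work descriptions alone add up to about 9 GB of text (20 million × 0.45 KB) before being expanded into shingles; whether the resulting fielddata actually exceeds the budget depends on the number of distinct shingles, which the post does not give.

```python
# ASSUMPTIONS (general Elasticsearch guidance, not settings stated in the post):
# - the JVM heap is set to no more than 50% of physical RAM;
# - indices.breaker.fielddata.limit is left at its default of 40% of the heap.
RAM_GB = 16
MAX_HEAP_FRACTION = 0.50
FIELDDATA_BREAKER_FRACTION = 0.40

WORK_DESC_TEXT_GB = 20_000_000 * 450 / 1e9   # 20 million docs x 0.45 KB (1 KB = 1,000 bytes)

max_heap_gb = RAM_GB * MAX_HEAP_FRACTION
fielddata_limit_gb = max_heap_gb * FIELDDATA_BREAKER_FRACTION

print(f"max heap:             {max_heap_gb:.1f} GB")         # 8.0 GB
print(f"fielddata breaker:    {fielddata_limit_gb:.1f} GB")  # 3.2 GB
print(f"workDesc source text: {WORK_DESC_TEXT_GB:.1f} GB")   # 9.0 GB
```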

Are there any other options to get the same result? Please advise.

system · November 1, 2021, 7:19am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Related topics

Topic	Category	Replies	Views	Activity
Performance issues with top_hits aggregation using shingle filter	Elasticsearch	1	1059	July 6, 2017
Problem with shingles as an autocomplete solution	Elasticsearch	5	2543	July 6, 2017
Elasticsearch Suggestions using shingle	Elasticsearch	1	513	February 8, 2020
Fuzzy searching on shingles filter getting problem	Elasticsearch	1	634	November 6, 2018
Fuzzy searching on shingles filter getting problem for search	Elasticsearch	1	408	November 9, 2018