I would like to implement auto-suggest functionality similar to Google search. The Shingle Token Filter looks like the most suitable option for my requirement. Let me explain my index details.
- Index size is 16 GB, with 20 million documents. The average document size is 0.53 KB.
- The shingle filter is applied to the WorkDescription field. The average size of the WorkDescription field is 0.45 KB.
- The filter, analyzer and mapping below were used at index creation time.
a. Filter
"shingle-filter" : {
  "max_shingle_size" : "3",
  "min_shingle_size" : "2",
  "output_unigrams" : "false",
  "type" : "shingle"
}
b. Analyzer
"ana_autocomplete" : {
  "filter" : [
    "lowercase",
    "shingle-filter"
  ],
  "tokenizer" : "standard"
}
c. Mapping for the work description field
"workDesc" : {
  "type" : "text",
  "fields" : {
    "suggestions" : {
      "type" : "text",
      "analyzer" : "ana_autocomplete",
      "fielddata" : true,
      "fielddata_frequency_filter" : {
        "min" : 0.001,
        "max" : 0.1,
        "min_segment_size" : 500
      }
    },
    "workDesc" : {
      "type" : "text",
      "analyzer" : "ana_tenderinfo"
    }
  }
}
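To illustrate what I expect the ana_autocomplete analyzer to emit, here is a rough Python sketch. It is not Elasticsearch's implementation: the regular expression is only a crude stand-in for the standard tokenizer, and the function name `shingles` and the sample sentence are made up for this example.

```python
import re

def shingles(text, min_size=2, max_size=3):
    """Return lowercased word shingles of `text`, without unigrams."""
    # Crude stand-in for the standard tokenizer: keep runs of word characters.
    tokens = [t.lower() for t in re.findall(r"\w+", text)]
    out = []
    for start in range(len(tokens)):
        # Emit every shingle of min_size..max_size words starting at this token.
        for size in range(min_size, max_size + 1):
            if start + size <= len(tokens):
                out.append(" ".join(tokens[start:start + size]))
    return out

print(shingles("Civil work for road construction"))
```

With these settings a five-word description already produces seven terms, so the number of distinct terms in workDesc.suggestions can be much larger than the number of distinct words.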
- I used the following aggregation query to get the suggestions:
GET tenderinfo_version_9/_search
{
  "aggs": {
    "workDesc_111": {
      "terms": {
        "field": "workDesc.suggestions",
        "include": "civil.*",
        "order": [
          {
            "_count": "desc"
          }
        ],
        "size": 10
      }
    }
  },
  "size": 0,
  "_source": false
}
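As I understand it, the query asks for the ten most frequent shingles that start with "civil", counted by the number of documents containing them. A minimal Python sketch of that behaviour follows; it is only a conceptual model, not how Elasticsearch executes the aggregation. The function name `top_terms`, the tie-breaking rule and the sample data are assumptions for illustration.

```python
import re
from collections import Counter

def top_terms(docs_terms, include, size=10):
    """Count documents per term matching `include` and return the top `size`."""
    pattern = re.compile(include)
    counts = Counter()
    for terms in docs_terms:
        # A document is counted once per term, however often the term occurs.
        for term in set(terms):
            # The include pattern must match the whole term, not a substring.
            if pattern.fullmatch(term):
                counts[term] += 1
    # Highest count first; ties are broken alphabetically in this sketch.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:size]

# Hypothetical shingles for three documents.
docs = [
    ["civil work", "civil work for", "work for"],
    ["civil work", "civil engineering"],
    ["road work"],
]
print(top_terms(docs, "civil.*"))
```

In this sketch the work grows with the total number of shingles across all documents, which may be related to the slowdown I see at 20 million documents.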
- Autocomplete works like a charm (results come back in milliseconds) with 0.1 million documents, but I get a timeout error with 20 million documents.
- Elasticsearch server configuration:
  - Operating system: Ubuntu 18.04.6 LTS
  - CPU: Intel(R) Core(TM) i5-7400 CPU @ 3.00 GHz, 1 physical processor, 4 cores, 4 threads
  - RAM: 16 GB
Are there any other options to get the same result? Please advise.