Hi,
I have a task to implement search over a set of records by several different fields.
One of them is "Title", and for this specific field there are a few requirements:
- internal match (e.g. "esti" in the search request should match the value "Testing");
- non-strict match (e.g. "Tosting" in the search request should match the value "Testing").
Another requirement for the search overall is that, by default, results have to be displayed in chronological order, newest first (the more recent a document is, the better; each document has a "CreationTime" field).
I have implemented the first part (both internal and non-strict matches) using an ngram token filter.
The second part (chronological order) is implemented using the "sort" feature of the search query.
My problem is that searching on the ngrammed field returns too many non-relevant results.
For example, if the request is "Computer", the output contains documents with these Titles:
- "Computer"
- "Compressor"
- "Company"
These non-relevant results, of course, have a lower _score than documents with "Computer" in the Title. But sorting by CreationTime neutralizes that, so users often see non-relevant results at the top (because they were created later).
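To illustrate why these titles match at all, here is a minimal sketch in plain Python. It is not the actual Lucene analysis chain: it only lowercases a single word and cuts it into 4-character grams (matching min_gram = max_gram = 4 in my settings), and it ignores the word_delimiter filter. The helper name `ngrams` is made up for this example.

```python
def ngrams(text, n=4):
    """Return the set of lowercase character n-grams of a single word."""
    word = text.lower()
    return {word[i:i + n] for i in range(len(word) - n + 1)}


query_grams = ngrams("Computer")

for title in ["Computer", "Compressor", "Company"]:
    # Grams that the query and the title have in common.
    shared = query_grams & ngrams(title)
    print(title, sorted(shared), f"{len(shared)} of {len(query_grams)} query grams")
```

"Compressor" and "Company" share only the single gram "comp" with the query, but as far as I understand a match query with the default "or" operator accepts a document as soon as any one gram matches.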
My overall question: how can I work around this situation?
A few approaches I can think of right away:
- Reimplement the search technique to make it more "specific" (somehow reduce the noise in the search output in the first place), so that I can then sort those results;
- Use some kind of score factor (a FunctionScore compound query with a field value factor based on CreationTime, e.g. the number of days from 2012-01-01 to CreationTime).
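Roughly what I have in mind for these two options, as an untested sketch that builds the request bodies as Python dicts. Several things here are assumptions rather than working configuration: that "tender.Factor" holds the number of days since 2012-01-01, the "50%" threshold, the "log1p" modifier, and the helper names. I am also not sure that minimum_should_match counts each gram separately when the grams come from a token filter (as in my analyzer) rather than from an ngram tokenizer.

```python
import json
from datetime import date

TITLE_NGRAM = "tender.Title.ngram"  # field name from my mapping
FACTOR_FIELD = "tender.Factor"      # assumption: days since 2012-01-01


def days_factor(created, base=date(2012, 1, 1)):
    """Hypothetical helper: number of days from the base date to CreationTime."""
    return (created - base).days


def stricter_match(text, min_match="50%"):
    """Option 1: require a share of the query grams to match.

    The 50% default is a guess, not a tuned value.
    """
    return {
        "match": {
            TITLE_NGRAM: {"query": text, "minimum_should_match": min_match}
        }
    }


def recency_weighted(text):
    """Option 2: multiply the text score by a recency factor.

    No "sort" is given, so results are ordered by the combined _score
    instead of by CreationTime alone.
    """
    return {
        "size": 10,
        "from": 0,
        "query": {
            "function_score": {
                "query": stricter_match(text),
                "field_value_factor": {
                    "field": FACTOR_FIELD,
                    "modifier": "log1p",  # dampens the influence of the date
                    "missing": 1,
                },
                "boost_mode": "multiply",
            }
        },
    }


print(json.dumps(recency_weighted("Computer"), indent=2))
```

The idea is that a recent "Compressor" could still outrank an old "Computer" only if its recency factor outweighs its much lower text score, which is the balance I have not managed to get right.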
I have totally failed in both directions, so I'm asking you guys for help.
Now, to be more specific:
Elasticsearch version: 6.2.2
Index settings and mapping:
{
  "tendersearch": {
    "aliases": {},
    "mappings": {
      "_doc": {
        "properties": {
          "@timestamp": { "type": "keyword" },
          "@version": { "type": "keyword" },
          "DateModified": { "type": "date" },
          "Id": { "type": "keyword" },
          "tender": {
            "properties": {
              "CreationTime": { "type": "date" },
              "Factor": { "type": "long" },
              "Title": {
                "type": "text",
                "fields": {
                  "ngram": {
                    "type": "text",
                    "analyzer": "trigrams"
                  },
                  "raw": { "type": "keyword" }
                },
                "analyzer": "word_delim_analyzer"
              }
            }
          }
        }
      }
    },
    "settings": {
      "index": {
        "number_of_shards": "1",
        "provided_name": "tendersearch",
        "max_result_window": "2147483647",
        "creation_date": "1522832508985",
        "analysis": {
          "filter": {
            "trigrams_filter": {
              "type": "ngram",
              "min_gram": "4",
              "max_gram": "4"
            },
            "word_delim_catenate": {
              "catenate_all": "true",
              "type": "word_delimiter"
            }
          },
          "analyzer": {
            "trigrams": {
              "filter": [
                "lowercase",
                "word_delim_catenate",
                "trigrams_filter"
              ],
              "type": "custom",
              "tokenizer": "whitespace"
            },
            "word_delim_analyzer": {
              "filter": [
                "lowercase",
                "word_delim_catenate"
              ],
              "type": "custom",
              "tokenizer": "whitespace"
            }
          }
        },
        "number_of_replicas": "1",
        "uuid": "kY7S9_NhSyCgWi047v0rSA",
        "version": {
          "created": "6020199"
        }
      }
    }
  }
}
Sample search request:
{
  "size": 10,
  "from": 0,
  "sort": [
    {
      "tender.CreationTime": {
        "order": "desc"
      }
    },
    "_score"
  ],
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "tender.Title.ngram": {
              "query": "Computer"
            }
          }
        }
      ]
    }
  }
}
Any help is highly appreciated!