Hi everyone,
I am doing a POC to make the best-matching documents rank higher.
Basically, we are using the TF-IDF algorithm to rank the documents.
We use multi_match queries to find documents.
Here is the query:
{
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "should": [
            {
              "multi_match": {
                "query": "Leadership Development",
                "fields": [
                  "title^35",
                  "description^15",
                  "tags^55"
                ],
                "type": "phrase"
              }
            },
            {
              "multi_match": {
                "query": "Leadership Development",
                "fields": [
                  "title^25",
                  "description^5",
                  "tags^45"
                ]
              }
            }
          ]
        }
      },
      "functions": []
    }
  },
  "from": 0,
  "size": 100
}
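To see how the two should clauses combine into one document score, here is a minimal Python sketch of the arithmetic. The per-field scores are hypothetical inputs (Lucene computes the real ones). It follows documented behavior: a phrase-type multi_match combines fields like best_fields, taking the highest boosted field score plus tie_breaker times the others (tie_breaker defaults to 0), and a bool query adds the scores of its matching should clauses. The empty functions array should leave the inner score unchanged, so the sketch ignores function_score.

```python
def best_fields(field_scores, boosts, tie_breaker=0.0):
    """Combine per-field scores the way a best_fields (or phrase) multi_match does:
    highest boosted field score, plus tie_breaker times the remaining boosted scores."""
    boosted = [field_scores[f] * boosts[f] for f in boosts if f in field_scores]
    if not boosted:
        return 0.0  # no field matched, so this clause contributes nothing
    best = max(boosted)
    return best + tie_breaker * (sum(boosted) - best)


def bool_should(*clause_scores):
    """A bool query sums the scores of its matching should clauses."""
    return sum(clause_scores)


# Boosts copied from the query above.
PHRASE_BOOSTS = {"title": 35, "description": 15, "tags": 55}
TERM_BOOSTS = {"title": 25, "description": 5, "tags": 45}


def document_score(phrase_scores, term_scores):
    """Hypothetical per-field scores in, combined bool/should score out."""
    return bool_should(
        best_fields(phrase_scores, PHRASE_BOOSTS),
        best_fields(term_scores, TERM_BOOSTS),
    )


# Hypothetical per-field scores for one document.
score = document_score({"title": 2.0, "tags": 0.5}, {"title": 3.0, "tags": 0.4})
print(score)  # 145.0
```

Because tags carries the largest boosts (55 and 45), whatever lowers the raw tags score tends to move the whole document's score the most.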
Index settings and mappings:
{
  "settings": {
    "analysis": {
      "normalizer": {
        "lcase_keyword": {
          "type": "custom",
          "char_filter": [],
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "tags": {
        "type": "text",
        "analyzer": "standard",
        "fields": {
          "keyword": {
            "type": "keyword",
            "normalizer": "lcase_keyword"
          }
        }
      },
      "title": {
        "type": "text",
        "analyzer": "standard",
        "store": true
      },
      "description": {
        "type": "text",
        "analyzer": "standard",
        "store": true
      }
    }
  }
}
We have a tagging concept where users can add any number of tags to a document, and we support both partial and phrase matches.
The term-frequency contribution is normalized by the length of the field. Since a document can have many tags, its tags field is long, so a matching tag yields a lower TF score, and that lowers the document's overall score.
As a result, relevant documents with many tags are shown at the end of the results.
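The length effect can be reproduced with a small worked example. This is a simplified sketch, assuming the textbook BM25 formula (BM25 has been Elasticsearch's default similarity since 5.0); Lucene's implementation drops the constant (k1 + 1) factor and stores field lengths lossily, so absolute numbers differ. For comparison it also includes the 1 / sqrt(field length) norm of Lucene's classic TF-IDF similarity. For a multi-valued text field such as tags, the field length counts the tokens of all tag values together. The tag lists, average length and corpus counts below are hypothetical, and `approx_field_length` only approximates the standard analyzer.

```python
import math
import re


def approx_field_length(values):
    """Rough token count for a multi-valued text field. The real standard analyzer
    uses Unicode word segmentation; splitting on \\w+ is only an approximation."""
    return sum(len(re.findall(r"\w+", v.lower())) for v in values)


def bm25_term_score(tf, doc_len, avg_len, doc_count, docs_with_term, k1=1.2, b=0.75):
    """Textbook BM25 score for one term in one field."""
    idf = math.log(1 + (doc_count - docs_with_term + 0.5) / (docs_with_term + 0.5))
    length_norm = 1 - b + b * doc_len / avg_len  # grows with the field length when b > 0
    return idf * tf * (k1 + 1) / (tf + k1 * length_norm)


def classic_length_norm(num_terms):
    """Length norm of Lucene's classic TF-IDF similarity: 1 / sqrt(field length)."""
    return 1.0 / math.sqrt(num_terms)


# Hypothetical documents: both carry the tag "Leadership Development".
few_tags = ["Leadership Development", "Coaching"]
many_tags = ["Leadership Development"] + [f"Topic {i}" for i in range(30)]

short_len = approx_field_length(few_tags)  # 3 tokens
long_len = approx_field_length(many_tags)  # 2 + 30 * 2 = 62 tokens
avg_len = 20.0  # hypothetical average tags length across the index
N, n = 1000, 50  # hypothetical: total docs, docs whose tags contain "leadership"

for b in (0.75, 0.0):
    s = bm25_term_score(1, short_len, avg_len, N, n, b=b)
    l = bm25_term_score(1, long_len, avg_len, N, n, b=b)
    print(f"b={b}: few tags {s:.3f}, many tags {l:.3f}")

print("classic norms:", classic_length_norm(short_len), classic_length_norm(long_len))
```

With the default b = 0.75 the document with many tags scores lower for the same single matching tag, while with b = 0 the two scores are identical, which shows the gap comes entirely from the length term b * dl / avgdl. In Elasticsearch, b is one of the parameters of the BM25 similarity.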
Is there any way we can avoid this?