I have two documents whose name fields are as follows.
Doc1: "Identity Vijay"
Doc2: "Vijay Identity"
When I search for 'identity', both documents are returned with the same score. Ideally I want Doc1 to score higher, because the matched word is at the beginning of the field. Is there any option in Elasticsearch that handles this use case?
I index the documents with a custom analyzer.
I use a multi_match query to search across multiple fields.
Note: Is it also possible to give exact matches a higher score? (E.g. 'Identity' should score higher than '--identity--', even though both produce the same terms once tokenised.)
Mapping used to create the index (with the custom analyzer):
curl -X PUT "localhost:9201/custom_index?pretty" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "vertex": {
      "properties": {
        "datatype": {
          "type": "keyword",
          "index": true
        },
        "name": {
          "type": "text",
          "analyzer": "custom_analyzer"
        },
        "title": {
          "type": "text",
          "analyzer": "custom_analyzer"
        },
        "description": {
          "type": "text",
          "analyzer": "custom_analyzer"
        }
      }
    }
  },
  "settings": {
    "analysis": {
      "filter": {
        "custom_filter": {
          "type": "word_delimiter_graph"
        }
      },
      "analyzer": {
        "custom_analyzer": {
          "filter": [
            "custom_filter",
            "stop",
            "lowercase"
          ],
          "tokenizer": "standard"
        }
      }
    }
  }
}'
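The claim in the note that 'Identity' and '--identity--' end up as the same terms can be checked with the _analyze API. This is a minimal sketch, assuming the index above has already been created on localhost:9201; the ES_URL variable and the analyze_body helper are names introduced here purely for illustration.

```shell
#!/bin/sh
# Assumption: the index from the mapping above exists on localhost:9201.
ES_URL="${ES_URL:-localhost:9201}"

# Hypothetical helper: builds the _analyze request body for one input string.
# The input must not contain double quotes or backslashes.
analyze_body() {
  printf '{"analyzer": "custom_analyzer", "text": "%s"}' "$1"
}

# Run each input through custom_analyzer and print the tokens it produces.
for text in 'Identity' '--identity--'; do
  curl -s --max-time 5 -X GET "$ES_URL/custom_index/_analyze?pretty" \
    -H 'Content-Type: application/json' \
    -d "$(analyze_body "$text")" \
    || echo "Request for '$text' failed; is Elasticsearch running on $ES_URL?" >&2
done
```

If the two responses list the same tokens, the analyzed name field alone cannot tell the two inputs apart at query time.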
Inserting the documents:
curl -X PUT "localhost:9201/custom_index/vertex/1?pretty" -H 'Content-Type: application/json' -d' {
  "datatype" : "hive_column",
  "name" : "Identity Vijay",
  "title" : "",
  "description" : ""
}'
curl -X PUT "localhost:9201/custom_index/vertex/2?pretty" -H 'Content-Type: application/json' -d' {
  "datatype" : "hive_column",
  "name" : "Vijay Identity",
  "title" : "",
  "description" : ""
}'
Search term: "Identity"
Search query:
curl -X GET "localhost:9201/custom_index/_search?pretty" -H 'Content-Type: application/json' -d' {
  "size" : 25,
  "query" : {
    "multi_match" : {
      "query" : "identity",
      "fuzziness" : "AUTO",
      "fields" : [ "name^2", "datatype", "title", "description" ]
    }
  },
  "from" : 0
}'
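The identical scores can be inspected by adding "explain": true to the same search, which asks Elasticsearch to return a score breakdown for every hit. This is a minimal sketch, assuming both documents above are indexed on localhost:9201; ES_URL and search_body are illustrative names only.

```shell
#!/bin/sh
# Assumption: the index and both documents above exist on localhost:9201.
ES_URL="${ES_URL:-localhost:9201}"

# Hypothetical helper: the search body from above with "explain" switched on.
search_body() {
  cat <<'EOF'
{
  "explain" : true,
  "size" : 25,
  "query" : {
    "multi_match" : {
      "query" : "identity",
      "fuzziness" : "AUTO",
      "fields" : [ "name^2", "datatype", "title", "description" ]
    }
  },
  "from" : 0
}
EOF
}

# Each hit in the response carries an _explanation tree for its score.
curl -s --max-time 5 -X GET "$ES_URL/custom_index/_search?pretty" \
  -H 'Content-Type: application/json' \
  -d "$(search_body)" \
  || echo "Search failed; is Elasticsearch running on $ES_URL?" >&2
```

Comparing the _explanation sections of the two hits shows which factors contribute to each score.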
Any help is greatly appreciated.