I have an Elasticsearch index and am using the following query:
{
  "_source": [
    "title",
    "content"
  ],
  "size": 15,
  "from": 0,
  "query": {
    "bool": {
      "must": {
        "multi_match": {
          "query": "{{query}}",
          "fields": [
            "title",
            "content"
          ],
          "operator": "or"
        }
      },
      "should": [
        {
          "multi_match": {
            "query": "{{query}}",
            "fields": [
              "title.standard^16",
              "content.standard^2"
            ],
            "operator": "and"
          }
        },
        {
          "match_phrase": {
            "content.standard": {
              "query": "{{query}}",
              "_name": "Phrase on title",
              "boost": 1000
            }
          }
        }
      ]
    }
  },
  "highlight": {
    "fields": {
      "content": {}
    },
    "fragment_size": 100
  }
}
Here is the mapping I set:
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "my_metaphone"
            ]
          }
        },
        "filter": {
          "my_metaphone": {
            "type": "phonetic",
            "encoder": "metaphone",
            "replace": true
          }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "term_vector": "with_positions_offsets",
        "analyzer": "my_analyzer",
        "fields": {
          "standard": {
            "type": "text"
          },
          "stemmer": {
            "type": "text",
            "analyzer": "english"
          }
        }
      },
      "content": {
        "type": "text",
        "term_vector": "with_positions_offsets",
        "analyzer": "my_analyzer",
        "fields": {
          "standard": {
            "type": "text"
          },
          "stemmer": {
            "type": "text",
            "analyzer": "english"
          }
        }
      }
    }
  }
}
Here is my logic with the query:
- It gives the highest precedence to an exact phrase match, if one exists.
- Otherwise, it uses the standard analyzer (that is, the text as is) and gives those matches the next-highest precedence.
- If neither of those matches, it falls back to the phonetic analyzer, which gets the lowest precedence.
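To see which of these three tiers actually fires for a given hit, one option is to give every clause its own `_name`, so that each hit's `matched_queries` array lists the clauses that matched. The following is only a sketch of the same query with hypothetical clause names (`phonetic_fallback`, `standard_all_terms`, `phrase_on_content`); the fields and boosts are copied from the query above, and the phrase clause is named after the field it really targets (`content.standard`).

```json
{
  "query": {
    "bool": {
      "must": {
        "multi_match": {
          "query": "{{query}}",
          "fields": ["title", "content"],
          "operator": "or",
          "_name": "phonetic_fallback"
        }
      },
      "should": [
        {
          "multi_match": {
            "query": "{{query}}",
            "fields": ["title.standard^16", "content.standard^2"],
            "operator": "and",
            "_name": "standard_all_terms"
          }
        },
        {
          "match_phrase": {
            "content.standard": {
              "query": "{{query}}",
              "boost": 1000,
              "_name": "phrase_on_content"
            }
          }
        }
      ]
    }
  }
}
```

If the top hits list only `phonetic_fallback` in `matched_queries`, that would suggest the `should` clauses are not matching those documents at all, rather than being outscored.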
But there is clearly a fault in this, as it seems to give higher precedence to the phonetic analyzer than to the standard or phrase matches. For example, if I search for "Person of Indian Origin", the top results highlight "Pursuant" and "pursuing", and very few results contain "person of Indian origin", although I know a large number of them exist. How do I solve this?
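One way to check whether the phonetic filter is what makes "Person" match "Pursuant" and "pursuing" is to run the three words through `my_analyzer` with the `_analyze` API and compare the emitted tokens. The body below is a sketch meant to be sent to `GET /<index>/_analyze`, where `<index>` stands for whatever the index is actually called (it is not named in this post).

```json
{
  "analyzer": "my_analyzer",
  "text": "Person Pursuant pursuing"
}
```

If the three words come back as the same token, the `must` clause on `title` and `content` cannot tell them apart. The highlighter is also configured on `content`, which uses the same analyzer, so it would mark all three words as matches.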
Here is some sample data to test it out - https://pastebin.com/mzfwz0b3