Elasticsearch 7.10.2
Description:
When the synonym_graph filter is used and a synonym rule matches, Elasticsearch does not search for the individual tokens of the original query.
I have defined the synonym rule below.
"synonyms": [
"Country Federal Police, CFP"
]
When I search for "Country Federal Police", I get documents containing "CFP" and documents containing "Country Federal Police" (all 3 words together). The search does not match documents that contain "Country", "Federal" or "Police" as individual tokens. However, if I remove the synonym rule, documents containing any one of the 3 tokens are also returned. I am using the standard tokenizer.
Is this expected behaviour? I would expect the individual tokens of the original search string to be considered as well.
Steps to reproduce:
Mapping
PUT testsynonymgraph
{
  "settings": {
    "analysis": {
      "filter": {
        "search_synonyms": {
          "type": "synonym_graph",
          "synonyms": [
            "Country Federal Police, CFP"
          ]
        }
      },
      "analyzer": {
        "default_search": {
          "filter": ["lowercase", "asciifolding", "search_synonyms", "stop", "kstem"],
          "type": "custom",
          "tokenizer": "standard"
        },
        "default": {
          "filter": ["lowercase", "asciifolding", "stop", "kstem"],
          "type": "custom",
          "tokenizer": "standard"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "Name": {
        "type": "text"
      }
    }
  }
}
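Once the index exists, it may help to check what the search analyzer actually emits for the query text. The request below is only a diagnostic sketch: it uses the standard _analyze API against the "default_search" analyzer defined above. I have not pasted any output here; the thing to look for is whether "cfp" appears alongside "country", "federal" and "police", possibly with a positionLength greater than 1, which would indicate that a token graph is being produced.

```console
GET testsynonymgraph/_analyze
{
  "analyzer": "default_search",
  "text": "Country Federal Police"
}
```

Running the same request with "analyzer": "default" gives a baseline without the synonym filter for comparison.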
Index 4 documents:
POST _bulk
{ "index" : { "_index" : "testsynonymgraph", "_id" : "1" } }
{ "FaCS Name" : "Country Federal Police" }
{ "index" : { "_index" : "testsynonymgraph", "_id" : "2" } }
{ "FaCS Name" : "Country Defence Force" }
{ "index" : { "_index" : "testsynonymgraph", "_id" : "3" } }
{ "FaCS Name" : "Country Reserve Police" }
{ "index" : { "_index" : "testsynonymgraph", "_id" : "4" } }
{ "FaCS Name" : "CFP" }
Search:
POST /testsynonymgraph/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "query": "Country Federal Police",
            "fuzziness": "Auto"
          }
        }
      ]
    }
  },
  "size": 10
}
Search Result:
"hits" : [
  {
    "_index" : "testsynonymgraph",
    "_type" : "_doc",
    "_id" : "1",
    "_score" : 2.08334,
    "_source" : {
      "FaCS Name" : "Country Federal Police"
    }
  },
  {
    "_index" : "testsynonymgraph",
    "_type" : "_doc",
    "_id" : "4",
    "_score" : 1.5956267,
    "_source" : {
      "FaCS Name" : "CFP"
    }
  }
]
Problem:
The other documents, "Country Defence Force" (id 2) and "Country Reserve Police" (id 3), should also be returned in the result, since each contains at least one of the query tokens.
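To see how the query is rewritten once the synonym rule matches, the validate API with explain=true can be run on the same query. This is a sketch for diagnosis, not a fix. My assumption (not confirmed here) is that the explanation will show "cfp" combined with a phrase query for the three words rather than three independent term clauses, which would account for documents 2 and 3 being dropped.

```console
GET testsynonymgraph/_validate/query?explain=true
{
  "query": {
    "multi_match": {
      "query": "Country Federal Police",
      "fuzziness": "Auto"
    }
  }
}
```

Comparing this explanation with the one from an index created without the "search_synonyms" filter should make the difference in the generated query visible.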