Hi everybody,
The Elasticsearch guide (https://www.elastic.co/guide/en/elasticsearch/guide/current/synonyms-expand-or-contract.html) says that using simple synonym expansion at query time is an advantage for relevance. Please consider my scenario:
My simple expansion synonym rule is:
shirt, blouse
The index structure is:
{
  "my_index": {
    "aliases": {},
    "mappings": {
      "my_type": {
        "properties": {
          "name": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      }
    },
    "settings": {
      "index": {
        "analysis": {
          "filter": {
            "brazilian_stop": {
              "type": "stop",
              "stopwords": "_brazilian_"
            },
            "synonym_filter": {
              "type": "synonym",
              "synonyms_path": "sinonimos.txt"
            }
          },
          "analyzer": {
            "synonym_brazilian_analyzer": {
              "filter": [
                "lowercase",
                "asciifolding",
                "synonym_filter",
                "brazilian_stop"
              ],
              "tokenizer": "standard"
            }
          }
        }
      }
    }
  }
}
Note that I'm not applying the synonym analyzer at index time.
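In case it helps, a request body like the one below, sent to the `_analyze` endpoint of my_index, should show which tokens the analyzer emits for the query text. This is only a sketch: I'm taking the analyzer name from the settings above, and I'd expect both "shirt" and "blouse" to come back at the same position if sinonimos.txt is loaded correctly.

```json
{
  "analyzer": "synonym_brazilian_analyzer",
  "text": "shirt"
}
```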
And there are only 3 documents:
_id = 1
{
  name: "shirt xyz"
}
_id = 2
{
  name: "blouse xyz"
}
_id = 3
{
  name: "blouse wvc"
}
My query is:
{
  "query": {
    "query_string": {
      "fields": ["name"],
      "query": "shirt",
      "analyzer": "synonym_brazilian_analyzer"
    }
  }
}
If I search for "shirt", applying the synonym analyzer only at query time, shouldn't the _id=1 document have a higher score than the _id=2 one?
I'm asking because, according to the explain output, both have exactly the same score.
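For reference, the two scores can be compared with a search body like this one, which is my query with the standard `explain` flag added (a sketch, assuming it is sent to the `_search` endpoint of my_index):

```json
{
  "explain": true,
  "query": {
    "query_string": {
      "fields": ["name"],
      "query": "shirt",
      "analyzer": "synonym_brazilian_analyzer"
    }
  }
}
```

Each hit should then carry an `_explanation` section with the score breakdown for that document.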
My point is: what exactly is the query-time advantage for relevance when using simple expansion?
What about the PerFieldSimilarity calculation?
Shouldn't the "shirt xyz" text be more relevant than "blouse xyz" when the synonyms are expanded only at query time?
Thanks a lot,
Guilherme