Hello,
I'm using Elasticsearch v7.6.2. I am trying to create a custom analyzer that uses a multiplexer token filter with a single filter branch. This branch contains only a synonym filter. My goal is to keep the original tokens while also getting the synonyms. My test _analyze request looks like this:
GET test/_analyze
{
  "explain": false,
  "tokenizer": "whitespace",
  "filter": [
    {
      "type": "multiplexer",
      "preserve_original": true,
      "filters": ["synonym_expression"]
    }
  ],
  "text": ["gré à gré"]
}
The synonym filter referenced by the multiplexer is defined in my index settings as:
"synonym_expression": {
  "type": "synonym",
  "synonyms_path": "dictionaries/protectedExpression.txt"
}
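For context, here is a minimal sketch of how these pieces could be wired together in the settings of the test index. Only the synonym_expression definition is taken from my actual settings; the names synonym_multiplexer and test_analyzer are placeholders chosen for illustration.

```json
# Sketch only: "synonym_multiplexer" and "test_analyzer" are hypothetical names.
PUT test
{
  "settings": {
    "analysis": {
      "filter": {
        "synonym_expression": {
          "type": "synonym",
          "synonyms_path": "dictionaries/protectedExpression.txt"
        },
        "synonym_multiplexer": {
          "type": "multiplexer",
          "preserve_original": true,
          "filters": ["synonym_expression"]
        }
      },
      "analyzer": {
        "test_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": ["synonym_multiplexer"]
        }
      }
    }
  }
}
```

With such a definition, the same test could be run by passing "analyzer": "test_analyzer" to GET test/_analyze instead of the inline filter.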
The synonym file contains this line (Solr format):
gré à gré => greagre
When I run the _analyze request, I get this output:
{
  "tokens" : [
    {
      "token" : "gré",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "à",
      "start_offset" : 4,
      "end_offset" : 5,
      "type" : "word",
      "position" : 1
    },
    {
      "token" : "gré",
      "start_offset" : 6,
      "end_offset" : 9,
      "type" : "word",
      "position" : 2
    }
  ]
}
I don't see any synonym in the resulting tokens.
If I set "preserve_original" to false, I get this output instead:
{
  "tokens" : [
    {
      "token" : "greagre",
      "start_offset" : 0,
      "end_offset" : 9,
      "type" : "SYNONYM",
      "position" : 0
    }
  ]
}
This time my synonym is in the output, but the original tokens are gone. I don't understand this behaviour. What am I doing wrong? How can I get both the original tokens and the synonyms in the output of my multiplexer filter?
Thank you in advance for your help.
Gérald