Greetings,
I am trying to migrate from ES 5.5 to 7.3, but I got stuck on an issue with the synonym filter. The reference states: "This filter tokenize synonyms with whatever tokenizer and token filters appear before it in the chain." From this I understand that when I define a custom analyzer, the synonyms get tokenized and filtered by the token filters that precede the synonym filter, but I don't want that. I need my synonyms left untouched. Here is a small example to show what I am doing:
PUT index_1
{
  "mappings": {
    "properties": {
      "text": {
        "analyzer": "custom_analyzer_all",
        "norms": false,
        "type": "text"
      }
    }
  },
  "settings": {
    "index.number_of_replicas": 1,
    "analysis": {
      "analyzer": {
        "custom_analyzer_all": {
          "char_filter": [],
          "filter": [
            "lowercase",
            "custom_lemma",
            "custom_wordpack"
          ],
          "tokenizer": "standard",
          "type": "custom"
        }
      },
      "char_filter": {},
      "filter": {
        "custom_wordpack": {
          "type": "synonym",
          "tokenizer": "whitespace",
          "synonyms": [
            "gym => gym, _amenities"
          ]
        },
        "custom_lemma": {
          "type": "lemmagen",
          "lexicon": "en"
        }
      }
    },
    "index.number_of_shards": 1
  }
}
I am creating an index with a custom analyzer that uses the lowercase, lemmagen and synonym filters. Lemmagen is a plugin (link to repo).
Next, I insert a sample document:
POST index_1/_doc/1
{
  "text": "Great to see Ivan can get things right in a gym - clean and everything works."
}
Now, when I check the terms of the text field with the _termvectors API:
POST index_1/_termvectors/1
{
  "fields": ["text"]
}
I get this for the added synonym:
{
  "_amenity": {
    "term_freq": 1,
    "tokens": [
      {
        "position": 10,
        "start_offset": 44,
        "end_offset": 47
      }
    ]
  }
}
"_amenity" is the lemmatized version of the synonym "_amenities". Even if I swap the places of the custom_lemma and custom_wordpack filters, it will still have the same effect(the synonym is inserted in its original form, but the custom_lemma filter again lematizes it). I need somehow to prevent the synonyms from getting analyzed by the filters or tokenizer. It used to be possible to specify a tokenizer inside the synonym token filter, but that no longer works. Could someone please suggest anything on how to have the text field analyzed as it is, but the synonyms are left untouched.