Hi all, I am having difficulty getting Elasticsearch (ES) to tokenize text in a specific way.
Suppose I have a text like "Young Donor Plasma Transfusion and Age-Related Biomarkers".
My synonyms file contains:
plasma transfusions,transfusion plasma,plasma transfusion,transfusions plasma => OTHERS_ID_996981,
plasma donor,donors plasma,donor plasma => OTHERS_ID_922024,
transfusion,transfusing blood products,transfusions,transfusion blood,blood transfusion => OTHERS_ID_1383248
The output I get from elyzer is this:
TOKENIZER: iplexus_tokenizer
{0:Young} {1:Donor} {2:Plasma} {3:Transfusion} {4:and} {5:Age} {6:Related} {7:Biomarkers}
TOKEN_FILTER: lowercase
{0:young} {1:donor} {2:plasma} {3:transfusion} {4:and} {5:age} {6:related} {7:biomarkers}
TOKEN_FILTER: synonym_rule
{0:young} {1:OTHERS_ID_922024,OTHERSCLASS} {2:OTHERS_ID_1383248,OTHERSCLASS} {3:and} {4:age} {5:related} {6:OTHERS_ID_418498,OTHERSCLASS}
Here the keywords "donor plasma" and "plasma transfusion" overlap: they share the token "plasma".
Elasticsearch only matches "donor plasma" and "transfusion"; no token is emitted for "plasma transfusion".
Is there a way to make ES emit tokens for overlapping keywords when each of them matches a synonym rule?
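
To make the question concrete, here is a small Python sketch of the two behaviours. It is only a simplified model and not Elasticsearch or Lucene code: it assumes the synonym filter matches greedily (leftmost, longest phrase first) and that tokens consumed by one match cannot take part in another. The function names and the reduced rule table are my own, for illustration only.

```python
# Simplified model only; this is NOT the actual Elasticsearch/Lucene implementation.
# Reduced rule table taken from the synonyms file above (one phrase per rule).
RULES = {
    ("plasma", "transfusion"): "OTHERS_ID_996981",
    ("donor", "plasma"): "OTHERS_ID_922024",
    ("transfusion",): "OTHERS_ID_1383248",
}


def greedy_synonyms(tokens, rules):
    """Leftmost-longest matching; matched tokens are consumed."""
    max_len = max(len(key) for key in rules)
    out = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first at the current position.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = tuple(tokens[i:i + n])
            if phrase in rules:
                out.append(rules[phrase])
                i += n  # consumed tokens cannot join a later match
                break
        else:
            out.append(tokens[i])  # no rule starts here, keep the token
            i += 1
    return out


def overlapping_synonyms(tokens, rules):
    """Report every rule match at every start position, overlaps included."""
    max_len = max(len(key) for key in rules)
    matches = []
    for start in range(len(tokens)):
        for n in range(1, min(max_len, len(tokens) - start) + 1):
            phrase = tuple(tokens[start:start + n])
            if phrase in rules:
                matches.append((" ".join(phrase), rules[phrase]))
    return matches


tokens = ["young", "donor", "plasma", "transfusion", "and"]
greedy = greedy_synonyms(tokens, RULES)
overlapping = overlapping_synonyms(tokens, RULES)
print(greedy)
print(overlapping)
```

Under these assumptions the greedy function reproduces what I see from the synonym filter, while the overlapping function yields the three matches I would like to get.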
Here I expect the tokens to be:
"OTHERS_ID_922024" (donor plasma), "OTHERS_ID_996981" (plasma transfusion), "OTHERS_ID_1383248" (transfusion)
The analyzer settings are as follows; the synonyms file contains the rules shown above:
{
  "settings": {
    "analysis": {
      "analyzer": {
        "analyzer_search": {
          "type": "custom",
          "tokenizer": "iplexus_tokenizer",
          "filter": [
            "lowercase",
            "synonym_rule"
          ]
        },
        "analyzer_q": {
          "type": "custom",
          "tokenizer": "iplexus_tokenizer",
          "filter": [
            "lowercase",
            "synonym_rule_q"
          ]
        },
        "analyzer_summary": {
          "type": "custom",
          "tokenizer": "iplexus_tokenizer",
          "filter": [
            "lowercase",
            "synonym_rule",
            "biomedical_concept"
          ]
        }
      },
      "tokenizer": {
        "iplexus_tokenizer": {
          "type": "pattern",
          "pattern": "[^a-zA-Z0-9\\p{InGreek}\\p{No}\\p{Lm}\\+\\−]",
          "max_token_length": "256"
        }
      },
      "filter": {
        "synonym_rule": {
          "type": "synonym",
          "synonyms_path": "synonyms_iplexus_index_v18.txt"
        },
        "synonym_rule_q": {
          "type": "synonym",
          "synonyms_path": "synonyms_iplexus_query_v18.txt"
        }
      }
    }
  }
}
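
In case anyone wants to reproduce this without elyzer, below is a rough Python sketch that builds a request for the `_analyze` API with `"explain": true`, which should give a similar per-stage breakdown. The index name `my_index` is a placeholder (my real index name is not shown here), and the helper function is only illustrative; it builds the request path and body and leaves the actual HTTP call to whatever client you use.

```python
import json


def build_analyze_request(index, analyzer, text):
    """Return the path and JSON body for an _analyze call on one analyzer."""
    path = f"/{index}/_analyze"
    body = {
        "analyzer": analyzer,  # analyzer name from the settings above
        "text": text,
        "explain": True,       # request tokenizer and token filter details
    }
    return path, json.dumps(body)


# "my_index" is a hypothetical index name used only for this example.
path, body = build_analyze_request(
    "my_index",
    "analyzer_search",
    "Young Donor Plasma Transfusion and Age-Related Biomarkers",
)
print("POST", path)
print(body)
```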