Hello,
It seems that synonyms from a second file are not taken into account if the same terms are already used in the first file.
In the first file, I have:
aaa,bbb
In the second file, I have:
aaa,synaaa
bbb,synbbb
ccc,synccc
When I run _analyze with a request like this:
{
  "explain": false,
  "analyzer": "synonym_full_indexing_analyzer",
  "text": "ccc"
}
I get the following for each input text:
aaa --> aaa, bbb (I am not getting synaaa)
bbb --> aaa, bbb (I am not getting synbbb)
ccc --> ccc, synccc
I wonder if it is related to this:
elasticsearch, synonym
I haven't checked the Elasticsearch code.
Here is the analyzer:
"synonym_full_indexing_analyzer": {
  "type": "custom",
  "tokenizer": "fr_full_text_nospace_tokenizer",
  "filter": [
    "conditional_elision",
    "lowercase",
    "asciifolding",
    "full_indexing_synonyms_syn1",
    "full_indexing_synonyms_syn2",
    "flatten_graph"
  ]
},
Elastic 7.17.6
RabBit_BR
(andre.coelho)
April 6, 2023, 12:07pm
2
Hi @antoinelefloch
I built this example based on the post. Chaining synonym filters seems to be possible. I'm not sure what your other filters do besides synonyms; maybe they are the real problem.
PUT /synonym_chaining
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "first_synonyms": {
            "type": "synonym",
            "synonyms": ["aaa,bbb"]
          },
          "second_synonyms": {
            "type": "synonym",
            "synonyms": ["aaa,synaaa", "bbb,synbbb", "ccc,synccc"]
          }
        },
        "analyzer": {
          "synonym_analyzer": {
            "filter": [
              "first_synonyms",
              "second_synonyms"
            ],
            "tokenizer": "whitespace"
          }
        }
      }
    }
  }
}

GET /synonym_chaining/_analyze
{
  "analyzer": "synonym_analyzer",
  "text": "bbb"
}
Tokens:
{
  "tokens": [
    {
      "token": "bbb",
      "start_offset": 0,
      "end_offset": 3,
      "type": "word",
      "position": 0
    },
    {
      "token": "synbbb",
      "start_offset": 0,
      "end_offset": 3,
      "type": "SYNONYM",
      "position": 0
    },
    {
      "token": "aaa",
      "start_offset": 0,
      "end_offset": 3,
      "type": "SYNONYM",
      "position": 0
    },
    {
      "token": "synaaa",
      "start_offset": 0,
      "end_offset": 3,
      "type": "SYNONYM",
      "position": 0
    }
  ]
}
When I use the same simple settings with synonym files, the second synonym file is not taken into account anymore.
This is Elastic 7.7.1
PUT /synonym_chaining_files
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "first_synonyms": {
            "type": "synonym",
            "synonyms_path": "resources/v1_02/my_synonyms_1.txt"
          },
          "second_synonyms": {
            "type": "synonym",
            "synonyms": "resources/v1_02/my_synonyms_2.txt"
          }
        },
        "analyzer": {
          "synonym_analyzer": {
            "filter": [
              "first_synonyms",
              "second_synonyms"
            ],
            "tokenizer": "whitespace"
          }
        }
      }
    }
  }
}
RabBit_BR
(andre.coelho)
April 12, 2023, 6:01pm
5
Try replacing "synonyms" with "synonyms_path" in the second filter, since its value is a file path and not a list of synonym rules.
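Concretely, the second filter would look something like this, with the path copied unchanged from the settings posted above (a fragment only, to be placed under "analysis" > "filter"):

```json
{
  "second_synonyms": {
    "type": "synonym",
    "synonyms_path": "resources/v1_02/my_synonyms_2.txt"
  }
}
```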
It works much better, thank you.
I found the cause of my problem. My analyzer uses filters of type "synonym_graph" rather than plain "synonym". Here are the filters to reproduce the problem:
"filter": {
  "first_synonyms": {
    "type": "synonym_graph",
    "synonyms": ["aaa,bbb"],
    "lenient": "true"
  },
  "second_synonyms": {
    "type": "synonym_graph",
    "synonyms": ["aaa,synaaa", "bbb,synbbb", "ccc,synccc"],
    "lenient": "true"
  }
}
If I remove "lenient", I get the error:
"term: aaa analyzed to a token (aaa) with position increment != 1 (got: 0)"
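So I suppose that chained synonym_graph filters do not support the same term appearing in two different synonym filters. With "lenient" enabled, the rules that fail to parse appear to be skipped silently, which would explain the missing synaaa and synbbb.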
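A possible workaround, offered only as an untested sketch: merge the rules into a single synonym_graph filter so that no chaining is needed. The index name synonym_single_graph and the filter name merged_synonyms are made up for illustration, and the first rule assumes that aaa, bbb, synaaa and synbbb should all expand to each other, as in the chained "synonym" example earlier in this thread. "lenient" is left out on purpose so that any rule that cannot be parsed is reported instead of being skipped.

```json
PUT /synonym_single_graph
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "merged_synonyms": {
            "type": "synonym_graph",
            "synonyms": ["aaa,bbb,synaaa,synbbb", "ccc,synccc"]
          }
        },
        "analyzer": {
          "synonym_analyzer": {
            "type": "custom",
            "tokenizer": "whitespace",
            "filter": ["merged_synonyms"]
          }
        }
      }
    }
  }
}

GET /synonym_single_graph/_analyze
{
  "analyzer": "synonym_analyzer",
  "text": "aaa"
}
```

The same merged rules could also live in one file referenced through "synonyms_path". For an analyzer used at index time, "flatten_graph" would still go after the synonym_graph filter, as in the original synonym_full_indexing_analyzer.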