Inconsistent synonym results from _analyze


#1

Hi,

I have a synonym text file containing full-width and half-width digits and the Chinese-character equivalents of those digits.

An example row in MySynonym.txt is "7,7,七,柒".

My filter settings are:
"local_synonym_test": {
  "type": "synonym",
  "synonyms_path": "analysis/MySynonym.txt"
}

My analyzer settings are:
"my_ngram_analyzer_test": {
  "type": "custom",
  "tokenizer": "my_ngram_tokenizer",
  "filter": [
    "local_synonym_test",
    "dash_as_alphanum_fltr"
  ]
}

My tokenizer settings are:
"my_ngram_tokenizer": {
  "type": "ngram",
  "token_chars": [
    "letter",
    "digit"
  ],
  "min_gram": "1",
  "max_gram": "1"
}
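
As a rough illustration of what these tokenizer settings ask for, the Python sketch below emits one token per character (min_gram = max_gram = 1) and keeps only letters and digits. It is an approximation, not Lucene's implementation: the function name `unigram_tokens` is hypothetical, and Python's `str.isalpha()` / `str.isdigit()` are assumed to be close enough to the "letter" and "digit" character classes for this input.

```python
def unigram_tokens(text):
    """Approximate an ngram tokenizer with min_gram = max_gram = 1.

    Only characters classed as letters or digits become tokens
    (an assumption standing in for token_chars: ["letter", "digit"]).
    Returns a list of (position, token) pairs.
    """
    tokens = []
    position = 0
    for ch in text:
        if ch.isalpha() or ch.isdigit():
            tokens.append((position, ch))
            position += 1
    return tokens


print(unigram_tokens("7杯"))
```

For the text "7杯" this yields one token for the digit and one for 杯, so the synonym filter only ever sees single-character tokens.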

When I execute _analyze multiple times, I get inconsistent synonym results:
GET mytestindex/_analyze
{
  "analyzer": "my_ngram_analyzer_test",
  "text": "7杯"
}
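
To measure how often each outcome occurs, the request above can be repeated and the distinct token lists tallied. The following is a minimal sketch using only the Python standard library. The base URL `http://localhost:9200`, the absence of authentication, and the helper names `analyze_once`, `token_signature`, `tally`, and `repeat_analyze` are all assumptions for illustration.

```python
import json
import urllib.request
from collections import Counter


def analyze_once(base_url, index, analyzer, text):
    """Send one _analyze request and return the parsed JSON response."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/{index}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def token_signature(response):
    """Reduce an _analyze response to a hashable tuple of (position, token)."""
    return tuple((t["position"], t["token"]) for t in response.get("tokens", []))


def tally(responses):
    """Count how many responses produced each distinct token signature."""
    return Counter(token_signature(r) for r in responses)


def repeat_analyze(base_url, runs):
    """Run the same _analyze request `runs` times and tally the outcomes."""
    responses = [
        analyze_once(base_url, "mytestindex", "my_ngram_analyzer_test", "7杯")
        for _ in range(runs)
    ]
    return tally(responses)
```

Calling, for example, `repeat_analyze("http://localhost:9200", 20)` and printing the resulting counter shows how many of the 20 runs returned each distinct token list.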

Sometimes the result is RESULT1:
only 7 is returned at position 0.

Sometimes the result is RESULT2:
all of 7,7,七,柒 are returned at position 0.

In a Windows environment I consistently get RESULT2, but that is not the case in a CentOS environment.

What should I do to make it consistently return RESULT2 in the CentOS environment?

Thank you.

