Hi,
I have a synonym text file containing full-width digits, half-width digits, and the Chinese-character equivalents of the digits.
For example, one row in MySynonym.txt is "7,7,七,柒".
My filter settings contain:

    "local_synonym_test": {
      "type": "synonym",
      "synonyms_path": "analysis/MySynonym.txt"
    }
My analyzer settings contain:

    "my_ngram_analyzer_test": {
      "type": "custom",
      "tokenizer": "my_ngram_tokenizer",
      "filter": [
        "local_synonym_test",
        "dash_as_alphanum_fltr"
      ]
    }
My tokenizer settings contain:

    "my_ngram_tokenizer": {
      "type": "ngram",
      "token_chars": [
        "letter",
        "digit"
      ],
      "min_gram": "1",
      "max_gram": "1"
    }
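For context, here is how the fragments above would fit together in a single index-creation request (a sketch: the index name is taken from the _analyze call below, and the definition of the unrelated dash_as_alphanum_fltr filter is omitted because it is not shown in this post):

```json
PUT mytestindex
{
  "settings": {
    "analysis": {
      "filter": {
        "local_synonym_test": {
          "type": "synonym",
          "synonyms_path": "analysis/MySynonym.txt"
        }
      },
      "analyzer": {
        "my_ngram_analyzer_test": {
          "type": "custom",
          "tokenizer": "my_ngram_tokenizer",
          "filter": [
            "local_synonym_test",
            "dash_as_alphanum_fltr"
          ]
        }
      },
      "tokenizer": {
        "my_ngram_tokenizer": {
          "type": "ngram",
          "token_chars": ["letter", "digit"],
          "min_gram": "1",
          "max_gram": "1"
        }
      }
    }
  }
}
```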
When I execute _analyze multiple times, I get inconsistent synonym results:

    GET mytestindex/_analyze
    {
      "analyzer": "my_ngram_analyzer_test",
      "text": "7杯"
    }
Sometimes I get RESULT1: only 7 at position 0.
Sometimes I get RESULT2: all of 7, 7, 七, and 柒 at position 0.
On Windows I consistently get RESULT2, but on CentOS the result varies.
What should I do to consistently get RESULT2 on CentOS?
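One environment difference I have tried to rule out is the synonym file's encoding: a file edited on Windows may carry a UTF-8 BOM or CRLF line endings that a Linux host sees as extra bytes in the first or last token of each rule. This is only an assumption, not a confirmed cause; the sample file below stands in for the real analysis/MySynonym.txt:

```shell
# Simulate a Windows-edited synonym file: UTF-8 BOM + CRLF line ending
# (the bytes spell out the row "7,7,七,柒" from the question).
printf '\xef\xbb\xbf7,7,\xe4\xb8\x83,\xe6\x9f\x92\r\n' > MySynonym.sample.txt

# 1. A UTF-8 BOM shows up as the bytes "ef bb bf" at the start of the file:
head -c 3 MySynonym.sample.txt | od -An -tx1

# 2. Count lines containing CR (carriage return); non-zero means CRLF endings:
grep -c $'\r' MySynonym.sample.txt

# 3. Normalize: strip the BOM and all CR characters into a clean copy:
sed '1s/^\xef\xbb\xbf//' MySynonym.sample.txt | tr -d '\r' > MySynonym.clean.txt
```

If the real file on CentOS shows a BOM or CRLF endings, replacing it with the normalized copy (and reopening the index) would rule this factor out.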
Thank you.