WordDelimiterTokenFilter doesn't seem to be generating expected tokens


(Atul Bagga) #1

Here is my analyzer:

{
  "analysis": {
    "filter": {
      "wordDelimiter": {
        "type": "word_delimiter",
        "generate_word_parts": "true",
        "generate_number_parts": "true",
        "catenate_words": "false",
        "catenate_numbers": "false",
        "catenate_all": "false",
        "split_on_case_change": "true",
        "preserve_original": "true",
        "split_on_numerics": "true",
        "stem_english_possessive": "true"
      }
    },
    "analyzer": {
      "content_analyzer1": {
        "type": "custom",
        "tokenizer": "standard",
        "filter": [
          "asciifolding",
          "wordDelimiter",
          "lowercase"
        ]
      }
    }
  }
}

When I analyze the text "ElasticSearch.TestProject", I expect the following tokens:

elastic, search, test, project, elasticsearch, testproject, and elasticsearch.testproject. My reasoning is that I have split_on_case_change and split_on_numerics enabled, and I am using the standard tokenizer, which I expected to split on ".".

But I actually get only the following tokens:
elasticsearch.testproject, elastic, search, test, project

Is there a way to get the expected tokens I want?
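The output above is consistent with two behaviors, sketched here as a simplified pure-Python model. This is not Elasticsearch code, and `standard_tokenize`, `word_delimiter` and `analyze` are hypothetical names. First, the standard tokenizer follows Unicode UAX #29 word boundaries, under which a "." between two letters does not end a token, so the whole string appears to reach the filter as a single token. Second, with "catenate_words" set to "false", the filter emits only the preserved original plus the individual parts, never joined forms such as "elasticsearch" or "testproject".

```python
import re


def standard_tokenize(text):
    # Simplified stand-in for the standard tokenizer (assumption): as in
    # UAX #29, a "." between word characters does not split the token.
    return re.findall(r"\w+(?:\.\w+)*", text)


def word_delimiter(token, preserve_original=True):
    # Simplified model of word_delimiter (assumption): split on
    # non-alphanumerics, on lower->upper case changes and on letter/digit
    # boundaries. No catenation, matching catenate_words = "false".
    parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", token)
    return ([token] if preserve_original else []) + parts


def analyze(text):
    tokens = []
    for tok in standard_tokenize(text):
        # asciifolding is a no-op for this ASCII input, so it is skipped.
        tokens.extend(word_delimiter(tok))
    # The lowercase filter runs last in the chain.
    return [t.lower() for t in tokens]


print(analyze("ElasticSearch.TestProject"))
# ['elasticsearch.testproject', 'elastic', 'search', 'test', 'project']
```

Under this model the result matches the observed tokens exactly, which suggests the missing "elasticsearch" and "testproject" come from the single-token input plus disabled catenation rather than from the case-change or numeric splitting options.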


(system) #2

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.