Right now, I'm able to create a working stopword filter in the following way:

```python
company_name_stopword = ["inc", "corp"]

_company_name_stopword_filter = dsl.token_filter(
    "_company_name_stopword_filter",
    type="stop",
    ignore_case=True,
    stopwords=company_name_stopword,
)

_tag_analyzer = dsl.analyzer(
    "tag_analyzer",
    tokenizer="whitespace",
    filter=["lowercase", _company_name_stopword_filter],
)

response = _tag_analyzer.simulate(text="Apple Inc", using=localhost)
print([t.token for t in response.tokens])
```
The objective is to remove from the text input any string that appears in the stopword list. However, the index I'm using has `tokenizer="keyword"` instead of `tokenizer="whitespace"`.
Is it possible to create this filter with a keyword tokenizer? I can't change the tokenizer of the index without risking large behavioural/performance changes...
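For context, here is why the current setup breaks down: the `keyword` tokenizer emits the entire input as one token, and a `stop` token filter only drops tokens that match a stopword in full, so `"apple inc"` as a single token is never removed. A minimal pure-Python model of that behaviour (just an illustration of the tokenizer/filter interaction, not the actual Elasticsearch code):

```python
def tokenize(text, tokenizer):
    # "keyword" emits the whole input as a single token;
    # "whitespace" splits the input on spaces
    return [text] if tokenizer == "keyword" else text.split()

def stop_filter(tokens, stopwords):
    # a stop filter drops tokens that match a stopword exactly
    stops = {s.lower() for s in stopwords}
    return [t for t in tokens if t.lower() not in stops]

def analyze(text, tokenizer, stopwords):
    # model of: tokenizer -> lowercase filter -> stop filter
    tokens = [t.lower() for t in tokenize(text, tokenizer)]
    return stop_filter(tokens, stopwords)

print(analyze("Apple Inc", "whitespace", ["inc", "corp"]))  # ['apple']
print(analyze("Apple Inc", "keyword", ["inc", "corp"]))     # ['apple inc']
```

With `whitespace`, "inc" is its own token and gets removed; with `keyword`, the single token "apple inc" doesn't equal any stopword, so nothing is filtered.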