Right now, I'm able to create a working stopword filter in the following way:

import elasticsearch_dsl as dsl
from elasticsearch import Elasticsearch

# connection to a local cluster (was the undefined `localhost` before)
client = Elasticsearch("http://localhost:9200")

company_name_stopword = ["inc", "corp"]

# custom stop filter that drops any token matching the list above
_company_name_stopword_filter = dsl.token_filter(
    "_company_name_stopword_filter",
    type="stop",
    ignore_case=True,
    stopwords=company_name_stopword,
)

# whitespace tokenizer, then lowercase, then the stop filter
_tag_analyzer = dsl.analyzer(
    "tag_analyzer",
    tokenizer="whitespace",
    filter=["lowercase", _company_name_stopword_filter],
)

response = _tag_analyzer.simulate(text="Apple Inc", using=client)
print([t.token for t in response.tokens])
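This prints ['apple']: the whitespace tokenizer splits "Apple Inc" into two tokens, lowercase turns "Inc" into "inc", and the stop filter drops it. That is exactly the behaviour I want.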
The objective is to remove, from the input text, any string that appears in the list company_name_stopword.
However, the index I'm using uses tokenizer="keyword" instead of tokenizer="whitespace".
Is it possible to create this filter with a keyword tokenizer? I can't change the tokenizer of the index without risking large behavioural/performance changes...