Force token filter to output just one token

grad1 · June 8, 2021, 1:17pm

Hi,

I am working on the development of a search solution based on Elasticsearch for the Polish language. Initially, I used the recommended analysis-stempel plugin to stem Polish words, but after some experiments I found it inadequate for quite a few words that are important for my search case.

Then I found another plugin created specifically for the Polish language: allegro/elasticsearch-analysis-morfologik on GitHub (Morfologik Polish Lemmatizer plugin for Elasticsearch). Based on some tests, it yields better results for my use case. However, its token filter often emits more than one output token per input token (word). This is because it tries to output the word in both its base masculine and feminine forms. For example, the word "czerwoną" gets stemmed into "czerwona" and "czerwony".

I don't want that behaviour because it creates unnecessary redundancy and negatively impacts the performance of my search queries.

Is there any way to limit the number of tokens output per one input token? Taking just the first output token would fit all my needs.

Thanks a lot for your help!
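
To make the request concrete, here is a minimal Python sketch of "keep only the first token per input word". It is an illustration, not the plugin's code. It assumes the filter stacks the extra lemmas on the same position as the first one, meaning their position increment is 0. The `Token` type, the `keep_first_per_position` helper, and the sample stream are all hypothetical.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    term: str
    # 1 = starts a new position; 0 = stacked on the previous token's position
    # (assumption: this is how the extra lemmas are emitted).
    position_increment: int


def keep_first_per_position(tokens: List[Token]) -> List[Token]:
    """Keep only tokens that open a new position, dropping stacked alternatives."""
    return [t for t in tokens if t.position_increment > 0]


# Hypothetical filter output for "czerwoną sukienkę"; terms and order are illustrative.
stream = [
    Token("czerwona", 1),
    Token("czerwony", 0),  # second lemma for the same input word
    Token("sukienka", 1),
]

print([t.term for t in keep_first_per_position(stream)])
```

If that assumption holds, the same rule might be expressible in Elasticsearch with a `predicate_token_filter` placed after the Morfologik filter, using a script such as `token.positionIncrement > 0`. That is untested against this plugin and worth checking with the `_analyze` API first.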

system · July 6, 2021, 1:17pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
