Issue with multiple instances of the same token

dishant.sharma · April 28, 2022, 6:43am

After analysis, I am getting the same token multiple times. I am using the pattern token filter, with several instances of the filter (each with a different regex) applied to the same input string. In some cases the duplicate tokens have identical start and end offsets; in other cases the same token appears with different start and end offsets.

Getting repeated tokens is expected, since the same token occurs at multiple locations in my input string. The problem is that I want only one token for any given start and end offset, not several copies of the same token with identical offsets. Occurrences of the same token at different start and end offsets are fine.

I don't want to use the "unique" token filter, because it would also drop the repeated occurrences of the token that have different offsets.
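
To make the requirement concrete, here is a minimal Python sketch of the behavior I am after. It is not an Elasticsearch filter; it only post-processes a token list shaped like the `tokens` array returned by the `_analyze` API (`token`, `start_offset`, `end_offset`, `position`). The function name `dedupe_by_offsets` and the sample tokens are made up for illustration.

```python
from typing import Dict, List

Token = Dict[str, object]


def dedupe_by_offsets(tokens: List[Token]) -> List[Token]:
    """Keep the first token seen for each (token, start_offset, end_offset) triple."""
    seen = set()
    result = []
    for tok in tokens:
        key = (tok["token"], tok["start_offset"], tok["end_offset"])
        if key in seen:
            # Same text at the same offsets: an exact duplicate, so skip it.
            continue
        seen.add(key)
        result.append(tok)
    return result


# Hypothetical sample: two identical tokens at offsets 0-3,
# plus the same text again at offsets 8-11.
sample_tokens = [
    {"token": "abc", "start_offset": 0, "end_offset": 3, "position": 0},
    {"token": "abc", "start_offset": 0, "end_offset": 3, "position": 0},
    {"token": "abc", "start_offset": 8, "end_offset": 11, "position": 1},
]

deduped = dedupe_by_offsets(sample_tokens)
```

With this sample, the exact duplicate at offsets 0-3 is dropped, while the occurrence at offsets 8-11 is kept. This is the result I would like to get from the analyzer itself.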

system · May 26, 2022, 6:43am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
