Discuss the Elastic Stack

Analysis of identifiers with dashes


romansitina (Roman Sitina) July 31, 2020, 12:18pm

Hello,

I'm trying to solve this issue:

I have the following text:

Information about CD-3568 can be found in document DKF-98346-B-261

and I need it to be analysed as:

information about cd-3568 can be found in document dkf 98346 b 261
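
To pin down the requirement, here is a minimal Python sketch of the target token stream. It is a hypothetical reference function, not Elasticsearch code, and it assumes a "CD-xxxx" identifier is always "CD", a dash and one or more digits. It can serve as an oracle when comparing an analyzer's output (for example from the `_analyze` API) against what is wanted.

```python
import re

# Hypothetical reference function (not Elasticsearch code).
# Assumption: a CD identifier is "cd-" followed by digits (after lowercasing).
# The first alternative keeps such identifiers whole; the second picks up
# every other run of letters/digits, so all other dashes act as separators.
CD_OR_WORD = re.compile(r"\bcd-\d+\b|[a-z0-9]+")


def target_tokens(text: str) -> list[str]:
    """Lowercase the text, keep CD-<digits> whole, split everything else
    on any non-alphanumeric character (including dashes)."""
    return CD_OR_WORD.findall(text.lower())


if __name__ == "__main__":
    sample = "Information about CD-3568 can be found in document DKF-98346-B-261"
    print(" ".join(target_tokens(sample)))
```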

The reason is that CD-xxxx is a specific form of identifier that I want to retain as a single token, while other dashed identifiers such as DKF-98346-B-261 should still be split.

Is it possible to create such an analysis chain by combining existing analysers and tokenisers, without writing a custom plugin?

Thanks for any ideas
Roman

spinscale (Alexander Reelsen) August 3, 2020, 10:11am

Hey,

there are a couple of possible solutions to this problem; I'm not sure which one fits you best.

If you index this field a second time (for example, using a multi-field) with a whitespace tokenizer, the CD-3568 token would be stored as is and could also be searched that way.
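
A minimal sketch of the multi-field idea, written as the request body you might send when creating the index. The field name `content`, the sub-field name `ws` and the analyzer name `whitespace_lowercase` are hypothetical; `whitespace`, `lowercase`, `custom` and the `fields` mapping parameter are standard Elasticsearch features. The helper function is only a rough local approximation of the sub-field's analysis, not Elasticsearch itself.

```python
import json

# Hypothetical index body: "content" keeps the default (standard) analyzer,
# while the "content.ws" sub-field uses a custom whitespace + lowercase
# analyzer, so "CD-3568" survives as a single token there.
INDEX_BODY = {
    "settings": {
        "analysis": {
            "analyzer": {
                "whitespace_lowercase": {
                    "type": "custom",
                    "tokenizer": "whitespace",
                    "filter": ["lowercase"],
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "content": {
                "type": "text",
                "fields": {
                    "ws": {"type": "text", "analyzer": "whitespace_lowercase"}
                },
            }
        }
    },
}


def whitespace_lowercase(text: str) -> list[str]:
    """Rough approximation of the sub-field: split on whitespace only,
    then lowercase. Punctuation stays attached to its word."""
    return [token.lower() for token in text.split()]


if __name__ == "__main__":
    print(json.dumps(INDEX_BODY, indent=2))
    print(whitespace_lowercase("Information about CD-3568 can be found"))
```

A query for the identifier would then target the sub-field, e.g. a `match` query on `content.ws` for `CD-3568`, while regular full-text queries keep using `content`. Note that the whitespace tokenizer leaves trailing punctuation attached ("CD-3568." would be a different token).
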
The word delimiter filter might help as well (I haven't played around with it myself).
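
One way the word delimiter idea could look, as an untested sketch: a whitespace tokenizer (so dashes reach the filter) followed by `word_delimiter_graph` with `preserve_original` enabled, which emits the unsplit token alongside its parts. The filter and analyzer names are hypothetical. `flatten_graph` is added because the multi-position tokens produced by `preserve_original` are not supported at index time without it; check the docs for your version. The Python function is a very rough approximation of the output, not the filter's real algorithm.

```python
import json
import re

# Hypothetical index settings; "id_splitter" and "id_aware" are illustrative names.
INDEX_SETTINGS = {
    "settings": {
        "analysis": {
            "filter": {
                "id_splitter": {
                    "type": "word_delimiter_graph",
                    # Also emit the unsplit token, e.g. "CD-3568".
                    "preserve_original": True,
                }
            },
            "analyzer": {
                "id_aware": {
                    "type": "custom",
                    # Whitespace tokenizer, so the dashes reach the delimiter filter.
                    "tokenizer": "whitespace",
                    # flatten_graph makes the multi-position tokens indexable.
                    "filter": ["id_splitter", "flatten_graph", "lowercase"],
                }
            },
        }
    }
}


def approx_id_aware(text: str) -> list[str]:
    """Very rough approximation: split each whitespace token on
    non-alphanumeric characters, keep the original when a split happened,
    then lowercase. Ignores case-change and letter/digit splitting."""
    tokens = []
    for word in text.split():
        parts = [part for part in re.split(r"[^0-9A-Za-z]+", word) if part]
        if len(parts) > 1:
            tokens.append(word.lower())  # the preserved original
        tokens.extend(part.lower() for part in parts)
    return tokens


if __name__ == "__main__":
    print(json.dumps(INDEX_SETTINGS, indent=2))
    print(approx_id_aware("Information about CD-3568 can be found in document DKF-98346-B-261"))
```

The trade-off is that "DKF-98346-B-261" is preserved as well, so this keeps CD-3568 searchable as a whole without matching the requested output exactly. The filter's `protected_words` option avoids splitting specific tokens, but it takes a fixed list rather than a pattern.
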
Another option is to use a pattern_replace char filter to replace the dash with a special character, but that would also require some processing on the search side, which sounds like too much work to me.
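
A sketch of the char filter approach, assuming an underscore as the "special character": the `pattern_replace` char filter rewrites `CD-<digits>` to `CD_<digits>` before tokenization, and the standard tokenizer keeps underscore-joined words together while still splitting on dashes. The names are hypothetical; the replacement uses Java's `$1` group syntax, since Elasticsearch evaluates the pattern as a Java regex. The helper mimics the chain in Python for plain ASCII text only.

```python
import json
import re

# Hypothetical pattern: "CD", a dash, digits (case-insensitive via (?i)).
CD_PATTERN = r"(?i)\b(CD)-(\d+)\b"

INDEX_SETTINGS = {
    "settings": {
        "analysis": {
            "char_filter": {
                "cd_dash_to_underscore": {
                    "type": "pattern_replace",
                    "pattern": CD_PATTERN,
                    "replacement": "$1_$2",  # Java regex group syntax
                }
            },
            "analyzer": {
                "cd_preserving": {
                    "type": "custom",
                    "char_filter": ["cd_dash_to_underscore"],
                    "tokenizer": "standard",
                    "filter": ["lowercase"],
                }
            },
        }
    }
}


def approx_cd_preserving(text: str) -> list[str]:
    """Rough approximation for ASCII text: apply the same rewrite, split on
    non-word characters (underscore counts as a word character), lowercase."""
    rewritten = re.sub(CD_PATTERN, r"\1_\2", text)
    return [token.lower() for token in re.findall(r"\w+", rewritten)]


if __name__ == "__main__":
    print(json.dumps(INDEX_SETTINGS, indent=2))
    print(approx_cd_preserving("Information about CD-3568 can be found in document DKF-98346-B-261"))
```

The indexed token becomes `cd_3568` rather than `cd-3568`. Because a text field uses the same analyzer at search time unless a separate `search_analyzer` is set, a `match` query for "CD-3568" goes through the same rewrite; the extra processing is mainly needed wherever the raw token is shown or matched outside the analyzer.
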

Have you considered extracting that token into its own field to simplify searching and filtering on it?
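
A minimal sketch of the extraction idea, assuming it is done client-side before indexing (an ingest pipeline could do similar work server-side). The field name `cd_ids` and the helper `build_document` are hypothetical. Mapping the field as `keyword` means the value is not tokenized, so `CD-3568` is stored and matched as one exact term.

```python
import json
import re

# Hypothetical identifier pattern: "CD", a dash, digits.
CD_ID = re.compile(r"\bCD-\d+\b", re.IGNORECASE)

# Hypothetical mapping: full text stays in "content", identifiers go to a
# keyword field that is indexed as exact, untokenized values.
MAPPING = {
    "mappings": {
        "properties": {
            "content": {"type": "text"},
            "cd_ids": {"type": "keyword"},
        }
    }
}


def build_document(text: str) -> dict:
    """Return the document to index, with every CD identifier collected,
    upper-cased and de-duplicated into its own field."""
    ids = sorted({match.upper() for match in CD_ID.findall(text)})
    return {"content": text, "cd_ids": ids}


if __name__ == "__main__":
    doc = build_document("Information about CD-3568 can be found in document DKF-98346-B-261")
    print(json.dumps(doc, indent=2))
```

Filtering then becomes a `term` query on `cd_ids` for `CD-3568`, and the main `content` field can keep whatever analyzer suits full-text search best.
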

--Alex

system (system) Closed August 31, 2020, 10:11am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
