Analysis of identifiers with dashes


I'm trying to solve this issue:

I have text

Information about CD-3568 can be found in document DKF-98346-B-261

and I need to get it analysed as

information about cd-3568 can be found in document dkf 98346 b 261

The reason is that CD-xxxx is a specific form of identifier which I want to retain.

Is it possible to create such an analysis chain with a combination of existing analysers and tokenisers, without writing a custom plugin?

Thanks for any ideas


There are a couple of solutions to this problem; I'm not sure which one fits you best.

  • If you index this field a second time (using a multi field for example) using a whitespace tokenizer, that token would be stored as is, and could also be searched like that.
  • Maybe the word delimiter filter can help you as well (I haven't played around with it).
  • Another solution would be to use a pattern_replace char filter and replace the dash with a special character, but that would also require some processing on the search side, so that sounds like too much work to me.
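
As a rough illustration of the char filter idea, here is a sketch of a variation on it: instead of swapping the dash for a special character, it replaces every dash that does not directly follow the `CD` prefix with a space, and then relies on a whitespace tokenizer plus a lowercase filter. The names `split_dashes_except_cd` and `identifier_analyzer` are made up for this example, and the assumption that only the `CD-` prefix needs protecting is taken from the question. The `emulate` function only approximates the chain locally with Python's `re` module; it is not what Elasticsearch runs, so the real behaviour should be checked with the `_analyze` API.

```python
import re
from typing import List

# Matches dashes that are NOT directly preceded by the word "CD" (any case).
# Assumption: only the "CD-" prefix marks identifiers that must stay intact.
DASH_PATTERN = r"(?i)(?<!\bCD)-"

# Hypothetical index settings; the char filter and analyzer names are invented.
SETTINGS = {
    "analysis": {
        "char_filter": {
            "split_dashes_except_cd": {
                "type": "pattern_replace",
                "pattern": DASH_PATTERN,
                "replacement": " ",
            }
        },
        "analyzer": {
            "identifier_analyzer": {
                "type": "custom",
                "char_filter": ["split_dashes_except_cd"],
                "tokenizer": "whitespace",
                "filter": ["lowercase"],
            }
        },
    }
}


def emulate(text: str) -> List[str]:
    """Local approximation: char filter, then whitespace split, then lowercase."""
    replaced = re.sub(DASH_PATTERN, " ", text)
    return [token.lower() for token in replaced.split()]


if __name__ == "__main__":
    sample = "Information about CD-3568 can be found in document DKF-98346-B-261"
    print(" ".join(emulate(sample)))
```

If the same analyzer is used at search time, a query such as `cd-3568` should go through the same replacement, because the pattern is case-insensitive.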

Have you thought about extracting that token into its own field to simplify searching/filtering for it?
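
To illustrate the separate-field idea, here is a minimal sketch that pulls the identifiers out of the text before indexing. It assumes the identifiers always look like `CD-` followed by digits, and the field names `text` and `cd_ids` are hypothetical; in the mapping, `cd_ids` could for instance be a `keyword` field so that it can be filtered on exactly.

```python
import re
from typing import Dict, List

# Assumption: identifiers are "CD-" followed by one or more digits.
CD_ID = re.compile(r"\bCD-\d+\b", re.IGNORECASE)


def build_document(text: str) -> Dict[str, object]:
    """Return the original text plus the extracted identifiers, lowercased."""
    ids: List[str] = [match.lower() for match in CD_ID.findall(text)]
    # "text" and "cd_ids" are hypothetical field names for this sketch.
    return {"text": text, "cd_ids": ids}


if __name__ == "__main__":
    sample = "Information about CD-3568 can be found in document DKF-98346-B-261"
    print(build_document(sample))
```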

