Non-linguistic text in piece of content

EmericW · October 1, 2019, 1:32pm

Hi,

I have the following piece of text in a field using the standard tokenizer.

...erlegging van het EAB.<br /> Cass. 31 mei 2016, AR P.16.0606.N

The standard tokenizer splits P.16.0606.N into separate parts because of the periods. As a result, I get a lot of unexpected results at search time when searching for P.16.0606.N.
The P part matches every code starting with a P, such as P.10.1512.N.

What is the suggested way to search for a piece of non-linguistic text inside a larger text?

Mark_Harwood · October 4, 2019, 10:13am

This would require an appropriate Analyzer configuration.
There's a multitude of options available depending on what you want to do.
You could index product codes like this into a separate searchable field, or leave them intact amongst the English words. In the latter case you would have to take care that a rule designed to keep P.16.0606.N together as a single term does not also keep the full stop attached to words at the ends of sentences, like this one.
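As a rough illustration of the second option, here is a minimal Python sketch of a token rule that keeps a code such as P.16.0606.N whole while still dropping the full stop after ordinary words. The code shape (one letter, two dot-separated digit groups, one letter) is an assumption based on the two examples in this thread, and the names `code_or_word` and `code_aware` are hypothetical. The `SETTINGS` dict shows how the same regex might be passed to Elasticsearch's `pattern` tokenizer with `group` set to 0 so that matches become tokens; it is untested against a real cluster, and Java's `\w` is ASCII-only by default whereas Python's is not.

```python
import re

# Assumed shape of the codes (adjust to the real data):
# one letter, two dot-separated digit groups, one closing letter.
CODE = r"[A-Za-z]\.\d+\.\d+\.[A-Za-z]"

# Try the code shape first; otherwise fall back to plain word characters,
# which leaves sentence-final full stops out of the token.
TOKEN_PATTERN = CODE + r"|\w+"


def tokenize(text):
    """Local emulation of the rule: every regex match becomes one lowercased token."""
    return [m.group(0).lower() for m in re.finditer(TOKEN_PATTERN, text)]


# Hypothetical index settings reusing the same regex in a `pattern` tokenizer.
# `group: 0` asks for the whole match as the token instead of splitting on it.
SETTINGS = {
    "analysis": {
        "tokenizer": {
            "code_or_word": {
                "type": "pattern",
                "pattern": TOKEN_PATTERN,
                "group": 0,
            }
        },
        "analyzer": {
            "code_aware": {
                "type": "custom",
                "tokenizer": "code_or_word",
                "filter": ["lowercase"],
            }
        },
    }
}

if __name__ == "__main__":
    print(tokenize("Cass. 31 mei 2016, AR P.16.0606.N"))
    # prints ['cass', '31', 'mei', '2016', 'ar', 'p.16.0606.n']
```

With a rule like this, a search for P.16.0606.N is analyzed to the single term `p.16.0606.n`, so it no longer shares a bare `p` term with P.10.1512.N. The same regex could instead feed a separate field that holds only the extracted codes, which is the first option mentioned above.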

system · November 1, 2019, 10:13am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
