I have the following piece of text in a field using the standard tokenizer.
...erlegging van het EAB.<br /> Cass. 31 mei 2016, AR P.16.0606.N
The standard tokenizer splits P.16.0606.N into separate parts because of the periods. As a result, I get a lot of unexpected results at search time when searching for P.16.0606.N.
The P part matches every code starting with a P, such as P.10.1512.N.
What is the suggested way to search for a piece of non-linguistic text inside a larger text?
This would require an appropriate Analyzer configuration.
There's a multitude of options available depending on what you want to do.
You could index codes like this into a separate searchable field, or leave them intact among the ordinary words. In the latter case you would have to take care that a rule designed to keep P.16.0606.N together as a single term does not also keep the full stop attached to words at the ends of sentences, like this one.
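To make the trade-off concrete, here is a minimal Python sketch of the idea. It is not Elasticsearch configuration. It assumes the codes always have the shape letter, two digits, four digits, letter, separated by periods, which is a guess based on the two examples in the question. The names CODE, tokenize and extract_codes are made up for illustration. In Elasticsearch, something similar could probably be built as a custom analyzer around a pattern-based tokenizer, or by extracting the codes into their own field before indexing.

```python
import re

# Assumed shape of a code such as "P.16.0606.N" (hypothetical; adjust to the real data).
CODE = r"[A-Za-z]\.\d{2}\.\d{4}\.[A-Za-z]"

# Try the code pattern first, then fall back to plain runs of word characters.
# Because the fallback never includes ".", sentence-ending full stops are dropped.
TOKEN = re.compile(rf"\b{CODE}\b|\w+")


def tokenize(text):
    """Split text into lowercase terms, keeping whole codes as single terms."""
    return [m.group(0).lower() for m in TOKEN.finditer(text)]


def extract_codes(text):
    """Return only the codes, e.g. to index them into a separate field."""
    return [c.upper() for c in re.findall(rf"\b{CODE}\b", text)]


if __name__ == "__main__":
    sample = "van het EAB. Cass. 31 mei 2016, AR P.16.0606.N"
    print(tokenize(sample))
    print(extract_codes(sample))
```

With terms produced this way, a search for P.16.0606.N would be compared against the whole code rather than against a lone P, while "EAB." and "Cass." still lose their trailing periods.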