If you want to remove the punctuation as well, you should consider using the standard tokenizer.
If that removes more than you want, you could consider using a pattern analyzer, but be aware that this might be very slow.
Make sure to test your custom analyzers using the _analyze API
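For example, a quick `_analyze` call (a minimal sketch; no index is needed, and the sample text is just illustrative) shows what each option produces:

```
# Standard tokenizer: splits on punctuation, keeps case
POST _analyze
{
  "tokenizer": "standard",
  "text": "F-35"
}
# -> tokens: "F", "35"

# Pattern analyzer with its default pattern (\W+): also splits on the hyphen, and lowercases
POST _analyze
{
  "analyzer": "pattern",
  "text": "F-35"
}
# -> tokens: "f", "35"
```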
When we use the standard tokenizer, it splits "F-35" into two different tokens, "F" and "35".
But I am expecting it to stay a single token, like "F35" or "F-35".
Even the pattern analyzer will split the token on punctuation, I guess?
That's correct. So depending on your data, if you can define a proper pattern for the cases you're referring to, you could use a character filter first that removes the hyphen inside words that contain numbers.
Then a query for "F35" or "F-35" would match docs containing either variant.
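A minimal sketch using a `pattern_replace` character filter (the index name, analyzer name, and regex are assumptions you would adapt to your data):

```
PUT my-index
{
  "settings": {
    "analysis": {
      "char_filter": {
        // hypothetical name; drops a hyphen between a letter and a digit, e.g. F-35 -> F35
        "join_letter_digit": {
          "type": "pattern_replace",
          "pattern": "(\\p{L})-(\\d)",
          "replacement": "$1$2"
        }
      },
      "analyzer": {
        "model_number": {
          "type": "custom",
          "char_filter": ["join_letter_digit"],
          "tokenizer": "standard",
          "filter": ["lowercase"]
        }
      }
    }
  }
}

POST my-index/_analyze
{
  "analyzer": "model_number",
  "text": "F-35"
}
# -> single token: "f35"
```

Since "F35" also analyzes to "f35", a match query against a field mapped with this analyzer finds documents containing either spelling, as long as the same analyzer is applied at index and search time.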