This example shows how the english analyzer could be reimplemented as a custom analyzer. From there you can change any part of the analysis chain, e.g. swap in a different tokenizer that fits your needs.
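For reference, a sketch of what that reimplementation looks like: a custom analyzer that combines the standard tokenizer with the filters the built-in english analyzer uses. The index name and the `english_example`/`rebuilt_english` identifiers are just placeholders; the exact filter list may differ slightly across Elasticsearch versions.

```json
PUT /english_example
{
  "settings": {
    "analysis": {
      "filter": {
        "english_stop": { "type": "stop", "stopwords": "_english_" },
        "english_stemmer": { "type": "stemmer", "language": "english" },
        "english_possessive_stemmer": { "type": "stemmer", "language": "possessive_english" }
      },
      "analyzer": {
        "rebuilt_english": {
          "tokenizer": "standard",
          "filter": [
            "english_possessive_stemmer",
            "lowercase",
            "english_stop",
            "english_stemmer"
          ]
        }
      }
    }
  }
}
```

Because the tokenizer is now an explicit setting, it can be replaced independently of the stemming and stop-word filters.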
Thanks for the reply.
Do you mean I can use a tokenizer like the pattern tokenizer to split the text on my pre-defined word delimiters?
What I want is to extend the standard tokenizer, which is used by most of the language analyzers, to also split tokens on "." in addition to all the existing word boundaries.
The pattern tokenizer doesn't seem to cover all the word boundaries that the standard tokenizer supports.
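One way to keep every boundary the standard tokenizer recognizes while adding "." as a delimiter is to preprocess the input with a char filter that turns "." into a space before tokenization. A sketch, with hypothetical index and analyzer names:

```json
PUT /my_index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "dot_to_space": {
          "type": "pattern_replace",
          "pattern": "\\.",
          "replacement": " "
        }
      },
      "analyzer": {
        "standard_plus_dot": {
          "char_filter": ["dot_to_space"],
          "tokenizer": "standard",
          "filter": ["lowercase"]
        }
      }
    }
  }
}
```

Note this replaces every ".", so things like decimal numbers ("3.5") and version strings ("v1.2") will be split too; if that matters, the regex would need to be more selective.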