Your tokenizer splits on , only, giving the tokens ab bcd and cde. These are then broken down into edge n-grams, which produces the result you see. To get the results you want, you probably need to use a different tokenizer.
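You can check this with the _analyze API. A minimal sketch in Kibana Dev Tools syntax, assuming a comma-only pattern tokenizer and sample input ab bcd,cde (the exact pattern and input text are assumptions, since they are not shown in the thread):

```
POST _analyze
{
  "tokenizer": {
    "type": "pattern",
    "pattern": ","
  },
  "text": "ab bcd,cde"
}
```

This should return just two tokens, ab bcd and cde, confirming that the whitespace inside ab bcd never becomes a token boundary.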
Yes, you are right, I get the two tokens ab bcd and cde. But at the same time I have a stopword filter which should remove ab from ab bcd, and then more tokens should be created by the edge_ngram filter. If I use the standard tokenizer instead of the pattern tokenizer, everything works fine (the stopword is removed), but the standard tokenizer is not what I want.
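The stop filter only matches whole tokens, so as long as ab bcd arrives as one token, ab inside it can never be removed. One option is to keep the pattern tokenizer but make the pattern split on whitespace as well as commas. A hedged sketch, assuming an index name my-index, a stopword list containing ab, and arbitrary gram sizes (all placeholders, not taken from the thread):

```
PUT my-index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "comma_or_space",
          "filter": ["my_stop", "my_edge_ngram"]
        }
      },
      "tokenizer": {
        "comma_or_space": {
          "type": "pattern",
          "pattern": "[,\\s]+"
        }
      },
      "filter": {
        "my_stop": {
          "type": "stop",
          "stopwords": ["ab"]
        },
        "my_edge_ngram": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 5
        }
      }
    }
  }
}
```

With this, ab bcd,cde tokenizes to ab, bcd, and cde; the stop filter drops ab, and the edge_ngram filter then expands bcd and cde. Adjust the pattern if commas must remain the only separator in some other part of your mapping.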