Hello, I want to use the character group (char_group) tokenizer in Elasticsearch. How do I create an index that uses the char_group tokenizer?
I am putting this setting in my index:
If you index "Gss InfoTech" and the search term is "gss info", a match query with operator "AND" returns no results, because the indexed token "infotech" is not equal to the query token "info".
If you remove the "AND" operator (the match query defaults to "OR"), the document matches on the "gss" token alone.
If you want the term "info" to match "InfoTech", you will have to use the edge_ngram tokenizer.
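To make the AND/OR behaviour concrete, here is a small pure-Python simulation. It is not Elasticsearch code: it assumes an analyzer that simply splits on whitespace and lowercases, and the helper names `analyze` and `match` are hypothetical. It only shows that whole tokens are compared, so "info" never equals "infotech".

```python
# Hypothetical simulation of how a match query combines query tokens.
# Assumption: the analyzer splits on whitespace and lowercases (a simplification).


def analyze(text: str) -> list[str]:
    """Split on runs of whitespace and lowercase each token."""
    return [token.lower() for token in text.split()]


def match(doc_text: str, query_text: str, operator: str = "or") -> bool:
    """Return True if the document matches, comparing whole tokens only."""
    doc_tokens = set(analyze(doc_text))
    hits = [token in doc_tokens for token in analyze(query_text)]
    # "and" needs every query token in the document; "or" needs at least one.
    return all(hits) if operator == "and" else any(hits)


print(analyze("Gss InfoTech"))                   # ['gss', 'infotech']
print(match("Gss InfoTech", "gss info", "and"))  # False: "info" != "infotech"
print(match("Gss InfoTech", "gss info", "or"))   # True: matches on "gss"
```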
Previously I used the edge_ngram tokenizer, but it caused a problem.
Suppose I search for "Information". I had set min_gram: 3 and max_gram: 10, so "information" is broken into inf, info, infor, inform, and so on. Because of that, results for "Information" rank below results that only share the prefix "info": documents like InfoEdge and Infotech come before Information Technology, which I don't want.
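A short sketch of what edge n-grams with min_gram 3 and max_gram 10 look like per word, written in plain Python rather than calling Elasticsearch. The function name `edge_ngrams` is hypothetical, and the lowercasing stands in for a lowercase token filter (the edge_ngram tokenizer itself does not lowercase). It shows why "Information" and "InfoTech" share the grams "inf" and "info".

```python
def edge_ngrams(token: str, min_gram: int = 3, max_gram: int = 10) -> list[str]:
    """Prefixes of one word with lengths min_gram..max_gram (simplified edge_ngram)."""
    token = token.lower()  # stands in for a separate lowercase filter
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


information = edge_ngrams("Information")
infotech = edge_ngrams("InfoTech")
print(information)
# ['inf', 'info', 'infor', 'inform', 'informa', 'informat', 'informati', 'informatio']
print(infotech)
# ['inf', 'info', 'infot', 'infote', 'infotec', 'infotech']
print(sorted(set(information) & set(infotech)))  # ['inf', 'info']
```

Because both words produce the same short prefixes, a query for "info" matches both documents, and the ranking between them is left to scoring rather than to whole-word equality.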
For this reason I want a tokenizer that splits text into words only when it encounters whitespace.
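A minimal sketch of an index body that defines a char_group tokenizer splitting only on whitespace, wrapped in a custom analyzer with a lowercase filter. It is built as a Python dict and printed as JSON so it can be pasted into a `PUT /<index-name>` request. The names `whitespace_char_group`, `my_whitespace_analyzer` and the field `name` are hypothetical, and the typeless `mappings` block assumes Elasticsearch 7.x or later. With only `"whitespace"` in `tokenize_on_chars`, this tokenizer behaves much like the built-in whitespace tokenizer.

```python
import json

# Hypothetical index body: custom analyzer using a char_group tokenizer
# that breaks text only on whitespace, then lowercases each token.
index_body = {
    "settings": {
        "analysis": {
            "tokenizer": {
                "whitespace_char_group": {  # hypothetical tokenizer name
                    "type": "char_group",
                    "tokenize_on_chars": ["whitespace"],
                }
            },
            "analyzer": {
                "my_whitespace_analyzer": {  # hypothetical analyzer name
                    "type": "custom",
                    "tokenizer": "whitespace_char_group",
                    "filter": ["lowercase"],
                }
            },
        }
    },
    # Assumes typeless mappings (Elasticsearch 7.x or later).
    "mappings": {
        "properties": {
            "name": {  # hypothetical field name
                "type": "text",
                "analyzer": "my_whitespace_analyzer",
            }
        }
    },
}

print(json.dumps(index_body, indent=2))
```

After creating the index, the `_analyze` API (for example `POST /<index-name>/_analyze` with `"analyzer": "my_whitespace_analyzer"` and `"text": "Gss InfoTech"`) can be used to check which tokens are produced.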