We are facing an issue with an Elasticsearch index. When I create an index on words such as "Polo Shirts", it creates two indexes, one on "Polo" and another on "Shirts". However, I am expecting only one index, on "Polo Shirts".
Has anyone faced a similar issue, or does anyone have any ideas?
When you say "index", do you mean an Elasticsearch index (e.g. the thing that has primary and replica shards)? Or do you mean the strings inside of documents are being tokenized on whitespace?
If you meant the latter, it is because full-text string fields are analyzed with the standard analyzer by default. Its tokenizer breaks a string into tokens on word boundaries (whitespace, most punctuation, etc.), so "Polo Shirts" becomes two terms. I'd recommend reading the Analysis section of Elasticsearch: The Definitive Guide for more information on how analysis works in Elasticsearch. It is a critical component of understanding how and why search works.
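To make the splitting concrete, here is a rough Python approximation of what the standard analyzer does to "Polo Shirts" and how the resulting terms land in an inverted index. It is only a sketch: Lucene's real standard tokenizer follows the Unicode Text Segmentation word-break rules, whereas this version just extracts runs of word characters and lowercases them. The function names are hypothetical.

```python
import re
from collections import defaultdict


def approx_standard_analyzer(text):
    """Rough stand-in for the standard analyzer.

    Assumption: a regex over word characters plus lowercasing; the real
    tokenizer uses Unicode word-break rules and handles many more cases.
    """
    return [token.lower() for token in re.findall(r"\w+", text)]


def build_inverted_index(docs, analyzer):
    """Map each term produced by the analyzer to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyzer(text):
            index[term].add(doc_id)
    return dict(index)


docs = {1: "Polo Shirts", 2: "Dress Shirts"}

print(approx_standard_analyzer("Polo Shirts"))
print(build_inverted_index(docs, approx_standard_analyzer))
```

The first call yields `['polo', 'shirts']`, and the index holds separate entries for `polo`, `shirts` and `dress`, with no entry for the whole value "Polo Shirts". That is the "two indexes" behaviour described in the question.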
If you want to search for exact phrases, you could do one of several things:
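Use a phrase query (or a match query with phrase mode enabled), which matches only when the terms appear next to each other and in order
Use an analyzer that does not tokenize on whitespace (e.g. one built on the keyword tokenizer, which emits the entire value as a single token)
Index the field as not_analyzed (mapped as the keyword type in Elasticsearch 5.x and later), which skips analysis entirely and therefore performs no tokenization.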
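The three options roughly translate into the request bodies below. They are written as Python dicts and printed as JSON so they can be sent with any HTTP client. This is a sketch that assumes a recent Elasticsearch (7.x or later, with typeless mappings). The field name `title` and the analyzer name `lowercase_keyword` are hypothetical. On 2.x and earlier, the last option was written as `{"type": "string", "index": "not_analyzed"}` rather than `{"type": "keyword"}`.

```python
import json

FIELD = "title"  # hypothetical field holding values such as "Polo Shirts"

# Option 1: keep the default analysis, but require the terms to appear
# adjacent and in order at search time.
phrase_query = {"query": {"match_phrase": {FIELD: "Polo Shirts"}}}

# Option 2: a custom analyzer built on the keyword tokenizer, which emits the
# whole value as one token; the lowercase filter makes matching case-insensitive.
keyword_analyzer_index = {
    "settings": {
        "analysis": {
            "analyzer": {
                "lowercase_keyword": {  # hypothetical analyzer name
                    "type": "custom",
                    "tokenizer": "keyword",
                    "filter": ["lowercase"],
                }
            }
        }
    },
    "mappings": {
        "properties": {FIELD: {"type": "text", "analyzer": "lowercase_keyword"}}
    },
}

# Option 3: no analysis at all (the modern equivalent of not_analyzed);
# a term query then matches the exact, case-sensitive value.
keyword_field_index = {"mappings": {"properties": {FIELD: {"type": "keyword"}}}}
exact_term_query = {"query": {"term": {FIELD: "Polo Shirts"}}}

for name, body in [
    ("phrase query", phrase_query),
    ("index settings with keyword-tokenizer analyzer", keyword_analyzer_index),
    ("index mapping with keyword field", keyword_field_index),
    ("term query on keyword field", exact_term_query),
]:
    print(f"# {name}")
    print(json.dumps(body, indent=2))
```

The trade-off is roughly this. The phrase query leaves indexing unchanged, so searches for just "shirts" still work. The keyword-based options store "Polo Shirts" as a single term, so a search for "shirts" alone will no longer match that field.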