Hello everyone,
I am working on defining a mapping in Elasticsearch which can have a few
fields created on the fly. I can define the types & index settings using
dynamic templates, but I would like to know the difference between the
following two options and which one is preferred over the other.
I do not want to break the string into tokens, but use it as a single
complete string.
Option 1. Field index: not_analyzed
Option 2. Field index: custom analyzer with no tokenizer
Are there any performance differences between the two approaches?
How does the 2nd option work (a custom analyzer with no tokenizer), and
how can I create a mapping for it?
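For reference, a rough sketch of what I mean by the two options, written as dynamic templates (template, analyzer, and field-pattern names are placeholders I made up; as far as I can tell a custom analyzer needs some tokenizer, so the closest thing to "no tokenizer" here is the keyword tokenizer, which emits the whole input as a single token):

```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "whole_string": {
          "type": "custom",
          "tokenizer": "keyword"
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "dynamic_templates": [
        {
          "option1_not_analyzed": {
            "match": "raw_*",
            "match_mapping_type": "string",
            "mapping": { "type": "string", "index": "not_analyzed" }
          }
        },
        {
          "option2_custom_analyzer": {
            "match": "*",
            "match_mapping_type": "string",
            "mapping": { "type": "string", "analyzer": "whole_string" }
          }
        }
      ]
    }
  }
}
```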
I believe if you leave the tokenizer out you get the StandardTokenizer.
It's very different from not_analyzed. not_analyzed is like "don't break
this up - don't change it at all - I'm going to search for it exactly how
I send it to you". I believe the standard tokenizer does ICU word
segmentation: UAX #29: Unicode Text Segmentation.
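To illustrate the difference, here is roughly what ends up in the index for a made-up input (a sketch only; the exact terms depend on the rest of the analysis chain):

```json
{
  "input": "Quick Brown-Foxes!",
  "standard_tokenizer_terms": ["Quick", "Brown", "Foxes"],
  "not_analyzed_term": ["Quick Brown-Foxes!"]
}
```

With the standard tokenizer the string is split on word boundaries and punctuation is dropped; with not_analyzed the entire original value is stored as one term.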
Thanks Nikolas for the clarification.
Do you think there would be any difference in performance between the two,
given that I would always be searching the full string or matching phrases
using match_phrase for partial matching?
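For example, this is the kind of query I have in mind (index and field names are made up):

```json
{
  "query": {
    "match_phrase": {
      "my_field": "quick brown"
    }
  }
}
```

(My understanding is that against a not_analyzed field this would only match when the whole stored value equals the phrase, since the field holds a single term, so partial phrase matching would not work there.)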
Please advise.
On Tue, Mar 3, 2015 at 12:36 PM, Nikolas Everett nik9000@gmail.com wrote: