Hello everyone,
I am working on defining a mapping in Elasticsearch which can have a few
fields created on the fly. I can define the types & index settings using
dynamic templates, but I would like to know the difference between the
following two options and which one is preferred over the other.
I do not want to break the string into tokens, but use it as a single
complete string.
Option 1. Field index: not_analyzed
Option 2. Field index: custom analyzer with no tokenizer
Are there any performance differences between the two approaches?
How does the 2nd option work (a custom analyzer with no tokenizer), and
how can I create a mapping for it?
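For reference, a rough sketch of what I mean by the two options, written as dynamic templates (template, analyzer, and field-pattern names are placeholders I made up; as far as I can tell a custom analyzer needs some tokenizer, so the closest thing to "no tokenizer" here is the keyword tokenizer, which emits the whole input as a single token):

```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "whole_string": {
          "type": "custom",
          "tokenizer": "keyword"
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "dynamic_templates": [
        {
          "option1_not_analyzed": {
            "match": "raw_*",
            "match_mapping_type": "string",
            "mapping": { "type": "string", "index": "not_analyzed" }
          }
        },
        {
          "option2_custom_analyzer": {
            "match": "*",
            "match_mapping_type": "string",
            "mapping": { "type": "string", "analyzer": "whole_string" }
          }
        }
      ]
    }
  }
}
```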
I believe if you leave the tokenizer out you get the StandardTokenizer.
It's very different from not_analyzed. not_analyzed is like "don't break
this up - don't change it at all - I'm going to search for it exactly how
I send it to you". I believe the standard tokenizer does ICU word
segmentation: UAX #29: Unicode Text Segmentation.
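To illustrate the difference, here is roughly what ends up in the index for a made-up input (a sketch only; the exact terms depend on the rest of the analysis chain):

```json
{
  "input": "Quick Brown-Foxes!",
  "standard_tokenizer_terms": ["Quick", "Brown", "Foxes"],
  "not_analyzed_term": ["Quick Brown-Foxes!"]
}
```

With the standard tokenizer the string is split on word boundaries and punctuation is dropped; with not_analyzed the entire original value is stored as one term.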
Thanks Nikolas for the clarification.
Do you think there would be any difference in performance between the two,
given that I would always be searching the full string or matching phrases
using match_phrase for partial matching?
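For example, this is the kind of query I have in mind (index and field names are made up):

```json
{
  "query": {
    "match_phrase": {
      "my_field": "quick brown"
    }
  }
}
```

(My understanding is that against a not_analyzed field this would only match when the whole stored value equals the phrase, since the field holds a single term, so partial phrase matching would not work there.)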
Please advise.
On Tue, Mar 3, 2015 at 12:36 PM, Nikolas Everett nik9000@gmail.com wrote: