Then search for it with query "123456", I got no hit. However if I did
everything from scratch and indexed a slightly different document (it's
actually the same doc with first field removed):
As far as I can see from your recreation, you only create the analyzer but
don't associate it with your fields by specifying mappings. Also, when
you query you don't specify the field you want to query, so you are using
the _all field, which has its own analyzer. This means that even if you had
specified the proper mappings, the query would execute against a different
field with a different analyzer.
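To make the two points above concrete, here is a minimal sketch written as Python dicts that would be sent as JSON request bodies. The analyzer name "comma", the tokenizer name "comma_tokenizer", the type name "doc" and the field name "codes" are assumptions for illustration, since the gist itself is not shown here, and the exact mapping syntax depends on the Elasticsearch version.

```python
import json

# Index creation body: define a comma-splitting analyzer AND attach it to a
# field in the mappings. All names here ("comma", "comma_tokenizer", "doc",
# "codes") are hypothetical, not taken from the original gist.
index_body = {
    "settings": {
        "analysis": {
            "tokenizer": {
                "comma_tokenizer": {"type": "pattern", "pattern": ","}
            },
            "analyzer": {
                "comma": {"type": "custom", "tokenizer": "comma_tokenizer"}
            },
        }
    },
    "mappings": {
        "doc": {
            "properties": {
                # Without this entry the analyzer exists but is never used
                # by this field.
                "codes": {"type": "string", "analyzer": "comma"}
            }
        }
    },
}


def field_query(field, text):
    """Build a query_string search body that targets one field, not _all."""
    return {"query": {"query_string": {"default_field": field, "query": text}}}


search_body = field_query("codes", "123456")
print(json.dumps(search_body))
```

The search body names the field explicitly, so the query is analyzed with that field's analyzer rather than the one configured for _all.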
On Monday, March 31, 2014 12:12:37 PM UTC+2, Huy Phan wrote:
Then search for it with query "123456", I got no hit. However if I did
everything from scratch and indexed a slightly different document (it's
actually the same doc with first field removed):
The configuration index.analysis.analyzer.default_index is already set so I
don't think there's a need to specify my mappings since I actually want to
use the comma analyzer for all the fields. And from what I understand, that
default_index is also applied to the _all field.
As you can see in my gist, I also overrode the "standard" analyzer since
I suspected something had gone wrong with default_index.
You may ask about the default_search configuration, my query "123456" is
rather simple so I don't think the default analyzer would make any changes
on it (and yes, I did verify that using the Analyzer API).
Even if there's something wrong with my settings, that still doesn't
clearly explain why I got the result with the second document but not with
the first one.
On Monday, 31 March 2014 19:45:42 UTC+8, Luca Cavanna wrote:
As far as I can see from your recreation, you only create the analyzer but
don't associate it with your fields by specifying mappings. Also, when
you query you don't specify the field you want to query, so you are using
the _all field, which has its own analyzer. This means that even if you had
specified the proper mappings, the query would execute against a different
field with a different analyzer.
Right, I did miss a couple of things there, sorry about that. I will have
another look and get back to you.
On Mon, Mar 31, 2014 at 2:23 PM, Huy Phan dachuy@gmail.com wrote:
Hi Luca,
The configuration index.analysis.analyzer.default_index is already set so
I don't think there's a need to specify my mappings since I actually want
to use the comma analyzer for all the fields. And from what I understand,
that default_index is also applied to _all field.
As you can see in my gist, I also overrode the "standard" analyzer since
I suspected something had gone wrong with default_index.
You may ask about the default_search configuration, my query "123456" is
rather simple so I don't think the default analyzer would make any changes
on it (and yes, I did verify that using the Analyzer API).
Even if there's something wrong with my settings, that still doesn't
clearly explain why I got the result with the second document but not with
the first one.
- A custom tokenizer should be used in a field that is configured in a mapping.
- Always set both the search and the index analyzer for a field.
- Avoid setting up a custom tokenizer for _all when including more than one
  field in _all (which is the default). This will give unpredictable results
  because tokens from many fields are merged into _all. In edge cases, for
  example when a field comes first, you may be able to produce a hit, but
  this is purely accidental.
- When searching with the q parameter, do not forget to specify the field name.
Jörg
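As a rough illustration of the second and last points above, the following sketch sets both analyzers on a field and builds a URI search whose q parameter names the field. The index name "myindex", the field name "codes" and the analyzer name "comma" are assumptions for illustration. The parameter names "index_analyzer" and "search_analyzer" follow the Elasticsearch 1.x mapping syntax current at the time of this thread and may differ in other versions.

```python
from urllib.parse import urlencode

# Field mapping with explicit index-time and search-time analyzers.
# "codes" and "comma" are hypothetical names used only for this sketch.
field_mapping = {
    "codes": {
        "type": "string",
        "index_analyzer": "comma",
        "search_analyzer": "comma",
    }
}


def uri_search_path(index, field, term):
    """Build a URI search path whose q parameter targets a specific field."""
    query = urlencode({"q": "{}:{}".format(field, term)})
    return "/{}/_search?{}".format(index, query)


# With q=123456 alone the search would run against the default field (_all);
# prefixing the field name avoids that.
print(uri_search_path("myindex", "codes", "123456"))
```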
I didn't realize that the _all field could be unpredictable at times.
There are certain reasons why we don't want to (or can't) predefine our
mappings when creating the index; that's why I used the default_index
configuration there.
What I'm doing is implementing a Google-like search with Elasticsearch,
so I don't want to specify any field when searching. I figured out that I
have to create another field to aggregate the terms myself instead of
relying on the _all field.
Anyway, that was a great answer and it helped me understand my problem.
Thanks Jörg.
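One possible way to build such an aggregate field on the application side, before the document is indexed, is sketched below. This is only an illustration under stated assumptions, not what was actually implemented: the field name "all_terms" and the helper name are hypothetical, and values are joined with commas on the assumption that the field is analyzed by a comma-splitting analyzer.

```python
def with_search_field(doc, field="all_terms", sep=","):
    """Return a copy of doc with an extra field joining all leaf values."""

    def leaves(value):
        # Walk nested dicts and lists, yielding every scalar as a string.
        if isinstance(value, dict):
            for item in value.values():
                yield from leaves(item)
        elif isinstance(value, (list, tuple)):
            for item in value:
                yield from leaves(item)
        elif value is not None:
            yield str(value)

    enriched = dict(doc)
    enriched[field] = sep.join(leaves(doc))
    return enriched


doc = {"id": 123456, "tags": ["a", "b"], "owner": {"name": "huy"}}
print(with_search_field(doc)["all_terms"])  # 123456,a,b,huy
```

Queries would then name this one field explicitly, which keeps the search field-free from the user's point of view while avoiding the _all field. Values that themselves contain commas would be split further by a comma tokenizer.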
On Monday, 31 March 2014 21:09:06 UTC+8, Jörg Prante wrote:
- A custom tokenizer should be used in a field that is configured in a mapping.
- Always set both the search and the index analyzer for a field.
- Avoid setting up a custom tokenizer for _all when including more than one
  field in _all (which is the default). This will give unpredictable results
  because tokens from many fields are merged into _all. In edge cases, for
  example when a field comes first, you may be able to produce a hit, but
  this is purely accidental.
- When searching with the q parameter, do not forget to specify the field name.
Jörg