Is there a way to search terms lower cased?


(sezgin küçükkaraaslan) #1

Hi,
With the default configuration and no mapping, fields are
analyzed with the lowercase token filter. So when I index a field with the
value "ABC", it is tokenized as "abc". When I then search for it exactly as I
inserted it, with the following query, I get no results:

{
"query":{"term":"ABC"}
}

It seems that only query strings support analyzers during search. Is there
a plan to add this feature to Elastic Search?

Thanks in advance,
Sezgin Kucukkaraaslan
www.ifountain.com


(Lukáš Vlček) #2

Hi,
I have not tried it myself, but it is possible to specify an analyzer as a
query parameter:
http://www.elasticsearch.com/docs/elasticsearch/rest_api/search/#Request_Parameters
So it should definitely work in JSON too. Also, did you check
http://www.elasticsearch.com/docs/elasticsearch/index_modules/analysis/analyzer/#Default_Analyzers
?
Lukas


(Lukáš Vlček) #3

Well... the term query is not analyzed, see:
http://www.elasticsearch.com/docs/elasticsearch/rest_api/query_dsl/term_query/


(Shay Banon) #4

If you want the text you pass in to be analyzed, then use the field query
(which is a nice field-level wrapper for the query_string query):
http://www.elasticsearch.com/docs/elasticsearch/rest_api/query_dsl/field_query/

Note that the analysis process can be as simple as lowercasing, or it can
be more complex and generate several terms from a single analyzed term.

-shay.banon


(Clinton Gormley) #5

For context - lukasvlcek had the conversation below in IRC, then left.

I'm answering him here


lukasvlcek:
kimchy: I haven't been thinking about it before... what is the
rationale of not allowing analyzer setup for term query when
Query DSL is used? See
http://elasticsearch-users.115913.n3.nabble.com/Is-there-a-way-to-search-terms-lower-cased-tp932996.html

    I am just curious why user has to search -exact- terms (Lower vs
    Upper case)

sam_:
the default analyzer if nothing is specified is standard isn't
it?

lukasvlcek:
I did not try this particular example but I am confused by the
term query doc which explicitly says "not analyzed" (so even the
default analyzer is not used?)

sam_:
if it is not analyzed then I would suspect you need to provide
case
an exact match
the standard analyzer would result in it being converted

lukasvlcek:
wouldn't it be useful to have ability to specify analyzer?

sam_:
you can
well
at least when you define the mappings
the analyzer is used as part of the indexing
as an alternative I would think you could provide your own
parser implementation to which is what I'm trying to do
but have been unsuccessful

lukasvlcek:
but the point is if it is possible to specify analyzer when
querying via URL parameters then why can not specify analyzer
while using Query DSL
Gotta go now... but I would appreciate if anybody (kimchy?) can
follow up on that mail thread above (want to check that later)

----------------------------------------------

Answer:

(Note - this is as I understand the situation - I'm open to correction)

All data stored in ElasticSearch/Lucene is stored as a 'term' which is
atomic - it can't be broken down further.

So if you index {"text": "The quick brown fox jumped over the LAZY dog"}
then the default analyzer would:

  • split on whitespace and punctuation
  • lowercase all text
  • remove stopwords
  • result in these terms:
    'quick', 'brown', 'fox', 'jumped', 'over', 'lazy', 'dog'

If you then do this search:
{ "query_string": { "query": "QUICK dOg"}}

Then the default analyzer would analyze your query string and return the
following terms: "quick", "dog"

It then does a 'term' query for each of those terms and combines the
results.
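
A minimal Python sketch of the steps above, assuming a toy analyzer in place of the real one. The helper names (`analyze`, `build_index`, `term_query`, `query_string`) are hypothetical, and the stopword set is a small illustrative subset, not the actual default list:

```python
import re

# Assumption: a tiny illustrative stopword subset, not the real default list.
STOPWORDS = {"the", "a", "an", "and", "of"}

def analyze(text):
    """Split on non-alphanumerics, lowercase, then drop stopwords."""
    tokens = re.split(r"[^0-9A-Za-z]+", text)
    return [t.lower() for t in tokens if t and t.lower() not in STOPWORDS]

def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for term in analyze(text):
            index.setdefault(term, set()).add(doc_id)
    return index

def term_query(index, term):
    """Exact lookup: the term is used as given, with no analysis."""
    return index.get(term, set())

def query_string(index, text):
    """Analyze the text, run a term query per term, and union the results."""
    hits = set()
    for term in analyze(text):
        hits |= term_query(index, term)
    return hits

docs = {1: "The quick brown fox jumped over the LAZY dog"}
index = build_index(docs)

print(term_query(index, "LAZY"))         # set(): "LAZY" is not an indexed term
print(term_query(index, "lazy"))         # {1}
print(query_string(index, "QUICK dOg"))  # {1}
```

The only difference between the two query functions is whether the input passes through `analyze` before the lookup, which is why the unanalyzed "LAZY" finds nothing.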

If you did this search:
{ "wildcard": {"text": "*o*"}}

Then it would first look at all terms and find only those that
match the pattern, i.e. 'brown', 'fox', 'over', 'dog'.

It then does a 'term' query for each of those terms and combines the
results.
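
The wildcard expansion can be sketched the same way. This assumes the intended pattern is `*o*` (any term containing an "o") and uses Python's `fnmatchcase` as a stand-in for the real term matching; `expand_wildcard` is a hypothetical helper name:

```python
from fnmatch import fnmatchcase

# Terms produced by the analysis example above.
terms = ["quick", "brown", "fox", "jumped", "over", "lazy", "dog"]

def expand_wildcard(pattern, term_dictionary):
    """Return the indexed terms that match a shell-style wildcard pattern."""
    return [t for t in term_dictionary if fnmatchcase(t, pattern)]

print(expand_wildcard("*o*", terms))  # ['brown', 'fox', 'over', 'dog']
print(expand_wildcard("*O*", terms))  # []: the pattern itself is not lowercased
```

The second call shows the same case issue as the term query: the pattern is compared against the stored lowercase terms as-is.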

So it doesn't make sense to analyze a 'term'. Terms are the result of
analysis. If you need to analyse a search "phrase" then you should use a
"query_string" or "field" query.

For the same reason, you can't sort on an analyzed field because the
original data doesn't exist. It is tokenised and stored as
terms. (unless the field is also stored? - not sure)

The analyzer used to analyze a search phrase is selected in this order:

  • "analyzer" specified in the query DSL, eg:

    { "query_string": { "query": "foo bar", "analyzer": "keyword"}}

  • "search_analyzer" specified in the mapping

  • "analyzer" specified in the mapping

  • the default_search analyzer specified in the index configuration

  • the default analyzer specified in the index configuration

  • the default_search analyzer specified in the node configuration

  • the default analyzer specified in the node configuration

  • the "standard" analyzer

(I think that's right - I may have added a couple in there that don't
actually exist)
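
As a rough sketch of that lookup order (with the same caveat that some levels may not exist), the selection is a "first configured value wins" walk. `pick_analyzer` and the sample settings are hypothetical and only illustrate the precedence, not how ElasticSearch implements it:

```python
def pick_analyzer(candidates):
    """Return (source, name) for the first configured analyzer in the list.

    `candidates` is an ordered list of (source, analyzer_name_or_None).
    Falls back to the "standard" analyzer when nothing is configured.
    """
    for source, name in candidates:
        if name is not None:
            return source, name
    return "built-in", "standard"

# Assumed sample configuration: only two levels are set.
candidates = [
    ("query DSL analyzer", None),
    ("mapping search_analyzer", None),
    ("mapping analyzer", "whitespace"),
    ("index default_search", None),
    ("index default", "simple"),
    ("node default_search", None),
    ("node default", None),
]

print(pick_analyzer(candidates))  # ('mapping analyzer', 'whitespace')
print(pick_analyzer([]))          # ('built-in', 'standard')
```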

Typically, it doesn't make sense to use a different analyzer at index
and search time, because you may end up searching for terms that don't
actually exist.

If a field is set to be 'not_analyzed', then the whole value is treated
as a term, so "ABC" and "abc" are different, and "abc" will not match
"abc def".

hope this helps

Clint

--
Web Announcements Limited is a company registered in England and Wales,
with company number 05608868, with registered address at 10 Arvon Road,
London, N5 1PR.


(Lukáš Vlček) #6

Hey guys, thanks for keeping this conversation going. Appreciate this!
Lukas


(sezgin küçükkaraaslan) #7

Thanks for the replies.
I think I had better explain what I'm trying to do. I'm working on an IT
event management application and want to store my data in Elastic Search to
leverage its clustering and redundancy features. The requirement is to
index data and give operators the flexibility to search events
case-insensitively from the UI. For performance reasons, I don't want all
fields in my event model to be analyzed. For example, I want to keep fields
like "identifier", which I know will consist of a single word, as
"not_analyzed". The problem with the field query here is that I can't search
this kind of property with it (it returns zero results), so I will not be
able to use it all the time. After some thinking, I decided to use two kinds
of analyzers for my fields, which are:

    myAnalyzer1:
        filter: [lowercase]
        tokenizer: keyword

for fields like "identifier", and:

    myAnalyzer2:
        filter: [lowercase]
        tokenizer: whitespace

for the fields like "description", which can consist of multiple words.
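
To make the intent concrete, here is a small Python sketch of the terms the two analyzers are expected to produce. Plain functions stand in for the real tokenizers and filters, and the function names and sample values are made up for illustration:

```python
def my_analyzer1(value):
    """keyword tokenizer + lowercase filter: the whole value is one term."""
    return [value.lower()]

def my_analyzer2(value):
    """whitespace tokenizer + lowercase filter: one term per word."""
    return [token.lower() for token in value.split()]

print(my_analyzer1("ABC-Router 01"))      # ['abc-router 01']
print(my_analyzer2("Link DOWN on eth0"))  # ['link', 'down', 'on', 'eth0']

# A search is case-insensitive as long as the search text goes through
# the same analyzer before it is compared with the indexed terms.
indexed = my_analyzer1("ABC")
print(my_analyzer1("abc") == indexed)     # True
print(["ABC"] == indexed)                 # False: raw, unanalyzed term
```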

I would welcome some advice here: am I on the right path? Is there any
performance loss from using the first analyzer instead of keeping the field
as "not_analyzed"?
Thank you very much again.

Sezgin Kucukkaraaslan
www.ifountain.com


(Shay Banon) #8

It's a good way to solve what you are trying to do. You shouldn't notice a
performance difference at indexing time with this compared to
not_analyzed.

-shay.banon


(sezgin küçükkaraaslan) #9

Thanks...

