Query analyzer with respect to field/index analyzer


(Nora Olsen) #1

Hi,

My mappings are defined like this:
{
  "article" : {
    "properties" : {
      "text" : {"type" : "string", "store" : "yes", "analyzer" : "snowball"},
      "country_name" : {"type" : "string", "store" : "yes", "analyzer" : "snowball"},
      "city_name" : {"type" : "string", "store" : "yes", "analyzer" : "snowball"},
      "tags" : {"type" : "string", "store" : "no", "analyzer" : "snowball"}
    }
  }
}

And I have the following query:
"query" : {
  "multi_match" : {
    "fields" : ["fieldA", "fieldB"],
    "query" : query
  }
},

I have read that the search query analyzer should match the index
analyzer, but what if one of the properties uses an EdgeNGram?

Also, if I specify "snowball" in my query, the results are flipped. Should
that happen, given that both the query and the index use the same analyzer?

Thanks!

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Luca Cavanna) #2

Hi Nora,
Correct, the analyzer used at query time is usually the same as, or similar
to, the one used at index time. NGrams are one of the exceptions: they
usually get applied only at index time, as you have guessed.
If you want to use a different analyzer at query time, you can specify it
either in the query or in the mapping (search_analyzer).
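
To illustrate why n-grams are normally an index-time-only step, here is a toy Python sketch. It does not call Elasticsearch at all: edge_ngrams, index_analyze and search_analyze are hypothetical helpers that only mimic lowercasing, whitespace tokenizing and edge n-gramming, so treat it as an illustration of the idea rather than of the real analysis chain.

```python
def edge_ngrams(token, min_gram=1, max_gram=10):
    """Return the prefixes of `token` from min_gram up to max_gram characters."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def index_analyze(text):
    """Toy index-time chain: lowercase, split on whitespace, edge n-gram."""
    terms = set()
    for token in text.lower().split():
        terms.update(edge_ngrams(token))
    return terms


def search_analyze(text):
    """Toy query-time chain: lowercase and split on whitespace, no n-grams."""
    return set(text.lower().split())


indexed = index_analyze("Brazil")

# A partial query matches because the prefix was indexed as its own term.
matches_prefix = bool(search_analyze("bra") & indexed)

# An unrelated word does not match when the query is left un-grammed.
matches_unrelated = bool(search_analyze("bread") & indexed)

# If the query were n-grammed too, the shared prefixes would match.
shared_prefixes = sorted(index_analyze("bread") & indexed)

print(matches_prefix, matches_unrelated, shared_prefixes)
```

In this toy model, n-gramming the query as well makes "bread" match a document about "Brazil" through the shared prefixes, which is the kind of noise a separate search analyzer is meant to avoid.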

If you apply stemming, you usually need to apply it at both index time and
query time: only the stems are indexed, so the stems of the query terms are
what you want to search for.
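
The same point can be shown with a small Python sketch. The toy_stem function below is a deliberately crude, hypothetical suffix stripper, not the Snowball algorithm, and analyze is an assumed helper that only lowercases and splits on whitespace.

```python
def toy_stem(token):
    """Strip a couple of English suffixes. A crude stand-in, NOT Snowball."""
    for suffix in ("ing", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text, stem):
    """Lowercase and split on whitespace, optionally stemming each token."""
    tokens = text.lower().split()
    return {toy_stem(t) for t in tokens} if stem else set(tokens)


# Index time: only the stems end up in the index.
index_terms = analyze("Hiking trails in Brazil", stem=True)

# Query time without stemming: "trails" is not an indexed term.
unstemmed_hit = bool(analyze("trails", stem=False) & index_terms)

# Query time with the same stemming: "trails" becomes "trail" and matches.
stemmed_hit = bool(analyze("trails", stem=True) & index_terms)

print(sorted(index_terms), unstemmed_hit, stemmed_hit)
```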

That said, specifying snowball in your query shouldn't make any difference,
as you have the same analyzer in the mapping (setting analyzer means that
index_analyzer and search_analyzer are both the same). Do your documents get
a different score, or always the same?


(Nora Olsen) #3

Looks like it was a mistake on my part. The results are the same regardless
of whether the 'snowball' analyzer is specified in the query.

Is it common to have different analyzers on various fields, apart from ngrams?

Based on my document:
"properties" : {
  "text" : {"type" : "string", "store" : "yes", "analyzer" : "snowball"},
  "country_name" : {"type" : "string", "store" : "yes", "analyzer" : "snowball"},
  "city_name" : {"type" : "string", "store" : "yes", "analyzer" : "snowball"},
  "tags" : {"type" : "string", "store" : "no", "analyzer" : "snowball"}
}

And I'm trying to perform queries without the use of NLP POS tagging or
Entity Extraction:
"multi_match" : {
  "analyzer" : "snowball",
  "fields" : ["text", "city_name^1.3", "country_name^1.2", "tags^1.1"],
  "query" : "Hiking trails in Brazil"
}

I am thinking "snowball" should be sufficient?

Thanks!


(Luca Cavanna) #4

Yes, it is common to have different analyzers per field. For instance, it
usually doesn't make sense to apply stemming to city or country names, and
probably not even to titles, as you lose information. I would apply
stemming to long text. The same goes for stopwords. Lowercasing, on the
other hand, is important: I've seen it applied to pretty much every field.
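
As a rough sketch of what that could look like for the mapping in this thread, the Python snippet below builds an index-creation body as a plain dict and prints it as JSON. The analyzer name lowercase_only is hypothetical, the choice to leave tags on snowball is an assumption, and the "string" type and "store" values are simply carried over from the mapping posted above; adjust them to whatever your Elasticsearch version expects.

```python
import json

index_body = {
    "settings": {
        "analysis": {
            "analyzer": {
                # Hypothetical name: tokenizes and lowercases only,
                # with no stemming and no stopword removal.
                "lowercase_only": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase"],
                }
            }
        }
    },
    "mappings": {
        "article": {
            "properties": {
                # Long free text: keep the stemming analyzer.
                "text": {"type": "string", "store": "yes", "analyzer": "snowball"},
                # Names: lowercase only, so nothing is stemmed away.
                "country_name": {"type": "string", "store": "yes", "analyzer": "lowercase_only"},
                "city_name": {"type": "string", "store": "yes", "analyzer": "lowercase_only"},
                # Assumption: tags are left unchanged from the original mapping.
                "tags": {"type": "string", "store": "no", "analyzer": "snowball"},
            }
        }
    },
}

print(json.dumps(index_body, indent=2))
```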

I would suggest playing around with analyzers and building your own text
analysis chain, per field, depending on your data and requirements. Take a
look at the analyze API
(http://www.elasticsearch.org/guide/reference/api/admin-indices-analyze/)
and the inquisitor plugin
(https://github.com/polyfractal/elasticsearch-inquisitor) to see how the
whole thing works.
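
For anyone who wants to script calls to the analyze API, here is a minimal Python sketch that only builds the HTTP request and does not send it. The helper build_analyze_request and the localhost URL are assumptions for illustration. The JSON body form shown here is the one used by newer Elasticsearch releases; as far as I know, releases contemporary with this thread took the analyzer as a query-string parameter instead, so check the documentation for your version.

```python
import json
from urllib.request import Request


def build_analyze_request(base_url, analyzer, text):
    """Build (but do not send) a POST request to the _analyze endpoint."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return Request(
        base_url.rstrip("/") + "/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed local node address; replace with your own cluster URL.
req = build_analyze_request(
    "http://localhost:9200", "snowball", "Hiking trails in Brazil"
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing req to urllib.request.urlopen against a running node would return the tokens the analyzer produces, which makes it easy to compare analyzers on the same input.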

Cheers
Luca


(Nora Olsen) #5

Thanks for the pointers. Will read up more.

