Specifying the search analyzer for a multi_match query


(Igal @ getRailo.org) #1

Hi,

I've defined a custom analyzer that does stemming (kstem) and synonym
expansion, called "synonym_analyzer".

I've set it as the default analyzer for the index, i.e.
settings.analysis.analyzer.default.type: "synonym_analyzer"

I've also set it explicitly as the analyzer for the "title" and
"description" fields, which are the fields that I am searching on.

Now I'm running a multi_match query on the "title" and "description"
fields, like so:

GET /_search
{
  "query": {
    "multi_match": {
      "query": "red widgets",
      "fields": [ "title^3", "description" ],
      "operator": "and"
    }
  }
}

I expect the search results for "red widget" and "red widgets" to be
identical, since the "synonym_analyzer" does stemming, but instead I get
slightly different results (2 more results on the singular term vs. the
plural term).

Testing the analyzer with

GET /myindex/_analyze?text=Widgets

returns "widgets", while

GET /myindex/_analyze?text=Widgets&analyzer=synonym_analyzer

returns "widget".

So it looks like the default analyzer is not the synonym_analyzer, as I
expected it to be. What am I doing wrong? Or how can I specify the
analyzer to use in the query so that the search terms are stemmed?

TIA,

Igal

--
Igal Sapir
Railo Core Developer
http://getRailo.org/

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/52B68541.9010401%40getrailo.org.
For more options, visit https://groups.google.com/groups/opt_out.


(Itamar Syn-Hershko) #2

Obviously it has something to do with the text you are indexing and the
shape of the real queries you are using. You have 2 options - either use
Explain to get back an explanation for why this happens and figure out what
happens from there, or isolate this to a unit test and go from there.
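
A minimal sketch of the Explain route, assuming the query from the first post is sent to the same _search endpoint. Setting "explain" to true in the search body asks Elasticsearch to attach an _explanation to each hit, showing which terms matched and how the score was computed. That should reveal whether the query term was stemmed before matching.

```json
{
  "explain": true,
  "query": {
    "multi_match": {
      "query": "red widgets",
      "fields": [ "title^3", "description" ],
      "operator": "and"
    }
  }
}
```

Running it once with "red widget" and once with "red widgets" and comparing the terms listed in each _explanation is one way to narrow down where the two result sets diverge.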

--

Itamar Syn-Hershko
http://code972.com | @synhershko https://twitter.com/synhershko
Freelance Developer & Consultant
Author of RavenDB in Action http://manning.com/synhershko/

On Sun, Dec 22, 2013 at 8:22 AM, Igal @ getRailo.org <igal@getrailo.org> wrote:


(Igal @ getRailo.org) #3

Thanks Itamar, I see the problem now. Apparently when I tried to create
the index it was failing and I wasn't seeing the error message.

What I tried to do is set a custom analyzer as the default by only
specifying the name, e.g.

{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": { "type": "synonym_analyzer" },
        "synonym_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [ "standard", "lowercase", "english_stemmer", "synonyms_site" ]
        }
      }
    }
  }
}

Apparently, duplicating the settings from synonym_analyzer into default works.
I was under the impression that I would not need to duplicate them.
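
A sketch of the working variant described here, assuming the same tokenizer and filter chain as the snippet in this post. The analyzer "type" expects an analyzer type such as "custom" or a built-in one, not the name of another analyzer defined alongside it, so "default" repeats the full definition. The "english_stemmer" and "synonyms_site" filter definitions are not shown in this thread and are left out here too; they would have to exist under analysis.filter for index creation to succeed.

```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [ "standard", "lowercase", "english_stemmer", "synonyms_site" ]
        },
        "synonym_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [ "standard", "lowercase", "english_stemmer", "synonyms_site" ]
        }
      }
    }
  }
}
```

For the other half of the original question, the match family of queries, including multi_match, accepts an "analyzer" parameter that overrides the search-time analyzer for that one query. Assuming synonym_analyzer is defined on the index being searched (for example via /myindex/_search), the body might look like this:

```json
{
  "query": {
    "multi_match": {
      "query": "red widgets",
      "fields": [ "title^3", "description" ],
      "operator": "and",
      "analyzer": "synonym_analyzer"
    }
  }
}
```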

Igal

On Sunday, December 22, 2013 1:25:50 AM UTC-8, Itamar Syn-Hershko wrote:

