Can ES Ignore the stemmer filter when the query is a phrase?

Brett_Anderson · April 11, 2012, 4:48am

My use case relates to the percolator function in ES, but I imagine it's
just as valid for traditional document indexing.

If I set up a percolator for the query "empire", i.e. empire with
quotation marks around it, I get matches back for documents that contain the
word 'empired'. For queries without quotation marks I need matches returned
for the plural forms, so I can't remove the stemmer altogether.

At the moment the only way I can see to achieve what I want is to set up
the percolators using different analyzers depending on whether I want to
match plurals or not, identified by the presence of quotation marks in the
query. I would then need to percolate two copies of every document, one
using a stemmer and one without. This will halve the performance and also
doesn't allow for queries like '"empire" AND fight', which would match
only the singular for empire but plural forms for fight. Is there a nicer
way to achieve the desired result? Thanks.
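
A minimal sketch of the two-analyzer workaround described above, written in plain Python rather than against a real cluster. Everything here is an assumption made for illustration: `toy_stem` is a crude suffix stripper standing in for a real stemmer, and `register` and `percolate` are hypothetical helpers, not Elasticsearch APIs.

```python
import re

def toy_stem(word):
    """Toy suffix stripper standing in for a real stemmer (not Snowball)."""
    for suffix in ("ing", "ed", "es", "s", "e"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def tokens(text, stem):
    """Lowercase, keep alphabetic runs only, optionally stem each word."""
    words = re.findall(r"[a-z]+", text.lower())
    return {toy_stem(w) for w in words} if stem else set(words)

def register(query):
    """Choose the query set by the presence of quotation marks in the query."""
    exact = '"' in query
    return {"exact": exact, "terms": tokens(query, stem=not exact)}

def percolate(doc, registered):
    """Analyze the document the same way as the query set it is tested against."""
    doc_tokens = tokens(doc, stem=not registered["exact"])
    return registered["terms"] <= doc_tokens

exact_query = register('"empire"')
stemmed_query = register("empire")

print(percolate("the empires fell", exact_query))
print(percolate("the empires fell", stemmed_query))
```

Because each document has to be analyzed once per query set, the sketch also shows where the extra cost comes from, and why a single query cannot mix exact and stemmed terms under this scheme.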

kimchy · April 11, 2012, 12:31pm

I'm not sure that I completely follow the problem. Can you gist a
recreation of what you get? It would speed things up.

Brett_Anderson · May 9, 2012, 6:53am

I've just created a gist script here: https://gist.github.com/2642394

I set up a fresh index with the snowball stemmer. I then create a percolator
for the term "empire" (with the quotes). I then percolate a document with
the text 'empire', which correctly matches. I then percolate another
document with the text 'empires' and again the percolator matches. This
second example is matching a stemmed version of the original percolator.
I was hoping that it wouldn't match, since the percolator had the search
term in quotation marks, indicating the need for an exact match.
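
To illustrate why both documents match, here is a small self-contained sketch. It does not reproduce the gist or call Elasticsearch; `toy_stem` and `analyze` are hypothetical stand-ins for the snowball analysis chain, under the assumption that the quotation marks are dropped during tokenization and that the same stemming is applied to the query and to the percolated document.

```python
import re

def toy_stem(word):
    """Toy suffix stripper standing in for a real stemmer (not Snowball)."""
    for suffix in ("ing", "ed", "es", "s", "e"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def analyze(text):
    """Lowercase, keep alphabetic runs (which discards the quotes), then stem."""
    return [toy_stem(w) for w in re.findall(r"[a-z]+", text.lower())]

query_terms = analyze('"empire"')   # the registered percolator query
doc_singular = analyze("empire")    # first percolated document
doc_plural = analyze("empires")     # second percolated document

# Both documents reduce to the same token list as the query, so both match.
print(query_terms == doc_singular, query_terms == doc_plural)
```

Once the query and both documents have been reduced to the same stem, nothing in the analyzed terms records that the original query was quoted.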

If you search Google for 'car' you will get matches for 'cars'. However, if
you search for "car" (with quotes) you will only get matches for 'car', not
the plural form. I was hoping to get this natural language functionality
out of the box with ES. I'm pretty sure Lucene doesn't natively support
this, so it's a pretty tall order. As I said previously, I can create two
sets of percolators, one with stemming and one without. Then I can
register queries that use quotes with the non-stemmed set and all others
with the stemmed set, then percolate each document against both sets. This
is good enough for the moment, but it would be really great to handle mixed
queries, e.g. '"car" AND fight' matching 'car ... fights', whereby stemming
has been applied to the fight term but not the car term.
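
The mixed-query behaviour asked for above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: each document keeps both its exact and its stemmed tokens, `toy_stem` is a crude stand-in for a real stemmer, and `matches` is a hypothetical AND-only parser that handles single quoted words, not full phrases or the real query_string syntax.

```python
import re

def toy_stem(word):
    """Toy suffix stripper standing in for a real stemmer (not Snowball)."""
    for suffix in ("ing", "ed", "es", "s", "e"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def index(doc):
    """Keep two views of the document: surface forms and stems."""
    words = re.findall(r"[a-z]+", doc.lower())
    return {"exact": set(words), "stemmed": {toy_stem(w) for w in words}}

def matches(query, doc):
    """Quoted terms must appear verbatim; bare terms are compared after stemming."""
    fields = index(doc)
    for quoted, bare in re.findall(r'"([^"]+)"|(\w+)', query):
        if bare.upper() == "AND":
            continue  # the only operator this sketch understands
        if quoted:
            if quoted.lower() not in fields["exact"]:
                return False
        elif toy_stem(bare.lower()) not in fields["stemmed"]:
            return False
    return True

print(matches('"car" AND fight', "car chase fights"))
print(matches('"car" AND fight', "cars in fights"))
```

The design choice is to decide per term, not per query, which view of the document to consult, at the cost of storing both views.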

I'm just wondering if there is a cleaner way to achieve what I want with
the existing codebase, rather than filing a feature request.

Thanks,
LJ.

kimchy · May 10, 2012, 8:32am

I like this feature! I can see many different use cases for it. I opened
this issue: "Allow to customize quote analyzer to be used when quoting text
in a query_string" (Issue #1931, elastic/elasticsearch on GitHub).

Note, though, that there is a caveat here. Remember that the analyzer is
also applied when indexing data, so 'empires' will be indexed as 'empire'
(with stemming). If you then don't do any stemming when searching,
"empires" (with a non-stemming analyzer) will not find anything.

kimchy · May 10, 2012, 8:39am

Also note that specifying just a search phrase analyzer will not work
properly for the empire case, because at index time empire is stemmed to
empir...
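
A small sketch of this caveat, in plain Python and not tied to any Elasticsearch API. It assumes a single field indexed with stemming and a quoted search term analyzed without stemming; `toy_stem` and `analyze` are hypothetical helpers, and the toy stemmer is chosen only because it reduces empire to empir as described above.

```python
import re

def toy_stem(word):
    """Toy suffix stripper standing in for a real stemmer (not Snowball)."""
    for suffix in ("ing", "ed", "es", "s", "e"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def analyze(text, stem):
    """Lowercase, keep alphabetic runs only, optionally stem each word."""
    words = re.findall(r"[a-z]+", text.lower())
    return [toy_stem(w) if stem else w for w in words]

# Index time: the stemming analyzer runs, so only stems are stored.
stored = set(analyze("empire empires", stem=True))

# Search time: a quoted term analyzed without stemming keeps its surface form.
quoted_empire = analyze('"empire"', stem=False)
quoted_empires = analyze('"empires"', stem=False)

# Neither surface form exists in the stemmed index, so neither query matches.
print(stored)
print(any(t in stored for t in quoted_empire + quoted_empires))
```

Under these assumptions an exact match is only possible if the unstemmed tokens are also indexed somewhere, for example in a second field.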
