Elasticsearch Reverse Suggester Problem


(gkwelding) #1

Hi guys, I'm hoping somebody on here can help me. I feel like I'm just
missing something really basic, but I can't for the life of me figure out
what. I have the following index set up (it's very cut down for clarity's
sake):

{
  "index":"products",
  "body":{
    "settings":{
      "number_of_shards":5,
      "number_of_replicas":1,
      "analysis":{
        "analyzer":{
          "default":{
            "type":"snowball",
            "language":"English"
          },
          "reverse":{
            "type":"custom",
            "language":"English",
            "tokenizer":"standard",
            "filter":["standard","lowercase","stop","snowball","reverse"]
          }
        }
      }
    },
    "mappings":{
      "product":{
        "_all":{"enabled":true},
        "properties":{
          "id":{"type":"string","include_in_all":true,"index":"analyzed","analyzer":"snowball","store":"yes"},
          "name":{"type":"string","include_in_all":true,"index":"analyzed","analyzer":"snowball","store":"yes"},
          "name_reverse":{"type":"string","include_in_all":true,"index":"analyzed","analyzer":"reverse","store":"yes"}
        }
      }
    }
  }
}

I'm running search queries against this index and am now trying to add
suggesters. The following query works fine and, as expected, returns no
suggestions:

{
  "index":"products",
  "type":"product",
  "body":{
    "indices_boost":{"id":2,"name":1.5},
    "query":{
      "filtered":{
        "query":{
          "query_string":{
            "query":"pushchair",
            "fields":["id","name"]
          }
        }
      }
    },
    "suggest":{
      "text":"pushchair",
      "simple_phrase":{
        "phrase":{
          "field":"name",
          "size":4,
          "real_word_error_likelihood":0.95,
          "confidence":1,
          "gram_size":1,
          "direct_generator":[
            {
              "field":"name",
              "suggest_mode":"always",
              "min_word_len":1
            }
          ]
        }
      }
    }
  }
}

The next query also works fine and returns the expected suggestions:

{
  "index":"products",
  "type":"product",
  "body":{
    "indices_boost":{"id":2,"name":1.5},
    "query":{
      "filtered":{
        "query":{
          "query_string":{
            "query":"pushchiar",
            "fields":["id","name"]
          }
        }
      }
    },
    "suggest":{
      "text":"pushchiar",
      "simple_phrase":{
        "phrase":{
          "field":"name",
          "size":4,
          "real_word_error_likelihood":0.95,
          "confidence":1,
          "gram_size":1,
          "direct_generator":[
            {
              "field":"name",
              "suggest_mode":"always",
              "min_word_len":1
            }
          ]
        }
      }
    }
  }
}

As you can see, "pushchair" is misspelt as "pushchiar", and the response
from Elasticsearch provides the correct suggestion. The problem comes when I
try to add reverse support as follows:

{
  "index":"products",
  "type":"product",
  "body":{
    "indices_boost":{"id":2,"name":1.5},
    "query":{
      "filtered":{
        "query":{
          "query_string":{
            "query":"pushchair",
            "fields":["id","name"]
          }
        }
      }
    },
    "suggest":{
      "text":"pushchair",
      "simple_phrase":{
        "phrase":{
          "field":"name",
          "size":4,
          "real_word_error_likelihood":0.95,
          "confidence":1,
          "gram_size":1,
          "direct_generator":[
            {
              "field":"name",
              "suggest_mode":"always",
              "min_word_len":1
            },{
              "field":"name_reverse",
              "suggest_mode":"always",
              "min_word_len":1,
              "pre_filter":"reverse",
              "post_filter":"reverse"
            }
          ]
        }
      }
    }
  }
}

Now I start hitting problems. A query for "pushchair" returns results and
no suggestions (as expected), and a query for "pushchiar" returns no results
and a suggestion to use "pushchair" instead. The problem is that when I query
"upshchair", I get 0 results and 0 suggestions... My understanding was that,
with a reversed field and reverse pre/post filters on the second generator,
the suggester would reverse the input to "riahchspu", match it against the
indexed "riahchsup", and return "pushchair" as a suggestion.
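A toy Python model of what the second direct generator is expected to do may make the idea concrete. It is a sketch under stated assumptions, not Elasticsearch's implementation: the term lists are hypothetical, plain Levenshtein distance stands in for Lucene's own candidate scoring, and `prefix_length=1` / `max_edits=2` are assumed to match the direct generator's defaults.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def candidates(word, terms, prefix_length=1, max_edits=2):
    """Terms sharing the first `prefix_length` chars with `word` and lying
    within `max_edits` edits of it (a rough stand-in for a direct generator)."""
    prefix = word[:prefix_length]
    return [t for t in terms
            if t.startswith(prefix) and levenshtein(word, t) <= max_edits]


def reverse(s: str) -> str:
    return s[::-1]


# Hypothetical toy term dictionaries for the "name" and "name_reverse" fields.
name_terms = ["pushchair", "highchair", "pram"]
name_reverse_terms = [reverse(t) for t in name_terms]

typo = "upshchair"

# Generator 1 on "name": the first letters differ ("u" vs "p"), so the
# prefix check rules out "pushchair" before any distance is computed.
forward = candidates(typo, name_terms)

# Generator 2 on "name_reverse": pre_filter reverses the input
# ("riahchspu"), post_filter reverses each candidate back.
backward = [reverse(c) for c in candidates(reverse(typo), name_reverse_terms)]

print(forward)   # []
print(backward)  # ['pushchair']
```

Under these assumptions the forward generator finds nothing because of the prefix requirement, while the reversed one recovers "pushchair", which is the behaviour described above.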

I can also see that the reverse analyzer's working because when I hit
localhost:9200/searchable/_analyze?analyzer=reverse&text=pushchair I get
the following response:

{"tokens":[{"token":"riahchsup","start_offset":0,"end_offset":9,"type":"","position":1}]}

Any help would be much appreciated.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/31d30976-402d-47e7-8dd6-8ec2fc1ef5a6%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(gkwelding) #2

Argh, knew I'd forget something, the blooming ES version number! I'm on the
latest 1.0 version.

On Monday, February 24, 2014 11:39:00 AM UTC, Garry Welding wrote:


(gkwelding) #3

Really, nobody has an answer to this?


(Nik Everett) #4

I believe the job of the reverse filter is to efficiently provide
suggestions that share a suffix with the provided term rather than a
prefix. You might try removing the pre_filter to see if it handles
reversed words.

The reason for indexing the reversed terms is that Lucene stores terms in
sorted order, and the suggester requires a prefix match to slice out the
portion of the index that must be scanned for candidate terms.
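The point about sorted terms can be sketched in Python, assuming a sorted in-memory list stands in for Lucene's term dictionary (the real on-disk structure is far more elaborate) and a one-character prefix is the required match. The term list and the `prefix_slice` helper are hypothetical.

```python
import bisect

# Hypothetical term dictionary for the "name" field; Lucene keeps terms sorted.
terms = sorted(["pushchair", "pushchairs", "pram", "umbrella", "upholstery"])
# The same terms reversed, as the "name_reverse" field would hold them.
reversed_terms = sorted(t[::-1] for t in terms)


def prefix_slice(sorted_terms, prefix):
    """Return the contiguous run of terms starting with `prefix`.

    One binary search seeks to the first term >= prefix; only the matching
    run is then walked, rather than the whole dictionary.
    """
    lo = bisect.bisect_left(sorted_terms, prefix)
    hi = lo
    while hi < len(sorted_terms) and sorted_terms[hi].startswith(prefix):
        hi += 1
    return sorted_terms[lo:hi]


typo = "upshchair"

# Forward: the typo's first letter "u" selects a slice that can never
# contain "pushchair".
forward_slice = prefix_slice(terms, typo[:1])
# Reversed: the typo's last letter "r" becomes the prefix, and the slice
# contains "riahchsup" ("pushchair" reversed).
reverse_slice = prefix_slice(reversed_terms, typo[::-1][:1])

print(forward_slice)  # ['umbrella', 'upholstery']
print(reverse_slice)  # ['riahchsup']
```

This is why a typo in the first letter can only be corrected through the reversed field, and a typo in the last letter only through the forward one.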

Nik

On Tue, Feb 25, 2014 at 2:28 PM, Garry Welding gkwelding@gmail.com wrote:


(gkwelding) #5

Hi Nik, thanks for the suggestion. I'm using the pre and post filters
precisely because I understand how Lucene stores terms: I want to match on
the suffix of "upshchair". To that end I have set up a new property called
name_reverse that stores the product name as reversed tokens. I'm then
running the second direct generator against that field, using the reverse
pre/post filters to reverse its input and output so that matching happens on
the suffix instead of the prefix.

On Tuesday, February 25, 2014 8:00:14 PM UTC, Nikolas Everett wrote:


(gkwelding) #6

I did try removing the pre_filter, though, and it didn't change the
results.

On Tuesday, February 25, 2014 8:00:14 PM UTC, Nikolas Everett wrote:


(Nik Everett) #7

I'm not sure what to do then. I only use the phrase suggester in forwards
mode, so I only know the theory behind the reverse stuff.

Nik

On Wed, Feb 26, 2014 at 3:55 AM, Garry Welding gkwelding@gmail.com wrote:


(gkwelding) #8

Just a little bump for this as I still haven't gotten to the bottom of it.
If anyone with the answer wants the rep on SO you can answer here
(http://stackoverflow.com/questions/21989644/having-an-issue-with-elasticsearch-reverse-suggesters)
instead...

On Wednesday, February 26, 2014 2:28:14 PM UTC, Nikolas Everett wrote:


(gkwelding) #9

I'm marking this as resolved. What did I do to fix it? No idea. It just
started magically working, which means I must have been doing something
wrong when testing this functionality...

On Monday, March 3, 2014 2:13:01 PM UTC, Garry Welding wrote:
