Completion Suggester and Analyzer


(Paweł Młynarczyk) #1

Hello

I'm trying out the new completion suggester feature.
I'm using the simple analyzer at both index and search time. I have
"nirvana nevermind" as an input for completion, yet a completion text
starting with "never" does not return anything. I expected this to work,
since the analyzer splits "nirvana nevermind" into two separate tokens.

I'm using the example data from the Elasticsearch website:

curl -X PUT localhost:9200/music
curl -X PUT localhost:9200/music/song/_mapping -d '{
  "song" : {
    "properties" : {
      "name" : { "type" : "string" },
      "suggest" : {
        "type" : "completion",
        "index_analyzer" : "simple",
        "search_analyzer" : "simple",
        "payloads" : true
      }
    }
  }
}'

but I've changed the indexed item a bit:

curl -X PUT 'localhost:9200/music/song/1?refresh=true' -d '{
  "name" : "Nevermind",
  "suggest" : {
    "input": [ "nirvana nevermind" ],
    "output": "Nirvana - Nevermind"
  }
}'

And this query doesn't return anything:

curl -X POST 'localhost:9200/music/_suggest?pretty' -d '{
  "song-suggest" : {
    "text" : "never",
    "completion" : {
      "field" : "suggest"
    }
  }
}'

I know I can handle this by just adding more inputs, but I am concerned
about the size of the index when the list of possible user inputs for an
item grows very large.

Is there a way to analyze terms to match my expectations?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Alexander Reelsen) #2

Hey Pawel,

right now the suggester is a pure prefix suggester. The input you indexed
was "nirvana nevermind", so you only get suggestions back when the text you
enter is a prefix of that whole input, such as "nirv". As a workaround you
could index several inputs, like "Nirvana" and "Nevermind". So

"input": [ "Nirvana", "Nevermind" ],
"output" : "Nirvana - Nevermind"

would make your use case work. Also, in case you are worried about the size,
you can easily monitor it per field using the indices stats API; see more at:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-stats.html
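
The multiple-input workaround above does not have to be written by hand. Here is a minimal Python sketch, not taken from the Elasticsearch docs, that derives one input per word position so that every word can start a completion. The helper names `word_start_inputs` and `build_song_doc` are made up for illustration, and emitting the rest of the title from each word onward (rather than single words, as in the snippet above) is an assumption of this sketch.

```python
import json


def word_start_inputs(title):
    """Return one input per word position: the title from that word onward."""
    words = title.split()
    return [" ".join(words[i:]) for i in range(len(words))]


def build_song_doc(name, title, output):
    """Build a document body in the shape used earlier in this thread."""
    return {
        "name": name,
        "suggest": {
            # Hypothetical choice: one input per word start, so a prefix
            # suggester can complete from "nirv..." as well as "never...".
            "input": word_start_inputs(title),
            "output": output,
        },
    }


doc = build_song_doc("Nevermind", "nirvana nevermind", "Nirvana - Nevermind")
print(json.dumps(doc, indent=2))
```

For "nirvana nevermind" this yields the inputs "nirvana nevermind" and "nevermind". The printed JSON could then be sent as the body of the indexing request shown in the first post.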

From a long term point of view it makes sense to support the
AnalyzingInfixSuggester from Lucene as well.

Hope this helps.

--Alex

On Mon, Oct 7, 2013 at 3:23 PM, Paweł Młynarczyk zwarios@gmail.com wrote:



(simonw-2) #3

This suggester is in fact a prefix suggester: it only operates on the
prefixes you add, and it completes them.
You said you are worried about the size of the index. I can assure you this
approach takes you extremely far without size becoming an issue. The
compression for this kind of data is immense, and I have personal experience
with exactly this problem. Don't worry too much about the index size here
unless you have tens of billions of records with many different prefixes.
If you have
stuff like <song_name> you can easily have the combinations
[ -<song_name>, <song_name>, <song_name>-] without issues.
We are working on solutions that help with these situations, but they won't
use less space.
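
The prefix-only behaviour described at the top of this reply can be pictured with a toy model in Python. This is only a sketch under simplifying assumptions, not how Elasticsearch or Lucene implement the suggester: `simple_analyze` is a rough stand-in for the simple analyzer (lowercase, split on non-letters), and `prefix_matches` is a hypothetical helper that treats the analyzed input as one sequence and checks whether the analyzed query is a prefix of it.

```python
import re


def simple_analyze(text):
    """Rough stand-in for the simple analyzer: lowercase, keep runs of letters."""
    return re.findall(r"[^\W\d_]+", text.lower())


def prefix_matches(query, indexed_input):
    """True if the analyzed query is a prefix of the analyzed input as a whole."""
    q = " ".join(simple_analyze(query))
    s = " ".join(simple_analyze(indexed_input))
    return s.startswith(q)


# The analyzer does produce two tokens, but matching starts at the first one.
print(simple_analyze("nirvana nevermind"))           # ['nirvana', 'nevermind']
print(prefix_matches("nirv", "nirvana nevermind"))   # True
print(prefix_matches("never", "nirvana nevermind"))  # False
print(prefix_matches("never", "Nevermind"))          # True
```

In this model "never" only matches once "Nevermind" is indexed as its own input, which is why adding inputs is the suggested workaround.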

simon

On Monday, October 7, 2013 3:23:39 PM UTC+2, Paweł Młynarczyk wrote:


