I'm using ES's auto-completion and I'd like to understand how the prefix
tokenization works. Example queries and the results it's currently returning:
'blackb' -> 'Blackberry Q10 Red'
--> Expected
'q' -> 'Blackberry Q10 Red'
--> Expected
'q10' -> No result, expected is 'Blackberry Q10 Red'
--> Why are results returned when typing in 'q' but not 'q10'?
'blackberry q10'
--> Expected
'sam' -> 'Samsung Galaxy S5'
--> Expected
'galax' -> 'Samsung Galaxy S5'
--> Expected
'S5' -> No result, expected is 'Samsung Galaxy S5'
I'm indexing the documents using input: ["blackberry", "Q10 Red"] and input:
["samsung", "galaxy s5"]; please find the mapping / query below. I thought
the standard tokenizer would also tokenize on whitespace and hence give a
result for 'S5'. I also don't understand why 'q' gives results but 'q10'
doesn't. Can I use the prefix tokenizer for such a use case, or would I
need to switch to ngrams completely?
Two things. First, the completion suggester uses the simple analyzer by
default. Run some of your terms through the analyze API and you will see
why they don't match anymore (especially 'q10'): the simple analyzer splits
on non-letter characters, so the digits are dropped. You may want to try
out the inquisitor plugin for that; it has a nice web UI for the analyze
API.
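To make the analyzer point concrete without a running cluster, here is a rough Python approximation of what the simple analyzer does (lowercase, then split on anything that is not a letter). The function name simple_analyze is made up for this sketch and it is not the Lucene implementation, so treat the analyze API as the authority for your actual mapping.

```python
import re

def simple_analyze(text):
    """Approximate the simple analyzer: lowercase, split on non-letters.

    Hypothetical helper for illustration only; digits and punctuation
    act as separators and are discarded.
    """
    # [^\W\d_]+ matches runs of letters (word characters minus digits and "_")
    return re.findall(r"[^\W\d_]+", text.lower())

for text in ["Q10 Red", "q10", "galaxy s5", "S5"]:
    print(repr(text), "->", simple_analyze(text))
```

Under this approximation "Q10 Red" is reduced to the tokens q and red, and "S5" to just s, so the digits never make it into the analyzed form.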
Second, the completion suggester is a prefix suggester, so "galaxy S5"
requires you to type "galaxy" first before typing "S5".
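The prefix behaviour can be sketched like this, as a toy model only: it deliberately ignores analysis and simply checks whether any indexed input starts with what was typed. The suggest function and the docs dictionary are hypothetical and just mirror the inputs from the question.

```python
def suggest(prefix, inputs_by_output):
    """Return outputs that have at least one input starting with the prefix.

    Toy model of prefix suggestion; no analyzer is applied here.
    """
    p = prefix.lower()
    return [
        output
        for output, inputs in inputs_by_output.items()
        if any(candidate.lower().startswith(p) for candidate in inputs)
    ]

# Inputs as described in the question (output -> list of inputs)
docs = {
    "Blackberry Q10 Red": ["blackberry", "Q10 Red"],
    "Samsung Galaxy S5": ["samsung", "galaxy s5"],
}

print(suggest("galax", docs))  # "galaxy s5" starts with "galax"
print(suggest("s5", docs))     # no input starts with "s5"
```

In this model, the way to get a hit for "s5" is to index it as an input of its own, e.g. ["samsung", "galaxy s5", "s5"]; whether that is enough in a real index still depends on the analyzer issue above.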