Hi! I'm very new to Elasticsearch (ES), and while learning and playing around with it, I got stuck on a problem that I'm not sure how to solve.
REQUIREMENT
I'm trying to build search-as-you-type autocomplete. I have a table with one column “name”.
Here is a list of example documents I have:
- O/Purist Tsipouro
- Southside
- South Side
- Bénédictine
- Piña Colada
- Bee's Knees
- Beer
- 49th Street
- 49 Warriors
- 3 Dolla
Here are a few requirements for how I want it to work.
Examples (any of the inputs on the left -> expected results):
- [“O/P”, “o / pu”, “o/p”] -> [“O/Purist Tsipouro”, … ]
- [“Southside”] -> [“Southside”, “South side”, … ]
- [“South side”] -> [“South side”, “Southside”, … ]
- [“benedict”, “ben”, “Bénédictine”] -> [“Bénédictine”, … ]
- [“Bee’s”, “bees”] -> [“Bee's Knees”]
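Several of these cases come down to the same normalization: case, accents and apostrophes should not matter. Below is a rough Ruby sketch of what the char filter + lowercase + asciifolding chain is meant to do to these inputs. `normalize` is a hypothetical helper, not an ES API, and NFD decomposition plus stripping combining marks is only an approximation of `asciifolding` that happens to cover these Latin characters. The “Southside” / “South side” pair is a different problem (joining or splitting words) that this step doesn't touch.

```ruby
# Approximation of my_char_filter -> lowercase -> asciifolding (no tokenizing).
def normalize(text)
  stripped = text.gsub(/['’]/, "")   # my_char_filter: drop straight and curly apostrophes
  lowered  = stripped.downcase        # lowercase filter
  # asciifolding (approximation): decompose accented letters, drop combining marks
  lowered.unicode_normalize(:nfd).gsub(/\p{Mn}/, "")
end

["Bee's Knees", "Bee’s", "bees", "Bénédictine", "benedict", "Piña Colada"].each do |s|
  puts "#{s.inspect} -> #{normalize(s).inspect}"
end
```

With this, “Bee’s” and “bees” both become `bees`, and “Bénédictine” becomes `benedictine`, so “benedict” and “ben” are plain prefixes of it.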
CURRENT SETUP
Here are my settings:
settings: {
  analysis: {
    char_filter: {
      "my_char_filter": {
        type: "mapping",
        mappings: [
          "' => ",  # remove straight apostrophes
          "’ => ",  # remove curly apostrophes
        ]
      }
    },
    filter: {
      "my_word_delimiter": {
        type: "word_delimiter",
      }
    },
    analyzer: {
      "autocomplete": {
        type: "custom",
        tokenizer: "autocomplete_tokenizer",
        char_filter: ["my_char_filter"],
        filter: ["lowercase", "asciifolding", "my_word_delimiter"],
      },
    },
    tokenizer: {
      "autocomplete_tokenizer": {
        type: "edge_ngram",
        min_gram: 1,
        max_gram: 20,
        token_chars: ["letter", "digit"],
      }
    }
  },
},
mappings: {
  properties: {
    name: { type: "text", analyzer: "autocomplete" },
  }
}
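For reference, this is roughly what `autocomplete_tokenizer` emits before any token filter runs. It's a small Ruby sketch with a hypothetical helper, `edge_ngram_tokens`, that approximates the `letter`/`digit` token_chars with `\p{L}`/`\p{Nd}` and prints each token with its start/end offsets:

```ruby
# Hypothetical re-creation of autocomplete_tokenizer:
# edge_ngram, min_gram 1, max_gram 20, token_chars letter + digit.
MIN_GRAM = 1
MAX_GRAM = 20

def edge_ngram_tokens(text)
  tokens = []
  # Runs of letters/digits are words; anything else (space, "/", "'") splits them.
  text.scan(/[\p{L}\p{Nd}]+/) do |word|
    start = Regexp.last_match.begin(0)
    # Every prefix of the word starts at the word's own start offset.
    (MIN_GRAM..[MAX_GRAM, word.length].min).each do |len|
      tokens << [word[0, len], start, start + len]
    end
  end
  tokens
end

edge_ngram_tokens("49th Street").each { |t, s, e| puts "#{t} [#{s},#{e}]" }
```

For “49th Street” this gives `4 [0,1]`, `49 [0,2]`, `49t [0,3]`, `49th [0,4]`, then `S [5,6]` through `Street [5,11]`. All prefixes of a word share its start offset, and that stream is what lowercase, asciifolding and `my_word_delimiter` receive.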
Here is my query:
query: {
  match: {
    name: {
      query: search_query,
      analyzer: "autocomplete",
      boost: 1
    }
  }
}
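Because the query also sets `analyzer: "autocomplete"`, the search text gets edge-ngrammed too, and a `match` query combines its terms with `or` by default, so sharing a single short prefix such as “b” is enough for a hit. The rough Ruby sketch below shows that effect on the example names. `grams` and `matches` are hypothetical helpers; the normalization only approximates the char filter, lowercase and asciifolding, `word_delimiter` is left out, and scoring is ignored.

```ruby
NAMES = ["O/Purist Tsipouro", "Southside", "South Side", "Bénédictine",
         "Piña Colada", "Bee's Knees", "Beer", "49th Street", "49 Warriors", "3 Dolla"]

# Approximate analyzer output: strip apostrophes, lowercase, fold accents,
# split on anything other than ASCII letters/digits (enough for these examples),
# then take the 1..20-character prefixes of each word.
def grams(text)
  folded = text.gsub(/['’]/, "").downcase.unicode_normalize(:nfd).gsub(/\p{Mn}/, "")
  folded.scan(/[a-z0-9]+/).flat_map { |w| (1..[w.length, 20].min).map { |n| w[0, n] } }.uniq
end

# Like a match query with the default "or" operator: one shared term is enough.
def matches(query)
  q = grams(query)
  NAMES.select { |name| (grams(name) & q).any? }
end

p matches("bee") # => ["Bénédictine", "Bee's Knees", "Beer"]
```

Here “bee” also pulls in “Bénédictine”, only because both share the grams `b` and `be`.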
PROBLEM
It works great until I index anything containing digits, like “3 Dolla” or “49th Street”.
When I do, I get this error:
{"type"=>"illegal_argument_exception", "reason"=>"startOffset must be non-negative, and endOffset must be >= startOffset, and offsets must not go backwards startOffset=0,endOffset=2,lastStartOffset=2 for field 'name'"} on item with id …
If I understand correctly, it's a problem with the edge_ngram tokenizer. I've tried moving edge_ngram from the tokenizer into the token filters, and that resolves the error, but then the quality of the search results is simply terrible.
I would appreciate it if someone could help me fix this or point me in the right direction in the docs or on Google.
I'm super lost and stuck.
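Working through the numbers in that error by hand hints at where it comes from. The Ruby sketch below takes the edge_ngram tokens for “49th” and applies only the letter/digit split that `word_delimiter` performs by default (`split_on_numerics`). The helper of the same name is a hypothetical stand-in for the real filter and ignores its other rules. The sketch then looks for the first token whose start offset is lower than the previous one:

```ruby
# Edge-ngram tokens for "49th" as [term, start_offset, end_offset]:
# every prefix starts at offset 0, as autocomplete_tokenizer emits them.
TOKENS = [["4", 0, 1], ["49", 0, 2], ["49t", 0, 3], ["49th", 0, 4]]

# Hypothetical stand-in for word_delimiter: only the letter<->digit split.
# Subword offsets are computed relative to the incoming token's start.
def split_on_numerics(term, start)
  parts = []
  term.scan(/\p{L}+|\p{Nd}+/) do |part|
    offset = start + Regexp.last_match.begin(0)
    parts << [part, offset, offset + part.length]
  end
  parts
end

# Returns [term, start, end, last_start] for the first token whose start offset
# is lower than the previous token's (the "offsets must not go backwards" case).
def first_backwards_offset(tokens)
  last_start = 0
  tokens.each do |term, s, e|
    return [term, s, e, last_start] if s < last_start
    last_start = s
  end
  nil
end

filtered = TOKENS.flat_map { |term, s, _e| split_on_numerics(term, s) }
filtered.each { |term, s, e| puts "#{term} [#{s},#{e}]" }
p first_backwards_offset(filtered)
```

The split of “49t” into `49 [0,2]` and `t [2,3]` is followed by `49 [0,2]` from “49th”, so the start offset drops from 2 back to 0. That is the same startOffset=0, endOffset=2, lastStartOffset=2 as in the error. If this reading is right, the offsets break where the edge_ngram prefixes meet word_delimiter's letter/digit split, rather than in the tokenizer alone.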
Thank you