Goal: seamlessly autocomplete addresses while using synonyms.
In my mapping/index settings I have a synonym filter and an edgeNGram
filter, and my index analyzer applies both. This doesn't work: the keyword
tokenizer emits the whole string as a single token, and the whole string
never matches a synonym entry. That is a real problem because, for example,
we want a synonym that says "street" = "st".
So the question is, how do I accomplish this? Is there a way to run a
standard tokenization, apply synonyms, then concatenate the tokens back into
a single token before applying the edgeNGram filter? Or is there some other
approach?
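To make the behavior I'm after concrete, here is a rough Python sketch of the pipeline (tokenize, apply synonyms, rejoin, then edge n-gram). It is not Elasticsearch code, only an illustration of the terms I would like to end up in the index. The SYNONYMS table is a made-up sample (the real entries live in analysis/address_syms.txt), and the helper names are hypothetical.

```python
import re

# Hypothetical sample of the synonym table; the real entries are in
# analysis/address_syms.txt and are not reproduced here.
SYNONYMS = {"street": "st", "avenue": "ave"}


def tokenize(text):
    """Rough stand-in for a standard tokenizer plus a lowercase filter."""
    return re.findall(r"\w+", text.lower())


def apply_synonyms(tokens):
    """Replace each token with its canonical form when a synonym exists."""
    return [SYNONYMS.get(tok, tok) for tok in tokens]


def edge_ngrams(token, min_gram=1, max_gram=50):
    """Front-anchored n-grams, mirroring the 'substring' filter settings."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def index_terms(address):
    """Tokenize, normalize synonyms, rejoin into one token, then n-gram it."""
    joined = " ".join(apply_synonyms(tokenize(address)))
    return edge_ngrams(joined)


terms = index_terms("500 Madison Street")
print(terms[-1])  # longest term: the full normalized address
```

With terms like these, "500 Madison Street" and "500 Madison St" would index identically, and a prefix such as "500 madison s" would be an exact term match.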
I have tried using a standard tokenizer with a match_phrase_prefix query,
but that gives me issues. Two examples:
- If I type in "500 m" or "500 ma", it does not return the result I'm
looking for. This is because "madison" is far down the list of expansions.
I have to raise max_expansions to around 750 to get this to work properly.
- If I type in "500 madison a", it returns no results. This is because
it can't reach "ave" within its max_expansions limit. I have to raise
max_expansions to around 7500 for this to work properly.
And that's just not a reasonable solution for autocomplete.
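For reference, the query I'm describing looks roughly like the body built below. This is a sketch only: the field name "address" is a placeholder I picked for illustration, not my actual mapping, and the helper function is hypothetical. The max_expansions values are the ones mentioned above.

```python
import json


def phrase_prefix_query(field, text, max_expansions):
    """Build a match_phrase_prefix request body as a plain dict."""
    return {
        "query": {
            "match_phrase_prefix": {
                field: {
                    "query": text,
                    # Upper bound on how many terms the last word may expand to.
                    "max_expansions": max_expansions,
                }
            }
        }
    }


# "address" is an assumed field name used only for this example.
body = phrase_prefix_query("address", "500 madison a", 7500)
print(json.dumps(body, indent=2))
```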
Synonym Filter:

    "synonym": {
        "type": "synonym",
        "synonyms_path": "analysis/address_syms.txt"
    }

EdgeNGram Filter:

    "substring": {
        "type": "edgeNGram",
        "min_gram": 1,
        "max_gram": 50,
        "side": "front"
    }

Analyzer:

    "str_index_analyzer": {
        "tokenizer": "keyword",
        "char_filter": [
            "my_filter"
        ],
        "filter": [
            "lowercase",
            "synonym",
            "substring"
        ]
    }
--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
Does nobody have a solution for using the synonym filter with the keyword
tokenizer? I find this hard to believe. It seems like something that would
be very useful for many Elasticsearch users.
(In reply to Anthony Campagna's post of Wednesday, September 25, 2013 4:26:53 PM UTC-4.)