Synonyms with Keyword Tokenizer

Anthony_Campagna · September 25, 2013, 8:26pm

Goal: To seamlessly autocomplete addresses while utilizing synonyms.

In my mapping/index settings I have a synonym filter and an edgeNGram
filter, and my index analyzer applies both. This doesn't work with the
keyword tokenizer: it emits the whole string as a single token, and that
token never matches a synonym entry. This is a real problem because, for
example, we want a synonym rule that says "street" = "st".
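
To make the mismatch concrete, here is a minimal Python sketch (plain Python, not Elasticsearch code) that mimics a token-level synonym lookup after a keyword-style tokenizer versus a word-level one. The `SYNONYMS` map and all helper names are hypothetical and exist only for illustration.

```python
# Hypothetical stand-in for a synonym filter: maps one whole token to a replacement.
SYNONYMS = {"street": "st", "avenue": "ave"}


def keyword_tokenize(text):
    """Mimic a keyword tokenizer: the entire input becomes a single token."""
    return [text]


def whitespace_tokenize(text):
    """Mimic a word-level tokenizer by splitting on whitespace."""
    return text.split()


def apply_synonyms(tokens):
    """Replace a token only when the whole token equals a synonym key."""
    return [SYNONYMS.get(token, token) for token in tokens]


address = "500 madison street"

# One big token: "street" is buried inside it, so no rule fires.
keyword_result = apply_synonyms(keyword_tokenize(address))

# Separate tokens: "street" is its own token and gets replaced.
word_result = apply_synonyms(whitespace_tokenize(address))

print(keyword_result)
print(word_result)
```

With the single keyword token, the lookup only ever sees the full address string, which is why the "street" rule has nothing to match against.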

So the question is, how do I accomplish this? Is there a way to do a
standard tokenization, apply synonyms, then concat the tokens into a single
token before applying the edgeNGram filter? Maybe there is something else?
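
The pipeline I have in mind can be sketched in plain Python as follows. This is only a simulation of the idea (split into words, apply synonyms, rejoin into one token, then take front edge n-grams), not an Elasticsearch configuration; the `SYNONYMS` map and the function names are hypothetical.

```python
SYNONYMS = {"street": "st", "avenue": "ave"}  # hypothetical synonym rules


def normalize(text):
    """Lowercase, split into words, apply synonyms, then rejoin into one token."""
    tokens = text.lower().split()
    tokens = [SYNONYMS.get(token, token) for token in tokens]
    return " ".join(tokens)


def edge_ngrams(token, min_gram=1, max_gram=50):
    """Return front-anchored prefixes of the token, like a front edge n-gram filter."""
    longest = min(len(token), max_gram)
    return [token[:size] for size in range(min_gram, longest + 1)]


# Index time: normalize the address, then store every prefix of the joined token.
indexed_terms = set(edge_ngrams(normalize("500 Madison Street")))


def matches(query):
    """A query matches if its normalized form equals one of the indexed prefixes."""
    return normalize(query) in indexed_terms


print(matches("500 ma"))
print(matches("500 Madison Street"))
```

One caveat of this sketch: a partially typed long form such as "500 madison stre" would not match, because the synonym only fires once the full word "street" has been typed.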

I have tried using a standard tokenizer with a match_phrase_prefix query,
but that gives me issues. Two examples:

If I type in "500 m" or "500 ma" it will not return the result I'm
looking for. This is because "madison" is far down the expansions list. I
have to raise max_expansions to around 750 in order to get this to work
properly.

If I type in "500 madison a" it will return no results. This is because
it can't get to "ave" within its max_expansions. I have to raise it to
around 7500 in order for this to work properly.

And that's just not a reasonable solution for autocomplete.
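
As far as I understand it, the prefix on the last term is expanded into a capped number of terms taken in term dictionary order, which would explain the behaviour above. The following Python sketch simulates that under this assumption; the term dictionary is made up, and `expand_prefix` is a hypothetical helper, not an Elasticsearch API.

```python
from bisect import bisect_left


def expand_prefix(sorted_terms, prefix, max_expansions):
    """Return at most max_expansions terms starting with prefix, in sorted order."""
    start = bisect_left(sorted_terms, prefix)
    expansions = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # left the block of terms sharing this prefix
        if len(expansions) == max_expansions:
            break  # cap reached; later terms are never considered
        expansions.append(term)
    return expansions


# Made-up term dictionary: 700 terms that sort before "madison", plus three real words.
terms = sorted([f"ma{i:04d}" for i in range(700)] + ["madison", "main", "maple"])

few = expand_prefix(terms, "ma", max_expansions=50)
many = expand_prefix(terms, "ma", max_expansions=750)

print("madison" in few)
print("madison" in many)
```

In this simulation "madison" is only reached once the cap exceeds the number of terms that sort ahead of it, which mirrors why I had to push the limit so high.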

Synonym Filter:
"synonym": {
    "type": "synonym",
    "synonyms_path": "analysis/address_syms.txt"
}

EdgeNGram Filter:
"substring": {
    "type": "edgeNGram",
    "min_gram": 1,
    "max_gram": 50,
    "side": "front"
}

Analyzer:
"str_index_analyzer": {
    "tokenizer": "keyword",
    "char_filter": [
        "my_filter"
    ],
    "filter": [
        "lowercase",
        "synonym",
        "substring"
    ]
}

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Anthony_Campagna · October 4, 2013, 6:59pm

Does nobody have a solution for using the synonym filter with the keyword
tokenizer? I find this hard to believe. It seems like something that would
be very useful for many Elasticsearch users.

