Synonyms with Keyword Tokenizer


(Anthony Campagna) #1

Goal: To seamlessly autocomplete addresses while utilizing synonyms.

In my mapping/index settings I have a synonym filter and an edgeNGram
filter, and my index analyzer applies both. This doesn't work: the keyword
tokenizer emits the whole string as a single token, and that token never
matches a synonym. That is a real problem because, for example, we want a
synonym rule that says "street" = "st".
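
To make the failure concrete, here is a toy model of the analysis chain in plain Python. It is not Elasticsearch code; the function names and the single "street" rule are stand-ins I made up for illustration (the real rules live in address_syms.txt).

```python
# Toy model of an analysis chain; this is NOT Elasticsearch code.
SYNONYMS = {"street": "st"}  # hypothetical rule standing in for address_syms.txt

def keyword_tokenize(text):
    # The keyword tokenizer emits the entire input as one token.
    return [text]

def whitespace_tokenize(text):
    # Stand-in for a word-level tokenizer.
    return text.split()

def apply_synonyms(tokens):
    # A single-word rule only fires when a whole token equals the rule's key.
    return [SYNONYMS.get(tok, tok) for tok in tokens]

address = "500 madison street"
whole = apply_synonyms(keyword_tokenize(address))
words = apply_synonyms(whitespace_tokenize(address))
print(whole)
print(words)
```

With one token for the whole address, the rule never sees "street" on its own, so nothing is rewritten. With word-level tokens it fires as expected.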

So the question is, how do I accomplish this? Is there a way to do a
standard tokenization, apply synonyms, then concatenate the tokens into a
single token before applying the edgeNGram filter? Or is there another
approach?
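
The chain I have in mind would behave roughly like the sketch below. This is only a Python simulation of the desired output, under the assumption that tokens are rejoined with a single space; the analyze and edge_ngrams helpers are hypothetical names, not Elasticsearch filters.

```python
def edge_ngrams(token, min_gram=1, max_gram=50):
    # Front-anchored n-grams, mirroring min_gram/max_gram from the filter settings.
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]

def analyze(text, synonyms):
    # 1. lowercase and split into word tokens
    tokens = text.lower().split()
    # 2. apply single-word synonym rules per token
    tokens = [synonyms.get(tok, tok) for tok in tokens]
    # 3. concatenate back into one token (assumed separator: one space)
    joined = " ".join(tokens)
    # 4. emit edge n-grams of the single joined token
    return edge_ngrams(joined)

grams = analyze("500 Madison Street", {"street": "st"})
print(grams[-1])
print("500 m" in grams)
```

One caveat with this design: the text typed by the user would need the same synonym step at search time, and a half-typed word such as "stre" would not be rewritten to "st", so it would not match the indexed grams.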

I have tried a standard tokenizer with a match_phrase_prefix query, but
that has problems. Two examples:

  • If I type in "500 m" or "500 ma", it does not return the result I'm
    looking for, because "madison" is far down the list of expansions. I
    have to raise max_expansions to around 750 to get this to work
    properly.
  • If I type in "500 madison a", it returns no results, because the query
    can't reach "ave" within its max_expansions. I have to raise it to
    around 7500 for this to work properly.

That's just not a reasonable solution for autocomplete.
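
As far as I understand it, the last word of the query is treated as a prefix and expanded against the indexed terms in sorted order, stopping after max_expansions candidates. The sketch below simulates that under this assumption; the term list is made up and expand_prefix is a hypothetical helper, not an Elasticsearch call.

```python
def expand_prefix(sorted_terms, prefix, max_expansions):
    # Collect terms that start with the prefix, in sorted order,
    # and keep only the first max_expansions of them.
    matches = [term for term in sorted_terms if term.startswith(prefix)]
    return matches[:max_expansions]

# Made-up term dictionary; a real index would hold far more terms.
terms = sorted(["ma", "mabel", "mace", "madison", "main", "maple"])

few = expand_prefix(terms, "ma", 3)
more = expand_prefix(terms, "ma", 4)
print("madison" in few)
print("madison" in more)
```

With many terms sharing a short prefix, the one I want only shows up once the limit is large enough to reach it, which matches what I see with "500 m".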

Synonym Filter:
"synonym": {
    "type": "synonym",
    "synonyms_path": "analysis/address_syms.txt"
}

EdgeNGram Filter:
"substring": {
    "type": "edgeNGram",
    "min_gram": 1,
    "max_gram": 50,
    "side": "front"
}

Analyzer:
"str_index_analyzer": {
    "tokenizer": "keyword",
    "char_filter": [
        "my_filter"
    ],
    "filter": [
        "lowercase",
        "synonym",
        "substring"
    ]
}

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Anthony Campagna) #2

Does nobody have a solution for using the synonym filter with the keyword
tokenizer? I find this hard to believe; it seems like something that would
be very useful to many Elasticsearch users.

(In reply to the original post of Wednesday, September 25, 2013 4:26:53 PM UTC-4.)


