Using match_phrase_prefix against a filtered/queried subset of my index to reduce max_expansions requirements


(Anthony Campagna) #1

Goal: To seamlessly autocomplete addresses while utilizing synonyms.

I have tried using the standard tokenizer (so that I can apply synonyms to
each word) together with a match_phrase_prefix query, but that causes
problems. Two examples:

  • If I type "500 m" or "500 ma", it does not return the result I'm
    looking for, because "madison" is far down the list of expansions. I
    have to raise max_expansions to around 750 to get this to work
    properly.
  • If I type "500 madison a", it returns no results, because it can't
    reach "ave" within its max_expansions limit. I have to raise it to
    around 7500 for this to work properly.
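
The cutoff described in these two cases can be illustrated with a toy Python model. This is not Elasticsearch code: it assumes the last word of the query is expanded to at most max_expansions terms, taken in sorted order from the field's term dictionary, and the term list below is made up for illustration.

```python
from bisect import bisect_left


def expand_prefix(sorted_terms, prefix, max_expansions):
    """Return up to max_expansions terms that start with prefix, in sorted order."""
    start = bisect_left(sorted_terms, prefix)
    expansions = []
    for term in sorted_terms[start:]:
        # Stop at the first non-matching term or once the limit is reached.
        if not term.startswith(prefix) or len(expansions) >= max_expansions:
            break
        expansions.append(term)
    return expansions


# Hypothetical term dictionary; a real address index would hold far more terms.
terms = sorted([
    "mable", "macarthur", "mackenzie", "macon", "madison",
    "magnolia", "main", "maple", "market",
])

# "madison" is the fifth term starting with "ma", so a limit of 3 misses it.
print("madison" in expand_prefix(terms, "ma", 3))
print("madison" in expand_prefix(terms, "ma", 5))
```

In this model a longer prefix such as "mad" reaches "madison" with a much smaller limit, which matches the observation that short prefixes need very large limits.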

And that's just not a reasonable solution for autocomplete.
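
For concreteness, the request body for the second case might be built along these lines. This is only a sketch: the field name "address" and the helper build_prefix_query are assumptions, since the post does not show the mapping or the actual query.

```python
import json


def build_prefix_query(text, max_expansions, field="address"):
    """Build a match_phrase_prefix request body.

    The default field name "address" is an assumption; the post does not
    say what the field is called.
    """
    return {
        "query": {
            "match_phrase_prefix": {
                field: {
                    "query": text,
                    "max_expansions": max_expansions,
                }
            }
        }
    }


# Values taken from the second example above.
body = build_prefix_query("500 madison a", 7500)
print(json.dumps(body, indent=2))
```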

Question: Is there a way to run a filter or preliminary query to get all
results that start with 500, and THEN run the match_phrase_prefix query
against only the possible matches from that first step? That way the
max_expansions needed would be FAR lower.

Maybe there is a different way entirely to do this?
Maybe there is a way to take position into account when calculating the
phrase prefix possibilities?
Maybe each number can be a "type" in my index? Would that mean there are
fewer phrase prefix possibilities?

Synonym Filter:
"synonym": {
  "type": "synonym",
  "synonyms_path": "analysis/address_syms.txt"
}

Analysis:
{
  "str_index_analyzer": {
    "tokenizer": "standard",
    "char_filter": [
      "my_filter"
    ],
    "filter": [
      "lowercase",
      "synonym"
    ]
  }
}
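
As a rough sketch of how the two fragments above would nest inside index settings, they can be assembled in Python as below. The nesting under settings.analysis follows the usual Elasticsearch layout. The "my_filter" char filter is referenced by the analyzer but its definition is not shown in this post, so it is left undefined here and would have to be supplied before these settings could be used.

```python
import json

# Synonym filter fragment, as given in the post.
synonym_filter = {
    "type": "synonym",
    "synonyms_path": "analysis/address_syms.txt",
}

# Analyzer fragment, as given in the post. "my_filter" is not defined
# anywhere in the post; it would need an entry under analysis.char_filter.
str_index_analyzer = {
    "tokenizer": "standard",
    "char_filter": ["my_filter"],
    "filter": ["lowercase", "synonym"],
}

index_settings = {
    "settings": {
        "analysis": {
            "filter": {"synonym": synonym_filter},
            "analyzer": {"str_index_analyzer": str_index_analyzer},
        }
    }
}

print(json.dumps(index_settings, indent=2))
```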



(system) #2