Better way to do a wildcard search - prefix wildcard or match_phrase_prefix?


(Divya Malini) #1

Hi,
We have a scenario where we need to support queries like "*foo", "*abc", etc. (we call them prefix wildcards).

Running a wildcard search like the one below had a large impact on available memory and degraded search performance.

Query1
POST index/_search
{
  "query" : {
    "wildcard" : {
      "field1" : {
        "value" : "*foo",
        "rewrite" : "top_terms_boost_100" // we try to limit the expansions to 100
      }
    }
  }
}

One solution we are currently considering is to apply a reverse token filter to a field so that it stores the reversed terms, and then run the above query like this:

Query2
POST index/_search
{
  "query" : {
    "wildcard" : {
      "field1" : {
        "value" : "oof*",
        "rewrite" : "top_terms_boost_100" // we try to limit the expansions to 100
      }
    }
  }
}
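Editor's sketch of why reversing helps, under the assumption that a leading wildcard has to examine every term in the field's sorted term dictionary while a trailing wildcard can seek straight to the matching range. This is plain Python, not Elasticsearch code; `suffix_search` and the sample terms are hypothetical.

```python
from bisect import bisect_left


def suffix_search(terms, suffix):
    """Find terms ending in `suffix` with a prefix range scan over reversed terms."""
    # What a reverse token filter would put in the index, kept sorted
    # like a term dictionary.
    reversed_terms = sorted(t[::-1] for t in terms)
    # "*foo" on the original terms becomes "oof*" on the reversed terms.
    prefix = suffix[::-1]
    # Seek directly to the first candidate instead of scanning every term.
    start = bisect_left(reversed_terms, prefix)
    matches = []
    for t in reversed_terms[start:]:
        if not t.startswith(prefix):
            break  # sorted order: no later term can match
        matches.append(t[::-1])
    return sorted(matches)


terms = ["foo", "barfoo", "food", "abc", "xfoo"]
print(suffix_search(terms, "foo"))
```

The scan touches only the contiguous block of reversed terms that share the prefix, which is the property Query2 relies on.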

I have two questions; please help clarify:
(1) Is there a better way to rewrite Query1 so that we don't take a performance hit in terms of memory consumption, etc.?

(2) On a field with the reverse token filter applied, is it better to use a wildcard term query (like Query2) or a match_phrase_prefix query? Which one is recommended, and which is more performant?

Thanks, Divya


(David Pilato) #2

IMO it's indeed better to use an analyzer based on the reverse token filter in such a case (with lowercase as well).

I'd also use the reverse, lowercase, and edge n-gram token filters at index time, and the reverse and lowercase token filters at search time.

That will be less performant at index time (as you have to generate more tokens), but IMO it will be much faster at search time.
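A minimal sketch of that setup, written as Python dicts that could be sent as request bodies with any HTTP client. The analyzer and filter names (`suffix_index`, `suffix_search`, `suffix_edge_ngram`), the `standard` tokenizer, and the `max_gram` of 20 are assumptions for illustration, not values from this thread. The two helper functions only imitate the analysis chain in plain Python to show why a search for "foo" becomes an exact term lookup.

```python
# Hypothetical index body: index time = lowercase + reverse + edge n-grams,
# search time = lowercase + reverse only.
index_body = {
    "settings": {
        "analysis": {
            "filter": {
                "suffix_edge_ngram": {
                    "type": "edge_ngram",
                    "min_gram": 1,
                    "max_gram": 20,  # assumed upper bound on suffix length
                }
            },
            "analyzer": {
                "suffix_index": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "reverse", "suffix_edge_ngram"],
                },
                "suffix_search": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "reverse"],
                },
            },
        }
    },
    "mappings": {
        "properties": {
            "field1": {
                "type": "text",
                "analyzer": "suffix_index",
                "search_analyzer": "suffix_search",
            }
        }
    },
}

# A plain match query replaces the wildcard query: no term expansion needed.
search_body = {"query": {"match": {"field1": "foo"}}}


def index_tokens(term, min_gram=1, max_gram=20):
    """Imitate the index-time chain for a single term."""
    reversed_term = term.lower()[::-1]
    upper = min(max_gram, len(reversed_term))
    return [reversed_term[:n] for n in range(min_gram, upper + 1)]


def search_token(text):
    """Imitate the search-time chain for a single term."""
    return text.lower()[::-1]


print(index_tokens("BarFoo"))
print(search_token("foo") in index_tokens("BarFoo"))
```

The trade-off is the one described above: every indexed term produces up to `max_gram` tokens, so the index grows and indexing slows down, while the search side does a single term lookup.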

You need to test this strategy though.


(Divya Malini) #3

Thanks for the suggestion David!!

Could you also please share a recommendation on match_phrase_prefix versus a wildcard term query? Which is the more performant one at a larger scale?


(David Pilato) #4

Avoid wildcard as much as possible IMO.


(Divya Malini) #5

Thanks David!

