Reverse keyword vs new Wildcard data type

TheWorkingDeveloper · August 20, 2020, 10:46am

In email wildcard searching:

user_email*
*@domain.com

Would a reverse keyword field be faster to search than a wildcard field, given a pattern with a single wildcard?

For example:

moc.niamod@* on a reverse keyword field
vs
*@domain.com on a wildcard field
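The idea behind the reversed field can be sketched in Python. If every value is stored reversed, a leading-wildcard query like `*@domain.com` becomes a plain prefix query. This is a minimal sketch that assumes values are reversed client-side before indexing; the helper names `reverse_value` and `to_reversed_prefix` are hypothetical and are not an Elasticsearch API.

```python
def reverse_value(value: str) -> str:
    """Reverse a value before indexing it into the (hypothetical) reversed field."""
    return value[::-1]


def to_reversed_prefix(pattern: str) -> str:
    """Rewrite a single leading-wildcard pattern such as '*@domain.com'
    into the prefix to search for in the reversed field."""
    # Only the case from the question is handled: exactly one '*', at the start.
    if not pattern.startswith("*") or "*" in pattern[1:] or "?" in pattern:
        raise ValueError("expected a pattern with a single leading '*'")
    return reverse_value(pattern[1:])


prefix = to_reversed_prefix("*@domain.com")
print(prefix)  # moc.niamod@
print(reverse_value("bill.gates@domain.com").startswith(prefix))  # True
```

The same rewrite must be applied consistently at query time, otherwise the reversed field is searched with a pattern it cannot seek on.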

Solved:
For this specific use case, a reverse keyword field is fastest.

Mark_Harwood · August 20, 2020, 11:04am

It's all about index and query alignment.
If you build your own index and ensure related queries are treated the same way, then you can use an index with reversed tokens to make the *@domain.com search efficient and a different index with non-reversed tokens to make a bill.gates@* search efficient. If you use the wrong query on the wrong index, you end up scanning all unique values in the index rather than being able to quickly seek to the relevant parts. There are no safety rails to prevent you from picking the wrong field.
That said, if you've gone to the trouble of creating the appropriate index and the searcher picks the right field, then it should be quick, and there's no need to verify each of the docs that have that term.
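The index/query alignment point can be illustrated with a toy sorted term dictionary in Python. This is a simplified model, not how Lucene's term dictionary is actually implemented, and the class name `SortedTermIndex` and the sample addresses are hypothetical. A prefix query can seek to the first candidate term and stop at the first non-match, while a leading-wildcard query on the non-reversed index has to check every unique term.

```python
from bisect import bisect_left


class SortedTermIndex:
    """Toy term dictionary: unique terms kept in sorted order."""

    def __init__(self, terms):
        self.terms = sorted(set(terms))

    def prefix_seek(self, prefix):
        """Seek to the first term >= prefix and read forward while terms match.
        Returns (matches, terms read after the seek, including the stopping term)."""
        start = bisect_left(self.terms, prefix)
        matches, visited = [], 0
        for term in self.terms[start:]:
            visited += 1
            if not term.startswith(prefix):
                break
            matches.append(term)
        return matches, visited

    def suffix_scan(self, suffix):
        """Leading-wildcard query on a non-reversed index: every unique term is checked."""
        return [t for t in self.terms if t.endswith(suffix)], len(self.terms)


emails = ["bill.gates@domain.com", "ann@other.org", "bob@domain.com",
          "zed@example.net", "carol@other.org"]
forward = SortedTermIndex(emails)
reversed_idx = SortedTermIndex(e[::-1] for e in emails)

print(reversed_idx.prefix_seek("moc.niamod@"))  # (['moc.niamod@bob', 'moc.niamod@setag.llib'], 3)
print(forward.suffix_scan("@domain.com"))       # (['bill.gates@domain.com', 'bob@domain.com'], 5)
print(forward.prefix_seek("bill.gates@"))       # (['bill.gates@domain.com'], 2)
```

With five terms the difference is small, but the scan cost grows with the number of unique values, while the seek cost grows roughly with the number of matching terms.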

With the wildcard field it's a more general-purpose index. One index handles leading and trailing wildcard searches (and arbitrarily complex regex searches). It has ngram index entries for every character position in the string to make it fast. The downside is that a match on this ngram index alone is not sufficient evidence of a match: for each candidate doc, the engine also needs to verify that the search expression really occurs in that doc through follow-up testing (this happens automatically behind the scenes). If you have many docs containing the search term, we will have to test each of them. Small numbers of matches will be much quicker.
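The candidate-then-verify approach can be sketched in Python with a toy index. This is a minimal sketch, not Elasticsearch's implementation: the 3-character gram size, the in-memory dictionaries, and the class name `NgramWildcardIndex` are assumptions, and only `*` wildcards are supported. Document 3 shows why verification is needed: it contains every ngram of `@domain.com` but does not end with it.

```python
import re
from collections import defaultdict

GRAM_SIZE = 3  # assumption for this sketch


def ngrams(text, n=GRAM_SIZE):
    """All substrings of length n, one per character position."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


class NgramWildcardIndex:
    """Toy general-purpose index: ngram postings narrow the candidates,
    and the stored values are used to verify each candidate."""

    def __init__(self, docs):
        self.docs = docs  # doc id -> original value, kept for verification
        self.postings = defaultdict(set)
        for doc_id, value in docs.items():
            for gram in ngrams(value):
                self.postings[gram].add(doc_id)

    def search(self, pattern):
        """Match a pattern using '*' wildcards only.
        Returns (matching doc ids, number of candidates verified)."""
        # Step 1: every ngram of every literal piece must be present in a match.
        required = set()
        for piece in pattern.split("*"):
            required |= ngrams(piece)
        if required:
            candidates = set.intersection(
                *(self.postings.get(gram, set()) for gram in required))
        else:
            candidates = set(self.docs)  # nothing to narrow on: check everything
        # Step 2: ngram hits are not proof of a match, so verify each candidate.
        regex = re.compile(".*".join(re.escape(p) for p in pattern.split("*")))
        matches = sorted(d for d in candidates if regex.fullmatch(self.docs[d]))
        return matches, len(candidates)


docs = {1: "bill.gates@domain.com", 2: "bob@domain.com",
        3: "bill@domain.com.au", 4: "ann@other.org"}
index = NgramWildcardIndex(docs)
print(index.search("*@domain.com"))  # ([1, 2], 3)
print(index.search("bill*"))         # ([1, 3], 2)
```

The number of verified candidates grows with the number of docs containing the search term, which is the cost described above for queries with many matches.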

So it depends. It's unlikely to beat a carefully custom-designed combo of index and query but there's more versatility and less need for multiple data structures.

TheWorkingDeveloper · August 20, 2020, 11:21am

So, to generalize the answer: for the specific case of searching for emails within a domain, a reverse keyword search would be faster when there are many matching results?
This assumes the correct field is searched with the appropriate query by the program/website/system/user.

Mark_Harwood · August 20, 2020, 11:28am

Yes. The wildcard field would not be faster for that specific use case with that specific indexing.
The plus is that the wildcard field would also support fast bill.gates@*, *@domain.com, bill*@domain.*, [b|w]ill.*domain\.(org|com), etc. queries without needing additional indexed fields.

system · September 17, 2020, 11:28am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
