Proximity and "OR" or equivalency searching

ronchalant · August 21, 2015, 2:46pm

We're migrating from Oracle Text search (using CONTAINS) to Elasticsearch, and one search our "power users" commonly perform is a NEAR (proximity) search combined with the EQUIV (equivalence, =) operator.

Equivalence operators basically allow users to define "synonyms" at query time. They act sort of like an OR, but let the whole expression be treated as a single term. So something like:

apple=orange=fruit

would match all three terms equally.
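
Outside of a proximity search, an equivalence group maps fairly directly onto an Elasticsearch bool query with one should clause per term. A minimal sketch, assuming a text field called body (hypothetical) and the hypothetical helper names below; it only builds the request body as a dict and does not talk to a cluster:

```python
def parse_equiv(expr: str) -> list[str]:
    """Split an Oracle-style equivalence expression like 'apple=orange=fruit' into its terms."""
    return [t.strip() for t in expr.split("=") if t.strip()]


def equiv_to_bool_query(field: str, expr: str) -> dict:
    """Build a bool query where matching any one of the equivalent terms is enough."""
    return {
        "query": {
            "bool": {
                "should": [{"match": {field: term}} for term in parse_equiv(expr)],
                "minimum_should_match": 1,
            }
        }
    }
```

For example, equiv_to_bool_query("body", "apple=orange=fruit") yields three match clauses, any one of which satisfies the query.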

By itself it's not much more than an OR, but within a proximity (NEAR) search in Oracle you can do things like:

NEAR((apple=orange=fruit, smoothie=shake=milkshake), 3)

and the above would run a proximity search where apple, orange, or fruit appears within a span of three terms of smoothie, shake, or milkshake.

So far, using query string query syntax, the only way I can see to express the above is something like:

"apple smoothie"~3 "orange smoothie"~3 "fruit smoothie"~3 "apple shake"~3 ... etc.
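
The workaround above is the cross product of the two equivalence groups, so it grows multiplicatively (3 × 3 = 9 sloppy phrases here). A minimal sketch of generating that string, using a hypothetical helper name; it joins clauses with an explicit OR so the result doesn't depend on the query's default_operator:

```python
from itertools import product


def expand_near_equiv(left: list[str], right: list[str], slop: int) -> str:
    """Expand NEAR((l1=l2=..., r1=r2=...), slop) into OR'ed sloppy phrases for query_string."""
    # One "l r"~slop phrase for every (left term, right term) pair.
    clauses = [f'"{l} {r}"~{slop}' for l, r in product(left, right)]
    return " OR ".join(clauses)
```

Calling expand_near_equiv(["apple", "orange", "fruit"], ["smoothie", "shake", "milkshake"], 3) produces nine clauses, starting with "apple smoothie"~3 OR "apple shake"~3.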

If we used a synonym token filter for the above, I assume (?) it would work the same way, but is there any way to do this at runtime? If not, perhaps it's something worth adding?

polyfractal · August 21, 2015, 3:20pm

Yep! You can index your data using one analyzer, then search it using a different analyzer. One of the best use-cases for that functionality is synonyms because most people don't want to actually index the synonyms, just match them at query time.

Here is a quick demo I whipped up. Basically, you create an index with a synonym analyzer, index docs normally, then run a match_phrase query with a slop of 3 using the new synonym analyzer.
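
The demo itself isn't reproduced here, but a setup along those lines might look like the sketch below. It assumes a recent Elasticsearch with typeless mappings, and the names body, my_synonyms and my_synonym_analyzer are hypothetical. Note the synonym list still lives in the index settings rather than in each query:

```python
# Index body: documents are indexed with the plain standard analyzer;
# the synonym analyzer is only defined so it can be referenced at search time.
index_body = {
    "settings": {
        "analysis": {
            "filter": {
                "my_synonyms": {
                    "type": "synonym",
                    "synonyms": [
                        "apple, orange, fruit",
                        "smoothie, shake, milkshake",
                    ],
                }
            },
            "analyzer": {
                "my_synonym_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "my_synonyms"],
                }
            },
        }
    },
    "mappings": {"properties": {"body": {"type": "text", "analyzer": "standard"}}},
}

# Search body: a sloppy phrase query analyzed with the synonym analyzer,
# so each query term expands to its equivalents at the same position.
query_body = {
    "query": {
        "match_phrase": {
            "body": {
                "query": "fruit smoothie",
                "analyzer": "my_synonym_analyzer",
                "slop": 3,
            }
        }
    }
}
```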

jpountz · August 21, 2015, 3:21pm

I don't think you can solve this issue with the query_string syntax. You would have to use the query dsl and the span_or and span_near queries.
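
A span_near query whose clauses are span_or groups is the closest structural match to NEAR((a=b=c, d=e=f), 3). A minimal sketch with hypothetical helper names and a hypothetical body field. Two caveats: span_term is not analyzed, so the terms must already match the indexed tokens (e.g. lowercased), and whether span_near's slop lines up exactly with Oracle's NEAR distance is an assumption worth verifying:

```python
def span_or_terms(field: str, terms: list[str]) -> dict:
    """One span_or clause matching any of the equivalent terms."""
    return {"span_or": {"clauses": [{"span_term": {field: t}} for t in terms]}}


def near_equiv_query(field: str, left: list[str], right: list[str],
                     slop: int, in_order: bool = False) -> dict:
    """span_near over two equivalence groups; in_order=True enforces left-before-right."""
    return {
        "query": {
            "span_near": {
                "clauses": [span_or_terms(field, left), span_or_terms(field, right)],
                "slop": slop,
                "in_order": in_order,
            }
        }
    }
```

Unlike the query_string expansion, the size of this query grows linearly with the number of equivalent terms rather than with their product.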

ronchalant · August 21, 2015, 3:22pm

Awesome, thanks man! I'll check it out.

polyfractal · August 21, 2015, 3:25pm

Ah, @jpountz brings up a good point I forgot about: phrase matching with slop doesn't guarantee ordering. So "smoothie fruit" can match just as "fruit smoothie" does, because sloppy phrase matching only counts how many position moves are needed, and swapping two adjacent terms costs just 2 moves.
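
To see why order doesn't matter, here is a simplified model of sloppy phrase distance (not Lucene's actual implementation, and restricted to distinct query terms): each matched term's document position is shifted back by its offset in the phrase, and the distance is the spread of those shifted positions. The function names are hypothetical:

```python
from itertools import product
from typing import Optional


def sloppy_distance(query_terms: list[str], doc_tokens: list[str]) -> Optional[int]:
    """Smallest spread of (doc position - phrase offset) over all term occurrences.

    Returns None if some query term does not occur in the document."""
    occurrences = []
    for offset, term in enumerate(query_terms):
        shifted = [pos - offset for pos, tok in enumerate(doc_tokens) if tok == term]
        if not shifted:
            return None
        occurrences.append(shifted)
    return min(max(combo) - min(combo) for combo in product(*occurrences))


def sloppy_match(query_terms: list[str], doc_tokens: list[str], slop: int) -> bool:
    d = sloppy_distance(query_terms, doc_tokens)
    return d is not None and d <= slop
```

In this model "fruit smoothie" has distance 0, the reversed "smoothie fruit" has distance 2, and "fruit a b c d smoothie" has distance 4, so with a slop of 3 the first two match and the last does not.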

If you need order-dependent sloppy phrase matching (e.g. "fruit" must come before "smoothie"), you'll probably need the span family like he mentioned. Or you could index 3-word shingles and search those instead.
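
Shingles bake word order into the indexed tokens themselves. The sketch below shows roughly which 1-to-3 word tokens a shingle filter configured like SHINGLE_FILTER would produce (ignoring token order and position details); the filter settings use real shingle filter parameters, but the function and constant names are hypothetical:

```python
# Shingle token filter settings for 1-3 word shingles (unigrams kept).
SHINGLE_FILTER = {
    "type": "shingle",
    "min_shingle_size": 2,
    "max_shingle_size": 3,
    "output_unigrams": True,
}


def shingles(tokens: list[str], min_size: int = 1, max_size: int = 3) -> list[str]:
    """All contiguous word n-grams of length min_size..max_size, grouped by start position."""
    out = []
    for start in range(len(tokens)):
        for size in range(min_size, max_size + 1):
            if start + size <= len(tokens):
                out.append(" ".join(tokens[start:start + size]))
    return out
```

For ["fresh", "fruit", "smoothie"] this yields "fresh", "fresh fruit", "fresh fruit smoothie", "fruit", "fruit smoothie", "smoothie". Since "smoothie fruit" is never produced, a shingle match is order-sensitive, though only for adjacent words.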

ronchalant · August 21, 2015, 3:29pm

Order generally doesn't matter for our purposes, and we've got a 1-3 shingle setup for some of the relevant fields.

Part of this is getting our users to use the system more appropriately. I think our power users will find that there are better ways to actually get at the data they want. The NEAR + EQUIV searches they've been doing with Oracle's search are, IMHO, more of a workaround for the limitations of its indexing.

Either way this is informative, thanks all!
