Complex Proximity/Nested/exact-match Queries in Elasticsearch

jaivikram · June 23, 2022, 11:33am

I am looking for a way to run proximity searches (including nested proximity) between multiple heterogeneous groups of terms.

By heterogeneous I mean the tokens in the groups can be:

Some examples of the search criteria from the user's perspective are:

((sfo OR "serious fraud office") w/50 investigat*)
(potential w/30 (violation*))
(fcpa or "foreign corrupt practices act")
((improper W/3 payment*) W/20 investigat*)
((corrupt* W/5 payment*) W/20 investigat*)
((sec OR "securities and exchange commission") w/20 (fcpa OR foreign corrupt practices act))
(brib* W/20 arrest*)
(((fcpa OR "foreign corrupt practices act") w/50 subpoe*) W/20 (violat* OR investigat*))
((internal investigat*) W/50 (fcpa OR "foreign corrupt practices act" OR corrupt*))
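
To make the intended semantics concrete, here is a rough sketch of how the first criterion might map onto Elasticsearch span queries. It is only an illustration under several assumptions: the field name `content` and the helper functions are hypothetical, terms are lowercased on the assumption that the field's analyzer lowercases tokens, and `w/50` is approximated as an unordered `span_near` with `slop` 50, which may not match the exact distance semantics the users expect.

```python
import json

FIELD = "content"  # hypothetical field name, not taken from a real mapping


def term(word):
    # Single token, lowercased on the assumption that the analyzer lowercases.
    return {"span_term": {FIELD: word.lower()}}


def prefix(stem):
    # Trailing wildcard such as investigat*: a prefix query wrapped in span_multi.
    return {"span_multi": {"match": {"prefix": {FIELD: {"value": stem.lower()}}}}}


def phrase(text):
    # Quoted phrase: the terms in order with no gaps between them.
    clauses = [term(w) for w in text.split()]
    return {"span_near": {"clauses": clauses, "slop": 0, "in_order": True}}


def any_of(*clauses):
    # OR group inside a proximity expression.
    return {"span_or": {"clauses": list(clauses)}}


def within(n, left, right):
    # "left w/n right", approximated as an unordered span_near with slop n.
    return {"span_near": {"clauses": [left, right], "slop": n, "in_order": False}}


# ((sfo OR "serious fraud office") w/50 investigat*)
query = {
    "query": within(
        50,
        any_of(term("sfo"), phrase("serious fraud office")),
        prefix("investigat"),
    )
}

print(json.dumps(query, indent=2))
```

Because each helper returns a span clause, nested criteria such as ((improper W/3 payment*) W/20 investigat*) can in principle be built by passing one `within(...)` result into another.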

The documents are legal documents ranging in size from 10,000 characters to 4 million characters.

I have tried query_string, simple_query_string, match, match_phrase, span_near, and intervals queries, but with no luck.

The main unresolved issues are:

Any help will be greatly appreciated.
