Complex Proximity, Nested, and Exact-Match Queries in Elasticsearch

I am looking for a way to run proximity searches (including nested proximity searches) between multiple heterogeneous groups.

By heterogeneous, I mean that the tokens in the groups can be any of the following combinations:

  • term and term
  • term and phrase
  • phrase and phrase
  • term and wildcard
  • phrase and wildcard
  • wildcard and wildcard
  • term and phrase and wildcard
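For reference, the intervals query (one of the approaches tried below) has a separate rule for each kind of token above: a `match` rule for a term, a `match` rule with `max_gaps: 0` and `ordered: true` for an exact phrase, and `prefix` or `wildcard` rules for wildcards. Rules of different kinds can be mixed inside `any_of` (OR) and `all_of` (proximity). The sketch below builds one request body that combines a term, a phrase, and a wildcard in a single group. The field name `content` is a hypothetical placeholder, and treating a W/N distance as `max_gaps` is an approximation, not an exact equivalent.

```python
import json

# Hypothetical field name; replace with the analyzed text field from your mapping.
FIELD = "content"

term = {"match": {"query": "fcpa"}}                              # single term
phrase = {"match": {"query": "foreign corrupt practices act",   # exact phrase:
                    "max_gaps": 0, "ordered": True}}             # no gaps, in order
wildcard = {"wildcard": {"pattern": "corrupt*"}}                 # general wildcard pattern
prefix = {"prefix": {"prefix": "investigat"}}                    # investigat*

# (fcpa OR "foreign corrupt practices act" OR corrupt*) within roughly 50 positions of investigat*
body = {
    "query": {
        "intervals": {
            FIELD: {
                "all_of": {
                    "intervals": [{"any_of": {"intervals": [term, phrase, wildcard]}}, prefix],
                    "max_gaps": 50,
                    "ordered": False,
                }
            }
        }
    }
}

print(json.dumps(body, indent=2))
```

The printed JSON is a body for a `_search` request against the index; a trailing-only `*` can be expressed with either `prefix` or `wildcard`.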

Some examples of the search criteria, from the user's perspective, are:

  • ((sfo OR "serious fraud office") w/50 investigat*)
  • (potential w/30 (violation*))
  • (fcpa or "foreign corrupt practices act")
  • ((improper W/3 payment*) W/20 investigat*)
  • ((corrupt* W/5 payment*) W/20 investigat*)
  • ((sec OR "securities and exchange commission") w/20 (fcpa OR foreign corrupt practices act))
  • (brib* W/20 arrest*)
  • (((fcpa OR "foreign corrupt practices act") w/50 subpoe*) W/20 (violat* OR investigat*))
  • ((internal investigat*) W/50 (fcpa OR "foreign corrupt practices act" OR corrupt*))
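To run criteria like these, the user syntax has to be translated into a nested query. What follows is a minimal sketch of such a translator in Python that emits an Elasticsearch `intervals` rule. It rests on assumptions that are not in the question: adjacent operands bind tightest (so `internal investigat*` and an unquoted `foreign corrupt practices act` become exact phrases), then `OR`, then `W/N`, loosely modelled on common legal-search conventions; `W/N` becomes an unordered `all_of` with `max_gaps: N`, which only approximates "within N words"; a trailing `*` becomes a `prefix` rule and any other `*` or `?` a `wildcard` rule; and the field name `content` is hypothetical. Nested proximity falls out of the recursion: a parenthesised `W/N` group simply becomes an `all_of` inside another `all_of`.

```python
import re

# A token is "(" or ")", a quoted phrase, or a bare word (OR and W/N are bare words too).
TOKEN_RE = re.compile(r'\s*(?:(?P<paren>[()])|"(?P<phrase>[^"]*)"|(?P<word>[^\s()"]+))')


def tokenize(text):
    text = text.replace("\u201c", '"').replace("\u201d", '"').strip()  # accept curly quotes
    pos, tokens = 0, []
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"cannot tokenize near {text[pos:]!r}")
        pos = m.end()
        word = m.group("word")
        if m.group("paren"):
            tokens.append((m.group("paren"), None))
        elif m.group("phrase") is not None:
            tokens.append(("PHRASE", m.group("phrase")))
        elif re.fullmatch(r"[wW]/\d+", word):
            tokens.append(("W", int(word[2:])))
        elif word.upper() == "OR":
            tokens.append(("OR", None))
        else:
            tokens.append(("WORD", word))
    return tokens


def leaf(word):
    # Trailing-only "*" -> prefix rule; any other * or ? -> wildcard rule; else a plain term.
    if len(word) > 1 and word.endswith("*") and not any(c in word[:-1] for c in "*?"):
        return {"prefix": {"prefix": word[:-1]}}
    if "*" in word or "?" in word:
        return {"wildcard": {"pattern": word}}
    return {"match": {"query": word}}


# Assumed precedence: adjacency > OR > W/N; chained W/N groups associate to the left.
class Parser:
    def __init__(self, text):
        self.tokens, self.i = tokenize(text), 0

    def peek(self):
        return self.tokens[self.i][0] if self.i < len(self.tokens) else None

    def take(self):
        self.i += 1
        return self.tokens[self.i - 1]

    def parse(self):
        rule = self.proximity()
        if self.peek() is not None:
            raise ValueError(f"unexpected token {self.tokens[self.i]!r}")
        return rule

    def proximity(self):
        left = self.disjunction()
        while self.peek() == "W":
            gaps = self.take()[1]
            right = self.disjunction()
            # W/N -> unordered all_of; max_gaps approximates "within N words".
            left = {"all_of": {"intervals": [left, right], "max_gaps": gaps, "ordered": False}}
        return left

    def disjunction(self):
        alts = [self.sequence()]
        while self.peek() == "OR":
            self.take()
            alts.append(self.sequence())
        return alts[0] if len(alts) == 1 else {"any_of": {"intervals": alts}}

    def sequence(self):
        # Operands with no operator between them are treated as an exact phrase.
        items = [self.primary()]
        while self.peek() in ("(", "PHRASE", "WORD"):
            items.append(self.primary())
        return items[0] if len(items) == 1 else {"all_of": {"intervals": items, "max_gaps": 0, "ordered": True}}

    def primary(self):
        if self.peek() is None:
            raise ValueError("unexpected end of query")
        kind, value = self.take()
        if kind == "PHRASE":
            return {"match": {"query": value, "max_gaps": 0, "ordered": True}}
        if kind == "WORD":
            return leaf(value)
        if kind != "(":
            raise ValueError(f"unexpected {kind!r}")
        rule = self.proximity()
        if self.peek() != ")":
            raise ValueError("missing ')'")
        self.take()
        return rule


def to_search_body(text, field="content"):
    # "content" is a hypothetical field name; use your own analyzed text field.
    return {"query": {"intervals": {field: Parser(text).parse()}}}
```

For example, `json.dumps(to_search_body("(brib* W/20 arrest*)"), indent=2)` produces a body for `POST /<your-index>/_search`. Two caveats from the intervals documentation, worth re-checking for your Elasticsearch version: `prefix` and `wildcard` rules return an error if the pattern expands to more than 128 terms (the `index_prefixes` mapping option lifts this for prefixes), and intervals are minimized, which can occasionally hide matches when `max_gaps` is involved.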

The documents are legal documents ranging in size from 10,000 to 4 million characters.

I have tried query_string, simple_query_string, match, match_phrase, span_near, and intervals, but with no luck.

The main unresolved issues are:

  • How can wildcards be used inside proximity searches?
  • How can nested proximity searches be handled?
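For these two questions, span queries offer a second route besides intervals. A wildcard cannot be used directly as a `span_near` clause, but `span_multi` can wrap a multi-term query such as `prefix` so that it behaves as a span; nesting works because a `span_near` is itself a span query and can be a clause of another `span_near`. The sketch builds request bodies for two of the examples above. Assumptions: the field name `content` is hypothetical, `slop` is used as an approximation of W/N, and `span_term` is not analyzed, so tokens must be given in their indexed form (for example, lowercase with the standard analyzer).

```python
import json

FIELD = "content"  # hypothetical field name


def span_term(token):
    # span_term is not analyzed: pass the token exactly as it is indexed (e.g. lowercased).
    return {"span_term": {FIELD: token}}


def span_prefix(prefix):
    # span_multi wraps a multi-term query (here a prefix query) so it can act as a span clause.
    return {"span_multi": {"match": {"prefix": {FIELD: {"value": prefix}}}}}


def span_within(clauses, slop, in_order=False):
    # span_near: clauses within `slop` intervening positions; it can itself be a clause.
    return {"span_near": {"clauses": clauses, "slop": slop, "in_order": in_order}}


def span_phrase(*tokens):
    # An exact phrase as an ordered span_near with no gaps.
    return span_within([span_term(t) for t in tokens], 0, in_order=True)


# ((corrupt* W/5 payment*) W/20 investigat*): a span_near nested inside a span_near.
corrupt_near_payment = span_within([span_prefix("corrupt"), span_prefix("payment")], 5)
nested = {"query": span_within([corrupt_near_payment, span_prefix("investigat")], 20)}

# ((sfo OR "serious fraud office") w/50 investigat*): span_or holds the alternatives.
sfo_or_phrase = {"span_or": {"clauses": [span_term("sfo"), span_phrase("serious", "fraud", "office")]}}
with_or = {"query": span_within([sfo_or_phrase, span_prefix("investigat")], 50)}

print(json.dumps(nested, indent=2))
print(json.dumps(with_or, indent=2))
```

Per the Elasticsearch docs, `span_multi` can fail with a too-many-clauses error when a prefix matches more terms than the boolean clause limit; the docs suggest a `top_terms_*` rewrite on the wrapped query or the `index_prefixes` mapping option. With documents of up to 4 million characters, it may be worth measuring both this and the intervals approach on real data.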

Any help would be greatly appreciated.
