I am looking for proximity search (including nested proximity) between multiple heterogeneous groups.
By heterogeneous I mean that the tokens in the groups can be any of these combinations:
- term and term
- term and phrase
- phrase and phrase
- term and wildcard
- phrase and wildcard
- wildcard and wildcard
- term and phrase and wildcard
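One way to see why these combinations can be mixed freely is to map each token kind to an Elasticsearch `intervals` rule: a bare term to `match`, a quoted phrase to `match` with `max_gaps: 0` and `ordered: true`, and a wildcard to `wildcard`. Any two of those can then sit inside one `all_of`. This is a minimal sketch that assumes a text field named `content` (hypothetical) and treats `W/n` as roughly "at most n gaps, either order". `max_gaps` counts positions between the matched intervals, so the mapping is approximate.

```python
import json

# Hypothetical field name holding the document text.
FIELD = "content"


def leaf(token):
    """Map a term, quoted phrase, or wildcard token to an intervals rule."""
    if len(token) >= 2 and token.startswith('"') and token.endswith('"'):
        # Phrase: all words in order, with no gaps between them.
        return {"match": {"query": token.strip('"'), "max_gaps": 0, "ordered": True}}
    if "*" in token or "?" in token:
        # Lower-cased on the assumption the field uses a lower-casing analyzer.
        return {"wildcard": {"pattern": token.lower()}}
    return {"match": {"query": token}}


def within(left, right, n):
    """Approximate `left W/n right`: both groups, either order, at most n gaps."""
    return {"all_of": {"intervals": [left, right], "max_gaps": n, "ordered": False}}


# term + phrase + wildcard in a single proximity group
rule = within(
    {"any_of": {"intervals": [leaf("sfo"), leaf('"serious fraud office"')]}},
    leaf("investigat*"),
    50,
)
query = {"query": {"intervals": {FIELD: rule}}}
print(json.dumps(query, indent=2))
```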
Some examples of the search criteria, from the user's perspective:
((sfo OR "serious fraud office") w/50 investigat*)
(potential w/30 (violation*))
(fcpa or "foreign corrupt practices act")
((improper W/3 payment*) W/20 investigat*)
((corrupt* W/5 payment*) W/20 investigat*)
((sec OR "securities and exchange commission") w/20 (fcpa OR foreign corrupt practices act))
(brib* W/20 arrest*)
(((fcpa OR "foreign corrupt practices act") w/50 subpoe*) W/20 (violat* OR investigat*))
((internal investigat*) W/50 (fcpa OR "foreign corrupt practices act" OR corrupt*))
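Because `intervals` rules nest (an `all_of` can contain other `all_of` and `any_of` rules), criteria like the ones above can in principle be translated mechanically. Below is a minimal recursive-descent sketch under several assumptions that are mine, not part of the syntax as given: the field is called `content` (hypothetical); `OR` binds tighter than `W/n`; adjacent bare words such as `internal investigat*` form an ordered phrase; and `W/n` maps to an unordered `all_of` with `max_gaps: n`, which only approximates "within n words".

```python
import json
import re

# Hypothetical field name; replace with the field holding the document text.
FIELD = "content"

TOKEN_RE = re.compile(r'\(|\)|"[^"]*"|[^\s()"]+')
PROX_RE = re.compile(r"[Ww]/(\d+)")


def tokenize(text):
    # Normalise curly quotes that editors and forums tend to insert.
    text = text.replace("\u201c", '"').replace("\u201d", '"')
    return TOKEN_RE.findall(text)


class Parser:
    """W/n binds loosest, then OR, then adjacency (all assumptions)."""

    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    @staticmethod
    def is_operator(tok):
        return tok.upper() == "OR" or PROX_RE.fullmatch(tok) is not None

    def parse(self):
        rule = self.proximity()
        if self.peek() is not None:
            raise ValueError(f"unexpected token {self.peek()!r}")
        return {"query": {"intervals": {FIELD: rule}}}

    def proximity(self):
        rule = self.disjunction()
        while self.peek() and PROX_RE.fullmatch(self.peek()):
            gaps = int(PROX_RE.fullmatch(self.take()).group(1))
            right = self.disjunction()
            # Nested proximity: the left side may itself be an all_of.
            rule = {"all_of": {"intervals": [rule, right],
                               "max_gaps": gaps, "ordered": False}}
        return rule

    def disjunction(self):
        rules = [self.sequence()]
        while self.peek() and self.peek().upper() == "OR":
            self.take()
            rules.append(self.sequence())
        return rules[0] if len(rules) == 1 else {"any_of": {"intervals": rules}}

    def sequence(self):
        rules = []
        while self.peek() not in (None, ")") and not self.is_operator(self.peek()):
            rules.append(self.atom())
        if not rules:
            raise ValueError("expected a term, phrase or group")
        if len(rules) == 1:
            return rules[0]
        # Adjacent bare words become an ordered, gap-free sequence.
        return {"all_of": {"intervals": rules, "max_gaps": 0, "ordered": True}}

    def atom(self):
        tok = self.take()
        if tok == "(":
            rule = self.proximity()
            if self.take() != ")":
                raise ValueError("missing ')'")
            return rule
        if tok.startswith('"'):
            return {"match": {"query": tok.strip('"'), "max_gaps": 0, "ordered": True}}
        if "*" in tok or "?" in tok:
            return {"wildcard": {"pattern": tok.lower()}}
        return {"match": {"query": tok}}


if __name__ == "__main__":
    example = '(((fcpa OR "foreign corrupt practices act") w/50 subpoe*) W/20 (violat* OR investigat*))'
    print(json.dumps(Parser(example).parse(), indent=2))
```

The resulting body would be sent as the search request for the index. Note that the `wildcard` and `prefix` rules cap how many terms a pattern may expand to, so very short stems like `brib*` may need checking against the intervals documentation for the version in use.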
The documents are legal documents ranging from 10,000 to 4 million characters in size.
I have tried query_string, simple_query_string, match, match_phrase, span_near, and intervals, but with no luck.
The main unresolved issues are:
- How do I use wildcards in proximity searches?
- How do I handle nested proximity searches?
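With the span family of queries, the usual way to get a wildcard into a proximity clause is to wrap it in `span_multi`, and nesting works because a `span_near` clause can itself be a `span_near` or `span_or`. A minimal sketch for the nested `subpoe*` example, assuming a field named `content` (hypothetical) indexed with a lower-casing analyzer. `span_term` is not analyzed, so terms are lower-cased by hand, and `slop` only approximates `W/n`. Broad wildcard patterns expand into many terms and can hit the clause limit; the `span_multi` documentation describes a `top_terms_*` rewrite for bounding that.

```python
import json

# Hypothetical field name; span_term is NOT analyzed, so terms must match
# the indexed form (lower-case, assuming a standard-style analyzer).
FIELD = "content"


def term(word):
    return {"span_term": {FIELD: word.lower()}}


def wildcard(pattern):
    # span_multi lets a multi-term query (here a wildcard) act as a span clause.
    return {"span_multi": {"match": {"wildcard": {FIELD: {"value": pattern.lower()}}}}}


def phrase(text):
    # Exact phrase: the words in order with no intervening positions.
    return {"span_near": {"clauses": [term(w) for w in text.split()],
                          "slop": 0, "in_order": True}}


def any_of(*clauses):
    return {"span_or": {"clauses": list(clauses)}}


def near(left, right, slop):
    # Approximates `left W/slop right`; either clause may itself be a span_near.
    return {"span_near": {"clauses": [left, right], "slop": slop, "in_order": False}}


# (((fcpa OR "foreign corrupt practices act") w/50 subpoe*) W/20 (violat* OR investigat*))
inner = near(any_of(term("fcpa"), phrase("foreign corrupt practices act")),
             wildcard("subpoe*"), 50)
query = {"query": near(inner, any_of(wildcard("violat*"), wildcard("investigat*")), 20)}
print(json.dumps(query, indent=2))
```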
Any help would be greatly appreciated.