I am looking for a way to run proximity searches (including nested proximity searches) across multiple heterogeneous groups.
By heterogeneous I mean that the tokens in the groups can be any of these combinations:
- term and term
- term and phrase
- phrase and phrase
- term and wildcard
- phrase and wildcard
- wildcard and wildcard
- term and phrase and wildcard
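A minimal sketch, assuming Elasticsearch's `intervals` query (one of the queries tried below). Each token type maps to its own intervals rule. Because every rule produces intervals, any mix of them can sit side by side inside an `all_of` proximity rule. The helper names `token_to_rule` and `within` are hypothetical. Treating `w/N` as `max_gaps: N` is an assumption about the intended distance semantics; adjust by one if "within N words" should be counted differently.

```python
import json


def token_to_rule(token: str) -> dict:
    """Map one user token to an intervals rule (hypothetical helper).

    - "quoted text"  -> exact phrase: `match` with ordered=True, max_gaps=0
    - trailing *     -> `prefix` rule, e.g. investigat* -> prefix "investigat"
    - other * or ?   -> `wildcard` rule
    - anything else  -> single-term `match`
    """
    if len(token) >= 2 and token[0] == token[-1] == '"':
        return {"match": {"query": token[1:-1], "ordered": True, "max_gaps": 0}}
    if token.endswith("*") and "*" not in token[:-1] and "?" not in token:
        return {"prefix": {"prefix": token[:-1]}}
    if "*" in token or "?" in token:
        return {"wildcard": {"pattern": token}}
    return {"match": {"query": token}}


def within(distance: int, *tokens: str) -> dict:
    """Unordered proximity between any mix of terms, phrases and wildcards.

    Assumption: "w/N" is mapped to max_gaps=N.
    """
    return {
        "all_of": {
            "ordered": False,
            "max_gaps": distance,
            "intervals": [token_to_rule(t) for t in tokens],
        }
    }


# Phrase + wildcard, e.g. "serious fraud office" w/50 investigat*
print(json.dumps(within(50, '"serious fraud office"', "investigat*"), indent=2))
```

The returned rule would go under `{"query": {"intervals": {"<field>": ...}}}`, where `<field>` is your text field.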
Some examples of search criteria, as users would enter them:
- ((sfo OR "serious fraud office") w/50 investigat*)
- (potential w/30 (violation*))
- (fcpa or "foreign corrupt practices act")
- ((improper W/3 payment*) W/20 investigat*)
- ((corrupt* W/5 payment*) W/20 investigat*)
- ((sec OR "securities and exchange commission") w/20 (fcpa OR foreign corrupt practices act))
- (brib* W/20 arrest*)
- (((fcpa OR "foreign corrupt practices act") w/50 subpoe*) W/20 (violat* OR investigat*))
- ((internal investigat*) W/50 (fcpa OR "foreign corrupt practices act" OR corrupt*))
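Before criteria like these can be turned into a query, they have to be split into tokens. Here is a minimal tokenizer sketch for this syntax, built on a few assumptions. `OR` is matched case-insensitively, since the examples use both `OR` and `or`. Both `w/N` and `W/N` are proximity operators. Any word containing `*` or `?` is a wildcard. Curly quotes are normalized to straight ones in case users paste text from a word processor. The function and token-kind names are hypothetical.

```python
import re
from typing import List, Tuple

# Curly quotes (e.g. pasted from a word processor) are normalized to straight quotes.
_QUOTES = str.maketrans({"\u201c": '"', "\u201d": '"'})

# One named group per token kind; alternatives are tried left to right,
# so "w/50" is recognized as NEAR before the generic WORD pattern.
_TOKEN_RE = re.compile(
    r'\s*(?:(?P<LPAREN>\()|(?P<RPAREN>\))'
    r'|"(?P<PHRASE>[^"]*)"'
    r'|(?P<NEAR>[wW]/\d+)'
    r'|(?P<WORD>[^\s()"]+))'
)


def tokenize(text: str) -> List[Tuple[str, str]]:
    """Split a search expression into (kind, value) tokens (hypothetical helper)."""
    text = text.translate(_QUOTES).strip()
    tokens: List[Tuple[str, str]] = []
    pos = 0
    while pos < len(text):
        m = _TOKEN_RE.match(text, pos)
        if m is None:  # e.g. an unbalanced quote
            raise ValueError(f"cannot tokenize at position {pos}: {text[pos:pos + 20]!r}")
        kind = m.lastgroup
        value = m.group(kind)
        if kind == "NEAR":
            value = value[2:]  # keep only the distance, e.g. "50"
        elif kind == "WORD":
            if value.upper() == "OR":
                kind, value = "OR", "OR"
            elif "*" in value or "?" in value:
                kind = "WILDCARD"
            else:
                kind = "TERM"
        tokens.append((kind, value))
        pos = m.end()
    return tokens


for kind, value in tokenize('((sfo OR \u201cserious fraud office\u201d) w/50 investigat*)'):
    print(kind, value)
```

Adjacent bare words come out as separate `TERM` or `WILDCARD` tokens. Examples are `(internal investigat*)` and the unquoted `foreign corrupt practices act`. The syntax does not say whether these should be read as a phrase or as an AND, so a parser built on these tokens has to make that choice explicitly. The same applies to the relative precedence of `OR` and `w/N`.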
The documents are legal documents ranging anywhere from 10,000 to 4 million characters in size.
I have tried the query_string, simple_query_string, match, match_phrase, span_near, and intervals queries, but with no luck.
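For comparison, the span-query route can also express wildcard proximity. `span_multi` wraps a multi-term query such as `prefix` so it can be used as a clause of `span_near`, and `span_or` covers the OR groups. This is a minimal sketch. It assumes a hypothetical field named `text` and maps `w/N` to `slop: N`, which is also an assumption. `span_term` is not analyzed, so the sketch lowercases terms itself, on the assumption that the field's analyzer lowercases and otherwise leaves terms unchanged.

```python
import json

FIELD = "text"  # hypothetical field name; use the real field from your mapping


def span_prefix(prefix: str) -> dict:
    # span_multi lets a multi-term query (here: prefix) act as a span clause
    return {"span_multi": {"match": {"prefix": {FIELD: {"value": prefix}}}}}


def span_term(term: str) -> dict:
    # span_term is not analyzed: lowercase here, assuming a lowercasing analyzer
    return {"span_term": {FIELD: term.lower()}}


def span_phrase(phrase: str) -> dict:
    # an exact phrase: adjacent terms, in order
    return {"span_near": {"clauses": [span_term(t) for t in phrase.split()],
                          "slop": 0, "in_order": True}}


def span_or(*clauses: dict) -> dict:
    # OR between alternative span clauses
    return {"span_or": {"clauses": list(clauses)}}


def span_within(slop: int, *clauses: dict) -> dict:
    # unordered proximity; mapping "w/N" to slop=N is an assumption
    return {"span_near": {"clauses": list(clauses), "slop": slop, "in_order": False}}


# ((sfo OR "serious fraud office") w/50 investigat*)
query = {"query": span_within(
    50,
    span_or(span_term("sfo"), span_phrase("serious fraud office")),
    span_prefix("investigat"),
)}
print(json.dumps(query, indent=2))
```

A broad prefix inside `span_multi` can fail with a too-many-clauses error on a large vocabulary. The `span_multi` documentation describes limiting the expansion with a `top_terms_*` rewrite on the inner query; check the details for your Elasticsearch version.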
The main unresolved issues are:
- How do I use wildcards in proximity searches?
- How do I handle nested proximity searches?
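Both questions can be addressed with the `intervals` query, assuming the field is a regular analyzed `text` field. Wildcards inside proximity map to the `prefix` (or `wildcard`) rule. Nested proximity maps to an `all_of` rule placed inside another `all_of`. The sketch below hand-builds the nested example `(((fcpa OR "foreign corrupt practices act") w/50 subpoe*) W/20 (violat* OR investigat*))`. The field name `text` is hypothetical, and mapping `w/N` to `max_gaps: N` is an assumption.

```python
import json

FIELD = "text"  # hypothetical field name

# (fcpa OR "foreign corrupt practices act")
fcpa = {"any_of": {"intervals": [
    {"match": {"query": "fcpa"}},
    # exact phrase: terms in order, no gaps
    {"match": {"query": "foreign corrupt practices act", "ordered": True, "max_gaps": 0}},
]}}

# inner proximity: (fcpa OR "...") w/50 subpoe*
inner = {"all_of": {"ordered": False, "max_gaps": 50, "intervals": [
    fcpa,
    {"prefix": {"prefix": "subpoe"}},  # wildcard inside a proximity clause
]}}

# outer proximity: (inner) W/20 (violat* OR investigat*)
outer = {"all_of": {"ordered": False, "max_gaps": 20, "intervals": [
    inner,  # nested proximity: an all_of used as one interval of another all_of
    {"any_of": {"intervals": [
        {"prefix": {"prefix": "violat"}},
        {"prefix": {"prefix": "investigat"}},
    ]}},
]}}

query = {"query": {"intervals": {FIELD: outer}}}
print(json.dumps(query, indent=2))
```

Roughly speaking, the inner match is treated as one interval spanning from its first to its last matching position, so the outer `max_gaps: 20` is measured between that span and the `violat*`/`investigat*` match. The `intervals` documentation also notes that `prefix` and `wildcard` rules can only expand to a limited number of terms and return an error beyond that limit, which depends on the version. For prefixes, it suggests the `index_prefixes` mapping option as a workaround.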
Any help would be greatly appreciated.