Meaning of regexp query complement operator

Hi all,
Could someone help me understand, by example, what complement really means in regexp queries?
I've read the documentation, but it's still not clear to me.

E.g. I have the following regexp query that tries to select only Android 4.0 user agents. The problem is that Windows Phone 8.1 user agents contain the same Android 4.0 part, so I've created the following:

    "regexp" : {
      "userAgent" : {
        "value" : "~(.*Windows Phone)(.*)Android 4\\.0(.*)",
        "flags": "COMPLEMENT"
      }
    }

the problem is that user agent values like
Mozilla/5.0 (Mobile; Windows Phone 8.1; Android 4.0; ARM; Trident/7.0; Touch; rv:11.0; IEMobile/11.0; Microsoft; Lumia 640 Dual SIM) like iPhone OS 7_0_3 Mac OS X AppleWebKit/537 (KHTML, like Gecko) Mobile Safari/537

do match the regexp, even though I tried to exclude "Windows Phone" from the match.

What am I missing? How should I think about the complement operator ~ in Lucene regexps?
Doesn't my query say something like:
every value that doesn't have Windows Phone at the beginning, then has anything else, then "Android 4.0", and then once again anything else?

Update:
I've managed to get what I want with intersection; however, the manual advises rethinking the approach rather than using it.

    "regexp" : {
      "userAgent" : {
        "value" : "~(.*Windows Phone.*)&.*Android 4\\.0.*",
        "flags": "COMPLEMENT|INTERSECTION"
      }
    }

Thanks in advance

I think what you've done with Intersection looks right - I can't think of any other way to achieve what you want.
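
As for why the first attempt still matched: as far as I understand it, Lucene regexps are anchored to the whole value, and ~ only negates the group right after it. So the first query reads as "some prefix that is not of the form .*Windows Phone, followed by .*Android 4\.0.*", and the empty prefix already satisfies that. Python's re module has no complement operator, so the sketch below only emulates the two queries with re.fullmatch; the function names and the shortened user agent are my own, for illustration.

```python
import re

# Shortened version of the Windows Phone user agent from the question.
WINDOWS_PHONE_UA = "Mozilla/5.0 (Mobile; Windows Phone 8.1; Android 4.0; ARM; Trident/7.0)"
ANDROID_UA = "Mozilla/5.0 (Linux; Android 4.0; Nexus) AppleWebKit/537"


def matches_first_query(value):
    """Emulates ~(.*Windows Phone)(.*)Android 4\\.0(.*) on the whole value.

    The value matches if it can be split into prefix + rest such that the
    prefix does NOT fully match '.*Windows Phone' and the rest fully
    matches '.*Android 4\\.0.*'.
    """
    for i in range(len(value) + 1):
        prefix, rest = value[:i], value[i:]
        prefix_is_complement = re.fullmatch(r".*Windows Phone", prefix) is None
        rest_matches = re.fullmatch(r".*Android 4\.0.*", rest) is not None
        if prefix_is_complement and rest_matches:
            return True
    return False


def matches_intersection_query(value):
    """Emulates ~(.*Windows Phone.*)&.*Android 4\\.0.* on the whole value."""
    no_windows_phone = re.fullmatch(r".*Windows Phone.*", value) is None
    has_android_4_0 = re.fullmatch(r".*Android 4\.0.*", value) is not None
    return no_windows_phone and has_android_4_0


if __name__ == "__main__":
    print(matches_first_query(WINDOWS_PHONE_UA))
    print(matches_intersection_query(WINDOWS_PHONE_UA))
    print(matches_intersection_query(ANDROID_UA))
```

In the intersection version the complement covers the whole value (.*Windows Phone.*), so there is no prefix split for the engine to exploit, which is why it excludes the Windows Phone user agents.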

General advice though: doing this with a regexp query at query time is EXPENSIVE. It is much better to tag your documents appropriately at index time (e.g. using the ingest-user-agent plugin).
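
If the plugin is not an option, a rough sketch of the same idea is to compute a flag in your own indexing code before the document is sent to Elasticsearch. This is not the plugin's behaviour, just a hand-rolled stand-in; the function name tag_user_agent and the field isAndroid40 are made up for illustration, and only the userAgent field comes from the question.

```python
import re

ANDROID_4_0 = re.compile(r"Android 4\.0")


def tag_user_agent(doc):
    """Return a copy of doc with a hypothetical boolean 'isAndroid40' field.

    The flag is true when the user agent mentions Android 4.0 but does not
    mention Windows Phone, mirroring what the regexp query tries to select.
    """
    user_agent = doc.get("userAgent", "")
    tagged = dict(doc)  # leave the caller's document untouched
    tagged["isAndroid40"] = (
        ANDROID_4_0.search(user_agent) is not None
        and "Windows Phone" not in user_agent
    )
    return tagged


if __name__ == "__main__":
    print(tag_user_agent({"userAgent": "Mozilla/5.0 (Linux; Android 4.0; Nexus)"}))
```

At query time you would then filter on the precomputed field instead of running a regexp over every user agent value.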
