ML job using the rare function on IP addresses reports IPs on the same network as anomalies

Referring to

I found that IP addresses on the same network are detected as anomalies.
For example, suppose a specific user regularly logs in from one IP address. When that same user logs in from a different IP address on the same network, the ML job reports that as an anomaly as well.

Is there a way to filter the above scenario out of the results? Is it possible to use custom rules to do that? If yes, may I have some sample code showing how to do so?

What if, instead of analyzing the entire IP address, you use a script_field in the ML datafeed to create just a subsection of the IP address (e.g. the first octet)?

So, instead of passing the full IP address to ML, you'd only be passing its first octet, e.g. 109. That way, the rarity of the first octet per user should be a more effective signal.

Example query:

GET yourindexname/_search
{
  "query": {
    "match_all": {}
  },
  "script_fields": {
    "ip_first_octet": {
      "script": {
        "source": """
            // Capture the digits before the first dot (the first octet)
            def m = /^([0-9]+)\..*$/.matcher(doc['clientip'].value);
            if ( m.matches() ) {
              return Integer.parseInt(m.group(1))
            } else {
              return 0
            }
        """
      }
    }
  }
}

(Obviously, the above needs to be adapted before it can be incorporated into an ML datafeed query.)

Also note that, in order to get the above to work, you might have to set script.painless.regex.enabled: true in elasticsearch.yml to allow regex matching.
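
As a rough sketch of that adaptation, the same script_field could be placed in a datafeed and paired with a rare detector on the new field. The job ID, the user.name partition field, the @timestamp time field, and the 1h bucket span are assumptions for illustration only and are not taken from this thread; clientip and yourindexname follow the example query above.

```
PUT _ml/anomaly_detectors/rare_ip_octet_by_user
{
  "description": "Hypothetical job: rare first octet per user",
  "analysis_config": {
    "bucket_span": "1h",
    "detectors": [
      {
        "function": "rare",
        "by_field_name": "ip_first_octet",
        "partition_field_name": "user.name"
      }
    ]
  },
  "data_description": {
    "time_field": "@timestamp"
  }
}

PUT _ml/datafeeds/datafeed-rare_ip_octet_by_user
{
  "job_id": "rare_ip_octet_by_user",
  "indices": ["yourindexname"],
  "query": {
    "match_all": {}
  },
  "script_fields": {
    "ip_first_octet": {
      "script": {
        "lang": "painless",
        "source": """
          // Same regex as the search example: keep only the first octet
          def m = /^([0-9]+)\..*$/.matcher(doc['clientip'].value);
          return m.matches() ? Integer.parseInt(m.group(1)) : 0;
        """
      }
    }
  }
}
```

With this layout the detector never sees the full address, so two addresses that share a first octet count as the same value for a given user. Whether the first octet is the right granularity depends on your networks.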

If this idea works effectively, consider extracting the subsection at ingest time to avoid the overhead of calculating the script_field at query time.
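
One way the ingest-time variant might look is an ingest pipeline with a script processor that writes the first octet into its own field. This is only a sketch: the pipeline name and the ip_first_octet target field are made up for illustration, it assumes clientip arrives as a string, and the address in the simulate request is a sample value from a documentation range, not data from this thread. It uses plain string methods, so the regex setting is not needed here.

```
PUT _ingest/pipeline/ip_first_octet
{
  "description": "Copy the first octet of an IPv4 clientip into ip_first_octet",
  "processors": [
    {
      "script": {
        "lang": "painless",
        "if": "ctx.clientip instanceof String && ctx.clientip.indexOf('.') > 0",
        "source": """
          // Keep only the text before the first dot
          ctx.ip_first_octet = ctx.clientip.substring(0, ctx.clientip.indexOf('.'));
        """
      }
    }
  ]
}

POST _ingest/pipeline/ip_first_octet/_simulate
{
  "docs": [
    { "_source": { "clientip": "192.0.2.10" } }
  ]
}
```

The simulate call lets you check the output before attaching the pipeline to an index. Documents without a dotted clientip (for example IPv6 addresses) skip the processor because of the if condition, so they would need separate handling.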

Thanks richcollier, after a few days of research, I finally see your point now. I will pre-process the IP address at ingest time to read just the first octet for IPv4.
And for IPv6, can I just read in the first 3 blocks (the Global Unicast Address)? I did try to understand the IPv6 structure from here, but I think I still need more research to fully understand it.
