ML job rare function to detect anomaly on IP address will report IP on same network as anomaly

nameisnotimportant · August 25, 2020, 4:06am

Hi,
Referring to https://discuss.elastic.co/t/ml-how-to-find-anomaly-from-ip-address/239996

I found that an IP address on the same network is also detected as an anomaly.
For example, say 10.180.1.161 is the IP address a specific user regularly uses. When that same user logs in from 10.180.1.162, the ML job reports that as an anomaly as well.

Is there a way to filter the above scenario out of the results? Is it possible to use custom rules for that filter? If yes, may I have some sample code showing how to do so?

richcollier · August 25, 2020, 2:51pm

What if, instead of analyzing the entire IP address, you use a script_field in the ML datafeed to extract just a subsection of the IP address (e.g. the first octet)?

So, instead of passing 109.180.1.161 to ML, you'd only be passing 109. In that way, the rarity of the first octet per user should be more effective.

Example query:

GET yourindexname/_search
{
  "query": {
    "match_all": {}
  },
  "script_fields": {
    "ip_first_octet": {
      "script": {
        "source": """
            def m = /^([0-9]+)\..*$/.matcher(doc['clientip'].value);
            if ( m.matches() ) {
              return Integer.parseInt(m.group(1))
            } else {
              return 0
            }
          """
      }
    }
  }
}

(Obviously, the above needs to be adapted to be incorporated into an ML datafeed query.)

Also note that to get the above to work, you might have to set script.painless.regex.enabled: true in elasticsearch.yml to allow regex matching.
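
As a rough sketch of what that adaptation might look like, the same script_fields block can be placed in the datafeed definition, with the job's rare detector pointed at the scripted field. The job ID, the 1h bucket span, the user.name partition field and the @timestamp time field are all assumptions for illustration, so substitute whatever your existing job uses:

```console
# Hypothetical job: rare first octet, evaluated separately for each user.
# Assumes documents carry user.name and @timestamp fields.
PUT _ml/anomaly_detectors/rare_ip_octet_by_user
{
  "analysis_config": {
    "bucket_span": "1h",
    "detectors": [
      {
        "function": "rare",
        "by_field_name": "ip_first_octet",
        "partition_field_name": "user.name"
      }
    ]
  },
  "data_description": {
    "time_field": "@timestamp"
  }
}

# Datafeed for that job, computing ip_first_octet at query time
# with the same script as the search example.
PUT _ml/datafeeds/datafeed-rare_ip_octet_by_user
{
  "job_id": "rare_ip_octet_by_user",
  "indices": ["yourindexname"],
  "query": {
    "match_all": {}
  },
  "script_fields": {
    "ip_first_octet": {
      "script": {
        "source": """
            def m = /^([0-9]+)\..*$/.matcher(doc['clientip'].value);
            if ( m.matches() ) {
              return Integer.parseInt(m.group(1));
            } else {
              return 0;
            }
          """
      }
    }
  }
}
```

With this layout, 10.180.1.161 and 10.180.1.162 both reach the job as 10, so a login from a neighbouring address no longer looks rare for that user.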

If this idea works effectively, consider doing the subsection at ingest time to avoid the overhead of calculating the script_field at query time.
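
A minimal sketch of the ingest-time version, assuming the address arrives as a string in clientip and using a hypothetical pipeline name. It uses plain string operations rather than a regex, so the script.painless.regex.enabled setting should not be needed here, and it skips documents where clientip is missing or is not a plain IPv4 address:

```console
# Hypothetical pipeline name; field names follow the search example.
PUT _ingest/pipeline/ip_first_octet
{
  "description": "Store the first octet of an IPv4 clientip in ip_first_octet",
  "processors": [
    {
      "script": {
        "if": "ctx.clientip != null && ctx.clientip.contains('.') && !ctx.clientip.contains(':')",
        "lang": "painless",
        "source": """
          int dot = ctx.clientip.indexOf('.');
          ctx.ip_first_octet = Integer.parseInt(ctx.clientip.substring(0, dot));
        """
      }
    }
  ]
}

# Try the pipeline on a sample document before wiring it into indexing.
POST _ingest/pipeline/ip_first_octet/_simulate
{
  "docs": [
    { "_source": { "clientip": "10.180.1.161" } }
  ]
}
```

Once documents are indexed through such a pipeline, the ML job can use ip_first_octet directly and the datafeed no longer needs a script_field.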

nameisnotimportant · August 27, 2020, 2:05am

Thanks richcollier, after a few days of research, I finally see your point. I will pre-process the IP address at ingest time to read just the first octet for IPv4.
And for IPv6, can I just read in the first 3 blocks (the Global Unicast Address)? I did try to understand the IPv6 structure from here, but I think I still need more research to fully understand it.

system · September 24, 2020, 2:06am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
