Comparing values from 2 ML Jobs in watcher

Hi,

I have 2 different ML jobs.

  1. Identifies anomalies in a user's login time, i.e. if a user regularly logs in between 11:00 AM and 6:00 PM and one day logs in at 12:00 AM, this will be detected as an anomaly.

  2. Detects the longitude-latitude usually used by a person: if a person usually logs in from 11.222, -11.33 (example) and suddenly logs in from 13.22, -34.55, this will be detected as an anomaly. These 2 jobs work fine independently.

I need to write a watcher over these 2 jobs that detects an unusual login time and an unusual geo location at the same time, i.e. the user logged in at an unusual time and from an unusual geo location.

In the chained watcher I was able to identify both individually. How should I compare the time and username from both results and send an alert when they match?

I.e. compare the values from the 2 different jobs and, if they match, send an alert. This means comparing the array of values from one job with the array of values from the other job.

Thanks for all your help

I don't think this is really an ML-specific question - it boils down to how one compares lists from two separate queries in different chained inputs, when each of those queries returns an array.

I sort of simulated this by running a chained input watch against 2 identical jobs (with different names) that of course, both return the same entity as anomalous.

My compare condition, however, had to "hardcode" the first entry of the results array:

    "condition": {
      "compare": {
        "ctx.payload.first.hits.hits.0._source.partition_field_value": {
          "eq": "{{ctx.payload.second.hits.hits.0._source.partition_field_value}}"
        }
      }
    },

where partition_field_value is the field that is used to "split" the analysis on in my particular case.

The result, of course is:

      "condition": {
        "type": "compare",
        "status": "success",
        "met": true,
        "compare": {
          "resolved_values": {
            "ctx.payload.second.hits.hits.0._source.partition_field_value": "AAL",
            "ctx.payload.first.hits.hits.0._source.partition_field_value": "AAL"
          }
        }
      }

But this obviously doesn't account for there being more than one hit in the results.

Hey @spinscale - is it possible to either use array_compare or mustache syntax to compare the array of hits? I think the trick in this case would be that you cannot guarantee that a particular entity is in the same index of the results array. So "AAL" might be index 0 of the hits array for the first chain, but may be in some other index location for the second input chain.

That's exactly the issue I am facing. How can I use either JavaScript or Painless to iterate through the 2 arrays and compare them?

Ok - after a little research, this could be done in probably two ways

Method 1 - a script on the condition to see if there's an intersection of results from both queries

    "condition": {
      "script": "def second_results = ctx.payload.second.hits.hits.stream().map(hit->hit._source.partition_field_value).collect(Collectors.toList()); return ctx.payload.first.hits.hits.stream().map(hit -> hit._source.partition_field_value).filter(p->second_results.contains(p)).collect(Collectors.toList()).size() > 0;"
    },
    "actions": {
      "log": {
        "logging": {
          "text": "{{ctx.payload}}"
        }
      }
    }

Two things to note. First, first and second are the names of my two chained input queries. Essentially, the condition script takes the anomalies from the second query and puts them in a list called second_results. It then does the same for the first query's results, but keeps only the items that also appear in second_results, and tests whether that list of matches is bigger than 0. Secondly, note again that in my specific example it is the partition_field_value that contains the names of the entities I'm interested in.

In my little test, my first query returned 3 entities:

AAL ACA AWE

and the second query returned 2 entities:
ACA AAL

and my watch returns the expected intersection:

      "condition": {
        "type": "script",
        "status": "success",
        "met": true
      },
      "actions": [
        {
          "id": "log",
          "type": "logging",
          "status": "success",
          "logging": {
            "logged_text": "{_value=[AAL, ACA]}"
          }
        }
      ]
    },
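For anyone who wants to check the intersection logic outside of Watcher, here is a minimal plain-Java sketch of what the Method 1 condition script does. It is only an illustration: the class name HitIntersection and the hit() helper are made up, and the hits are modelled as nested maps rather than a real ctx.payload.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical stand-alone model of the Method 1 condition script.
class HitIntersection {

    // Build a fake hit shaped like {"_source": {"partition_field_value": value}}.
    static Map<String, Map<String, String>> hit(String value) {
        return Map.of("_source", Map.of("partition_field_value", value));
    }

    // Return the values of `field` from `first` that also occur in `second`,
    // regardless of where they sit in either hits array.
    static List<String> intersect(List<Map<String, Map<String, String>>> first,
                                  List<Map<String, Map<String, String>>> second,
                                  String field) {
        // Collect the second query's values once, so lookups do not depend on index.
        Set<String> secondValues = second.stream()
                .map(h -> h.get("_source").get(field))
                .collect(Collectors.toSet());
        // Keep only the first query's values that the second query also returned.
        return first.stream()
                .map(h -> h.get("_source").get(field))
                .filter(secondValues::contains)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Map<String, Map<String, String>>> first = List.of(hit("AAL"), hit("ACA"), hit("AWE"));
        List<Map<String, Map<String, String>>> second = List.of(hit("ACA"), hit("AAL"));
        List<String> common = intersect(first, second, "partition_field_value");
        // The watch condition would be "met" when this list is non-empty.
        System.out.println(common + " met=" + !common.isEmpty());
    }
}
```

With the same entities as in my test, this keeps AAL and ACA even though they sit at different positions in the two lists, which is the case the hardcoded hits.0 compare could not handle.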

Method 2 - a smarter filtered terms query for the second query

You could also follow the model shown in this example:

where the second query uses a must clause with a terms filter that passes all of the items from the first query as items that must exist in the second query. It uses mustache syntax to iterate through all instances of, in this case, process hosts:

                          "terms": {
                            "process_host": [
                              "{{#ctx.payload.started_processes.aggregations.process_hosts.buckets}}{{key}}",
                              "{{/ctx.payload.started_processes.aggregations.process_hosts.buckets}}"
                            ]
                          }

Hope that gives you some ideas

Thanks Richcollier, this will definitely help.

Since Elasticsearch 6.2 you can also have transformations between two chain inputs to simplify handling of such things.

https://www.elastic.co/guide/en/x-pack/6.2/input-chain.html

@richcollier as you already did, the correct way is to go with a scripted painless condition instead of an array compare condition.
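Since the original question was about matching both the username and the time, the same scripted approach could probably be extended to compare a composite key instead of a single field. Below is a rough plain-Java sketch of that idea. It assumes each anomaly record carries partition_field_value and timestamp fields, and that both jobs use the same bucket span so the timestamps line up. The class and helper names are hypothetical, the records are flattened to their _source maps, and the logic would still need porting into a Painless condition.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch: match anomalies on user AND time, not user alone.
class UserTimeMatch {

    // Fake anomaly record; field names are assumptions about the result documents.
    static Map<String, Object> record(String user, long timestamp) {
        return Map.of("partition_field_value", user, "timestamp", timestamp);
    }

    // Composite key "user|timestamp" for one record.
    static String key(Map<String, Object> source) {
        return source.get("partition_field_value") + "|" + source.get("timestamp");
    }

    // Keys present in both result lists, i.e. same user flagged at the same time by both jobs.
    static List<String> matches(List<Map<String, Object>> first,
                                List<Map<String, Object>> second) {
        Set<String> secondKeys = second.stream()
                .map(UserTimeMatch::key)
                .collect(Collectors.toSet());
        return first.stream()
                .map(UserTimeMatch::key)
                .filter(secondKeys::contains)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // alice is anomalous in both jobs in the same bucket; bob only at different times.
        List<Map<String, Object>> timeJob = List.of(record("alice", 1000L), record("bob", 1000L));
        List<Map<String, Object>> geoJob = List.of(record("bob", 2000L), record("alice", 1000L));
        System.out.println(matches(timeJob, geoJob));
    }
}
```

Here only alice would trigger an alert, because bob's two anomalies fall in different buckets.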
