Identifies anomalies in a user's login time. For example, if a user regularly logs in between 11:00 AM and 6:00 PM and one day logs in at 12:00 AM, this is detected as an anomaly.
Detects the longitude/latitude a person usually logs in from. If a person usually logs in from 11.222, -11.33 (example) and suddenly logs in from 13.22, -34.55, this is detected as an anomaly. These 2 jobs each work fine independently.
I need to write a watcher over these 2 jobs that detects an unusual login time and an unusual geo location at the same time, i.e. the user logged in at an unusual time and from an unusual geo location.
In the chained watcher I was able to identify both individually. How should I compare the time and username from both results and send an alert if they match?
That is, compare the values from the 2 different jobs, and if they match -> send an alert. This means comparing one job's array of values with the other job's values.
I don't think this is really an ML-specific question - it boils down to how to compare lists from two separate queries in different chained inputs, when each of those queries returns an array.
I sort of simulated this by running a chained input watch against 2 identical jobs (with different names) that of course, both return the same entity as anomalous.
My compare condition, however, had to "hardcode" the first entry of the results array:
But this obviously doesn't take into account the case where there is more than one hit in the results.
Hey @spinscale - is it possible to either use array_compare or mustache syntax to compare the array of hits? I think the trick in this case would be that you cannot guarantee that a particular entity is in the same index of the results array. So "AAL" might be index 0 of the hits array for the first chain, but may be in some other index location for the second input chain.
Two things to note. First, first and second are the names of my two chained input queries. Essentially, the condition script takes the anomalies from the second query and puts them in a map/list called second_results. It then goes through the first query's results the same way and tests whether any of them also appear in second_results (i.e. whether the list of matches is bigger than 0). Secondly, note again that in my specific example, it is the partition_field_value that contains the name of the entities that I'm interested in.
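To make the intersection idea concrete outside of Watcher, here is a minimal Python sketch of the same logic. It is not the Painless condition script itself. The payload layout (hits.hits[]._source.partition_field_value) and every entity name other than "AAL" are assumptions for illustration, and entities and common_entities are hypothetical helper names.

```python
# Hypothetical payload shaped like the result of a watch with two chained
# inputs named "first" and "second". The nesting below is an assumption.
payload = {
    "first": {"hits": {"hits": [
        {"_source": {"partition_field_value": "AAL"}},
        {"_source": {"partition_field_value": "JZA"}},
        {"_source": {"partition_field_value": "AWE"}},
    ]}},
    "second": {"hits": {"hits": [
        {"_source": {"partition_field_value": "AWE"}},
        {"_source": {"partition_field_value": "AAL"}},
    ]}},
}


def entities(chain_result):
    """Collect the anomalous entity names from one chained input's hits."""
    return {
        hit["_source"]["partition_field_value"]
        for hit in chain_result["hits"]["hits"]
    }


def common_entities(payload):
    """Return the entities flagged by both queries, regardless of position."""
    second_results = entities(payload["second"])
    return entities(payload["first"]) & second_results


matches = common_entities(payload)
# The watch condition would be true when at least one entity is in both sets.
condition_met = len(matches) > 0
```

Because the comparison is done on sets, it does not matter that "AAL" is at index 0 in one result and index 1 in the other.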
In my little test, my first query returned 3 entities:
Method 2 - a smarter filtered terms query for the second query
You could also follow the model shown in this example:
There, the second query uses a must clause with a terms filter that passes in all of the items from the first query as items that must exist in the second query. It uses mustache syntax to iterate through all instances of, in this case, process names.
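As a rough illustration of what that second query ends up looking like once the template is rendered, here is a small Python sketch that builds the query body from the first query's hits. In the watch itself this would be done with mustache in the input, not Python. The field names (job_id, partition_field_value), the job name, and the helpers anomalous_entities and build_second_query are all assumptions made for this example.

```python
def anomalous_entities(first_result):
    """Entity names from the first query's hits, de-duplicated, order kept."""
    names = [
        hit["_source"]["partition_field_value"]
        for hit in first_result["hits"]["hits"]
    ]
    return list(dict.fromkeys(names))


def build_second_query(first_result, job_id):
    """Build a bool query limited to entities the first query already flagged."""
    return {
        "query": {
            "bool": {
                # Assumed: the second job's results are selected by job_id.
                "must": [{"term": {"job_id": job_id}}],
                # Only entities seen in the first query can match.
                "filter": [
                    {"terms": {"partition_field_value": anomalous_entities(first_result)}}
                ],
            }
        }
    }


# Hypothetical first-query result with a repeated entity.
first_result = {"hits": {"hits": [
    {"_source": {"partition_field_value": "AAL"}},
    {"_source": {"partition_field_value": "JZA"}},
    {"_source": {"partition_field_value": "AAL"}},
]}}

second_query = build_second_query(first_result, "geo_job")
```

With this approach any hit returned by the second query is already a match in both jobs, so the condition only has to check that the second query returned something.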