How to correctly use lat_long detector on geo.location from azure.signin dataset

Hello,

When I create a machine learning job with a lat_long detector, for some reason the job never starts processing records.

Query:

{"bool":{"must":[{"match_all":{}}],"filter":[{"match_phrase":{"event.dataset":"azure.signinlogs"}}],"must_not":[]}}


Does anyone have an idea why the ML job does not seem to start processing records? There is definitely sign-in data in our Filebeat index.
Any suggestion to make this work is welcome. I'm trying to replicate some Azure Sentinel functionality and find out when a user is signing in from an anomalous location.

(first time I'm using the lat_long detector, so it's probably a rookie mistake)

The datafeed preview is also empty.

Grtz

Willem

From: https://www.elastic.co/guide/en/machine-learning/7.7/ml-geo-functions.html

The field_name that you supply must be a single string that contains two comma-separated numbers of the form latitude,longitude, a geo_point field, a geo_shape field that contains point values, or a geo_centroid aggregation. The latitude and longitude must be in the range -180 to 180 and represent a point on the surface of the Earth.
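For instance, a minimal detector definition against a geo_point field might look like the following (the field and by-field names here are just placeholders, not taken from your setup):

```json
{
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [
      {
        "function": "lat_long",
        "field_name": "geo.location",
        "by_field_name": "user.name"
      }
    ]
  }
}
```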

Can you verify the type of field your geo.location field is mapped as?

@richcollier Tx for the fast answer.

FYI, this is on Elastic 7.6.1.

"geo" : {
  "properties" : {
    "city_name" : {
      "type" : "keyword",
      "ignore_above" : 1024
    },
    "continent_name" : {
      "type" : "keyword",
      "ignore_above" : 1024
    },
    "country_iso_code" : {
      "type" : "keyword",
      "ignore_above" : 1024
    },
    "country_name" : {
      "type" : "keyword",
      "ignore_above" : 1024
    },
    "location" : {
      "type" : "geo_point"
    },
    "name" : {
      "type" : "keyword",
      "ignore_above" : 1024
    },
    "region_iso_code" : {
      "type" : "keyword",
      "ignore_above" : 1024
    },
    "region_name" : {
      "type" : "keyword",
      "ignore_above" : 1024
    }
  }
} 

This is coming from the azure.signinlogs Filebeat event.dataset.

Grtz

Ok - glad to see that it is truly a geo_point.

If you're seeing nothing in the datafeed preview, you should first test whether your query of the raw data works as expected. Take what you have as the datafeed query and try it as a standard _search (I'm guessing below that your index pattern is indeed filebeat-*, but modify if necessary):

GET filebeat-*/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match_all": {}
        }
      ],
      "filter": [
        {
          "match_phrase": {
            "event.dataset": "azure.signinlogs"
          }
        }
      ],
      "must_not": []
    }
  }
}

Does that properly return the data you expect?

@richcollier Yes, the query returns the docs I expected (can't show them here for privacy reasons), including the geo.location field.

GET filebeat/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match_all": {}
        }
      ],
      "filter": [
        {
          "match_phrase": {
            "event.dataset": "azure.signinlogs"
          }
        }
      ],
      "must_not": []
    }
  }
} 

I do use an alias named filebeat, though. I've done that before without problems, so I don't think that's the issue.
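(To rule the alias out, this request shows which concrete indices it resolves to:)

```
GET _alias/filebeat
```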

Grtz

Willem

Hmm... well, there is nothing obvious that is wrong. I attempted a similar configuration using the kibana_sample_data_ecommerce data that ships with Kibana.

Perhaps try it on the sample data, then compare with your "real" setup and see what is different? If you cannot figure it out after that, it will be hard to debug further in this setting; a proper Support Case would likely be necessary.
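While comparing the two setups, the datafeed preview API is also handy, since it shows exactly what the datafeed would hand to the job (substitute your actual datafeed ID):

```
GET _ml/datafeeds/datafeed-your-job-id/_preview
```

If this returns an empty array while the plain _search returns hits, the problem is somewhere in the datafeed configuration (time field, query, etc.) rather than in the data itself.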

Getting some sleep now; I will try your sample data suggestion and might open a case. Tx!

@richcollier Started a "lat_long(OriginLocation) by Carrier" on the flights sample dataset and it immediately started processing.

  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [
      {
        "detector_description": "lat_long(OriginLocation) by Carrier",
        "function": "lat_long",
        "field_name": "OriginLocation",
        "by_field_name": "Carrier",
        "detector_index": 0
      }
    ],
    "influencers": [
      "OriginWeather",
      "Carrier",
      "DestWeather"
    ]
  }, 

Not sure what I'm doing wrong on my Azure sign-in logs... I tried recreating it as lat_long("geo.location") by "azure.signinlogs.identity", but again no records were processed.

See 00547781

Grtz

Willem

Hi,

Not sure if this helps, but I was also working with geo_point fields, in my case from Python, and I had to add an ingest pipeline:

# defining the ingest pipeline and the specific values
from elasticsearch import Elasticsearch
from elasticsearch.client import IngestClient

es = Elasticsearch()
p = IngestClient(es)

# the "set" processor combines the Lat and Lon fields into a single
# "lat, lon" string, which is a valid geo_point value
p.put_pipeline(id='attachment', body={
    'description': 'Extract attachment information',
    'processors': [
        {
            "set": {
                "field": "location",
                "value": "{{Lat}}, {{Lon}}"
            }
        }
    ]
})

location here is mapped as a geo_point, and it takes its value from the latitude and longitude fields.
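For that to work, the location field has to be mapped as geo_point before any documents are indexed; for example (the index name is just a placeholder):

```
PUT my-index
{
  "mappings": {
    "properties": {
      "location": { "type": "geo_point" }
    }
  }
}
```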

Forgot to ask this obvious question. Did you have your original job "look back" in time, or did you only kick it off in real-time?
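(For context: a datafeed only processes historical data if it is started with a start time in the past; via the API that looks something like the following, with the datafeed ID and timestamp as placeholders:)

```
POST _ml/datafeeds/datafeed-your-job-id/_start
{
  "start": "2020-04-01T00:00:00Z"
}
```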

Well, I created the job two days or so after we started using the sign-in logs dataset, and configured it to start from the beginning of the data and then continue in real time. I attached a video to the case. Grtz