Finding the first occurrence of all Unique IDs within a term

SkipJ · February 10, 2022, 10:06pm

I'm trying to create a visualization that captures all new users a from the last month (now/M-1 to now/M), then capture the amount of sessions they've ran from counting their unique session IDs.

The challenge I'm facing is that I want to automate this so we pull all new user IDs and their session IDs without doing what we currently do, which is to manually create and add a filter that contains the new user IDs, then running a Unique Session count metric against it.

I'm admittedly unfamiliar with Kibana and trying to find resources, but I'm not sure what the best approach is.

Thanks!

Tomo_M · February 11, 2022, 1:13am

If you have more than some tens of thousands of users (above the max bucket limit), it's difficult to achieve the results from a single query.

If it's below the limit, bucket select aggreation is a possible option.

Otherwise, The easiest way could be flag their first session or add the the information about user creation date on the client side. Is it difficult?

Tomo_M · February 11, 2022, 10:35am

My multistage solution is as below. I used kibana_sample_data_logs sample dataset and consider clientip as fake 'user ID'. Selected new user for last day (not month).

Outline

create 1st transform groupby clientip to get their first timestamp
create 2nd transform to aggregate the first timestamp to day buckets with scripted metric aggregation to retrieve unique clientips as array.
query terms lookup to filter clientips whose first timestamp is on a specific day.

Detail

1. create 1st transform groupby `clientip` to get their first timestamp

PUT /_transform/test_kibana_sample_first_timestamp
{
  "dest":{
    "index": "test_kibana_sample_first_timestamp"
  },
  "pivot": {
    "group_by": {
      "clientip": {
        "terms": {
          "field": "clientip"
        }
      }
    },
    "aggregations": {
      "first_timestamp": {
        "min": {
          "field": "timestamp"
        }
      }
    }
  },
  "source": {
    "index": "kibana_sample_data_logs"
  }
}

POST _transform/test_kibana_sample_first_timestamp/_start

Then you get:

GET /test_kibana_sample_first_timestamp/_search?size=3&filter_path=hits.hits._source

{
  "hits" : {
    "hits" : [
      {
        "_source" : {
          "first_timestamp" : "2022-01-24T07:51:57.333Z",
          "clientip" : "0.72.176.46"
        }
      },
      {
        "_source" : {
          "first_timestamp" : "2022-01-26T11:02:32.392Z",
          "clientip" : "0.207.229.147"
        }
      },
      {
        "_source" : {
          "first_timestamp" : "2022-01-27T10:55:29.114Z",
          "clientip" : "0.209.144.101"
        }
      }
    ]
  }
}

2. create 2nd transform to aggregate the first timestamp to day buckets with scripted metric aggregation to retrieve unique `clientip`s as array.

First, create target index of the 2nd transform:

PUT /test_kibana_sample_first_timestamp_bucket/
{
  "mappings": {
    "properties": {
      "day": {"type":"date"},
      "new_ip": {"type": "ip"}
    }
  }
}

Then make ingest pipeline to set _id field:

PUT /_ingest/pipeline/test_set_id
{
  "processors": [
    {
      "set": {
        "field": "_id",
        "value": "{{day}}"
      }
    }
  ]
}

Create 2nd transform and start.
I used scripted metric "unique values aggregation" .

PUT /_transform/test_kibana_sample_first_timestamp_bucket
{
  "dest":{
    "index": "test_kibana_sample_first_timestamp_bucket",
    "pipeline": "test_set_id"
  },
  "pivot": {
    "group_by": {
      "day": {
        "date_histogram": {
          "field": "first_timestamp",
          "calendar_interval": "day"
        }
      }
    },
    "aggregations": {
      "new_ip": {
        "scripted_metric": {
          "init_script": "state.set = new HashSet()",
          "map_script": "if (params['_source'].containsKey(params.field)) {state.set.add(params['_source'][params.field])}",
          "combine_script": "return state.set",
          "reduce_script": "def ret = new HashSet(); for (s in states) {for (k in s) {ret.add(k);}} return ret",
          "params":{
            "field": "clientip"
          }
        }
      }
    }
  },
  "source": {
    "index": "test_kibana_sample_first_timestamp"
  }
}

POST /_transform/test_kibana_sample_first_timestamp_bucket/_start

Then you get:

GET /test_kibana_sample_first_timestamp_bucket/_search?size=1&filter_path=hits.hits._id,hits.hits._source

{
  "hits" : {
    "hits" : [
      {
        "_id" : "2022-01-23T00:00:00.000Z",
        "_source" : {
          "day" : "2022-01-23T00:00:00.000Z",
          "new_ip" : [
            "19.253.238.55",
            "42.17.43.107",
            "81.32.253.92",
            "91.183.212.113",
            "240.58.187.246",
            "1.5.239.89",...
          ]
        }
      }
    ]
  }
}

3. query terms lookup to filter `clientip`s whose first timestamp is on a specific day.

You can filter clientips whose first timestamp is on a specific day.

GET /kibana_sample_data_logs/_search
{
  "query":{
    "terms": {
      "clientip": {
        "index": "test_kibana_sample_first_timestamp_bucket",
        "id": "2022-01-23T00:00:00.000Z",
        "path": "new_ip"
      }
    }
  }
}

This is a sample visualization with filtering by clientips whose first time stamp is on 2022-01-27.

You have to use Query DSL to use terms lookup query for filtering.

system · March 11, 2022, 10:35am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Need help with aggregation and unique counted values Elasticsearch	2	583	July 6, 2017
Determining unique number of entries in a termsFacet, passing that to date-histogram-facet Elasticsearch	2	281	July 6, 2017
Use unique count 'metric' visual combined with a filter Kibana	3	2746	July 6, 2017
Apache live stats using elasticsearch and logstash Elasticsearch	20	1174	July 6, 2017
Visualize count of unique identities over time Kibana	4	9195	April 23, 2018

Finding the first occurrence of all Unique IDs within a term

Outline

Detail

1. create 1st transform groupby clientip to get their first timestamp

2. create 2nd transform to aggregate the first timestamp to day buckets with scripted metric aggregation to retrieve unique clientips as array.

3. query terms lookup to filter clientips whose first timestamp is on a specific day.

Related topics

1. create 1st transform groupby `clientip` to get their first timestamp

2. create 2nd transform to aggregate the first timestamp to day buckets with scripted metric aggregation to retrieve unique `clientip`s as array.

3. query terms lookup to filter `clientip`s whose first timestamp is on a specific day.