Conditional counts on subbuckets

cabp · December 10, 2018, 7:27am

I want to get the counts of nested groups which satisfy a certain condition.
Consider the following:

DELETE usagestats
PUT usagestats/test/1
{
  "user-id": "abc",
  "session-id": "abc-1",
  "action": "A123"
}

PUT usagestats/test/2
{
  "user-id": "abc",
  "session-id": "abc-2",
  "action": "A123"
}

PUT usagestats/test/3
{
  "user-id": "xyz",
  "session-id": "xyz-1",
  "action": "A123"
}

PUT usagestats/test/4
{
  "user-id": "xyz",
  "session-id": "xyz-1",
  "action": "A123"
}

PUT usagestats/test/5
{
  "user-id": "abc",
  "session-id": "abc-2",
  "action": "Z789"
}

I want to tell how many returning users (users with session-ids >= 2) triggered a specific action (action = "A123") on each day (Mon., Tue., Wed., ...).
So I am interested in the count of a specific "action" on a timeline, restricted to each "user-id" that has at least 2 distinct "session-id"s.

My approach:

Create a bucket for each "user-id":

GET usagestats/test/_search
{
  "aggs": {
    "test": {
      "terms": { "field": "user-id.keyword" }
    }
  }
}

But how do I go on from here?

consider only buckets having at least 2 distinct "session-id"s (as opposed to buckets having exactly 1 distinct "session-id")
consider only buckets having at least one document matching (action: "A123")
aggregate the count of the remaining documents in a time histogram

Thanks a lot!
Tobi
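
For reference, the filtering steps listed in the question can be sketched client-side in plain Python. This is only a minimal illustration of the intended logic on the five sample documents, not an Elasticsearch query; the function name `action_counts_for_returning_users` and its `min_sessions` parameter are made up for this sketch. The time-histogram step is left out because the sample documents carry no timestamp field.

```python
from collections import defaultdict

# The five sample documents indexed in the question.
DOCS = [
    {"user-id": "abc", "session-id": "abc-1", "action": "A123"},
    {"user-id": "abc", "session-id": "abc-2", "action": "A123"},
    {"user-id": "xyz", "session-id": "xyz-1", "action": "A123"},
    {"user-id": "xyz", "session-id": "xyz-1", "action": "A123"},
    {"user-id": "abc", "session-id": "abc-2", "action": "Z789"},
]


def action_counts_for_returning_users(docs, action, min_sessions=2):
    """Count `action` documents per user, keeping only users that have
    at least `min_sessions` distinct session ids (hypothetical helper)."""
    # Step 1: one bucket per user-id.
    by_user = defaultdict(list)
    for doc in docs:
        by_user[doc["user-id"]].append(doc)

    counts = {}
    for user, user_docs in by_user.items():
        # Step 2: keep only users with enough distinct sessions.
        sessions = {d["session-id"] for d in user_docs}
        if len(sessions) < min_sessions:
            continue
        # Step 3: keep only users with at least one matching action.
        matching = sum(1 for d in user_docs if d["action"] == action)
        if matching > 0:
            counts[user] = matching
    return counts


print(action_counts_for_returning_users(DOCS, "A123"))
```

On the sample data only user "abc" qualifies, since "xyz" has a single distinct session.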

cabp · December 12, 2018, 7:42am

The bucket-selector did the trick:

GET usagestats/test/_search
{
   "aggs":{
      "user":{
         "terms":{
            "field":"user-id.keyword"
         },
         "aggs":{
            "session":{
               "terms":{
                  "field":"session-id.keyword"
               }
            },
            "session_filter":{
               "bucket_selector":{
                  "buckets_path":{
                     "sessions":"session._bucket_count"
                  },
                  "script":"params.sessions > 1"
               }
            }
         }
      }
   }
}

cabp · December 12, 2018, 1:25pm

The selection of the bucket works, but it does not satisfy my use-case.

It returns:

"buckets": [
	{
	"key": "abc-1",
	"doc_count": 2
	},
	{
	"key": "abc-2",
	"doc_count": 1
	}
]

But I do not want the aggregation as a result (i.e. "doc_count": 2), but all documents:

"buckets": [
	{
	"session-id": "abc-1",
	"action": "A123"
	},
	{
	"session-id": "abc-2",
	"action": "Z789"
	},
	{
	"session-id": "abc-2",
	"action": "A123"
	}
]

Can I achieve this using multiple aggregations?

abdon · December 13, 2018, 5:17pm

Have you considered the top_hits aggregation? It can show you the documents in each bucket:

GET usagestats/test/_search
{
  "size": 0,
  "aggs": {
    "user": {
      "terms": {
        "field": "user-id.keyword"
      },
      "aggs": {
        "session": {
          "terms": {
            "field": "session-id.keyword"
          },
          "aggs": {
            "documents": {
              "top_hits": {
                "size": 10
              }
            }
          }
        },
        "session_filter": {
          "bucket_selector": {
            "buckets_path": {
              "sessions": "session._bucket_count"
            },
            "script": "params.sessions > 1"
          }
        }
      }
    }
  }
}
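
To get from that response to the flat list of documents asked for earlier, the nested buckets can be walked on the client. The following is a minimal Python sketch, assuming the response has already been parsed into a dict; `flatten_top_hits` is a hypothetical helper, and `SAMPLE_RESPONSE` is a hand-written, trimmed illustration of the response shape (user buckets, then session buckets, then the "documents" top_hits), not captured output.

```python
def flatten_top_hits(response):
    """Collect the _source of every top_hits document, walking
    user buckets -> session buckets -> "documents" hits."""
    docs = []
    for user_bucket in response["aggregations"]["user"]["buckets"]:
        for session_bucket in user_bucket["session"]["buckets"]:
            for hit in session_bucket["documents"]["hits"]["hits"]:
                docs.append(hit["_source"])
    return docs


# Hand-written illustration of the response shape (assumption, heavily trimmed:
# real hits also carry fields such as _index, _id and _score).
SAMPLE_RESPONSE = {
    "aggregations": {
        "user": {
            "buckets": [
                {
                    "key": "abc",
                    "session": {
                        "buckets": [
                            {
                                "key": "abc-2",
                                "documents": {"hits": {"hits": [
                                    {"_source": {"user-id": "abc", "session-id": "abc-2", "action": "A123"}},
                                    {"_source": {"user-id": "abc", "session-id": "abc-2", "action": "Z789"}},
                                ]}},
                            },
                            {
                                "key": "abc-1",
                                "documents": {"hits": {"hits": [
                                    {"_source": {"user-id": "abc", "session-id": "abc-1", "action": "A123"}},
                                ]}},
                            },
                        ]
                    },
                }
            ]
        }
    }
}

for doc in flatten_top_hits(SAMPLE_RESPONSE):
    print(doc["session-id"], doc["action"])
```

Note that top_hits returns at most "size" documents per session bucket (10 in the query above), so sessions with more documents would be truncated.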

cabp · December 17, 2018, 10:25am

Looks promising, indeed! Thank you very much.
