Conditional counts on subbuckets


#1

I want to get the counts of nested groups which satisfy a certain condition.
Consider the following:

DELETE usagestats
PUT usagestats/test/1
{
  "user-id": "abc",
  "session-id": "abc-1",
  "action": "A123"
}

PUT usagestats/test/2
{
  "user-id": "abc",
  "session-id": "abc-2",
  "action": "A123"
}

PUT usagestats/test/3
{
  "user-id": "xyz",
  "session-id": "xyz-1",
  "action": "A123"
}

PUT usagestats/test/4
{
  "user-id": "xyz",
  "session-id": "xyz-1",
  "action": "A123"
}

PUT usagestats/test/5
{
  "user-id": "abc",
  "session-id": "abc-2",
  "action": "Z789"
}

I want to tell how many returning users (users with at least 2 distinct "session-id"s) triggered a specific action (action = "A123") on each day (Mon., Tue., Wed., ...).
So I am interested in the count of a specific "action" over time, restricted to "user-id"s that have at least 2 distinct "session-id"s.

My approach:

  1. create a bucket for each "user-id"

    GET usagestats/test/_search
    {
      "aggs": {
        "test": {
          "terms": { "field": "user-id.keyword" }
        }
      }
    }

But how do I continue from here? The remaining steps would be:

  2. consider only buckets having at least 2 distinct "session-id"s (as opposed to only buckets having exactly 1 distinct "session-id")
  3. consider only buckets having at least one document matching (action: "A123")
  4. aggregate the count of the remaining documents in a time histogram
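
The steps above can be sketched in plain Python over the five sample documents, purely to pin down the intended logic; this is not an Elasticsearch query. The function name count_action_for_returning_users and its min_sessions parameter are hypothetical, and because the sample documents carry no timestamp field, the final step is reduced to a single total instead of a time histogram.

```python
from collections import defaultdict

# The five sample documents from the question.
DOCS = [
    {"user-id": "abc", "session-id": "abc-1", "action": "A123"},
    {"user-id": "abc", "session-id": "abc-2", "action": "A123"},
    {"user-id": "xyz", "session-id": "xyz-1", "action": "A123"},
    {"user-id": "xyz", "session-id": "xyz-1", "action": "A123"},
    {"user-id": "abc", "session-id": "abc-2", "action": "Z789"},
]


def count_action_for_returning_users(docs, action, min_sessions=2):
    """Count docs with `action` whose user has >= min_sessions distinct sessions."""
    # Bucket per user: the set of distinct session ids seen for that user.
    sessions = defaultdict(set)
    for doc in docs:
        sessions[doc["user-id"]].add(doc["session-id"])
    # Keep only users with enough distinct sessions ("returning" users).
    returning = {user for user, ids in sessions.items() if len(ids) >= min_sessions}
    # Count the matching documents that belong to those users.
    # (Assumption: with a timestamp field this would be grouped per day instead.)
    return sum(
        1 for doc in docs
        if doc["user-id"] in returning and doc["action"] == action
    )


result = count_action_for_returning_users(DOCS, "A123")
print(result)
```

In this sketch only user "abc" counts as returning, so only that user's "A123" documents contribute to the total.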

Thanks a lot!
Tobi


#2

The bucket-selector did the trick:

GET usagestats/test/_search
{
   "aggs":{
      "user":{
         "terms":{
            "field":"user-id.keyword"
         },
         "aggs":{
            "session":{
               "terms":{
                  "field":"session-id.keyword"
               }
            },
            "session_filter":{
               "bucket_selector":{
                  "buckets_path":{
                     "sessions":"session._bucket_count"
                  },
                  "script":"params.sessions > 1"
               }
            }
         }
      }
   }
}
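
To make the effect of that request easier to follow, here is a rough in-memory emulation in Python, assuming the five sample documents from the first post. It only mirrors the idea (a terms bucket per user, a nested terms bucket per session, then a selector on the number of session buckets); the function name users_with_multiple_sessions is hypothetical and nothing here calls Elasticsearch.

```python
from collections import Counter, defaultdict

# The five sample documents from the first post.
DOCS = [
    {"user-id": "abc", "session-id": "abc-1", "action": "A123"},
    {"user-id": "abc", "session-id": "abc-2", "action": "A123"},
    {"user-id": "xyz", "session-id": "xyz-1", "action": "A123"},
    {"user-id": "xyz", "session-id": "xyz-1", "action": "A123"},
    {"user-id": "abc", "session-id": "abc-2", "action": "Z789"},
]


def users_with_multiple_sessions(docs):
    """Return {user: {session: doc_count}} for users with more than one session."""
    per_user = defaultdict(Counter)
    for doc in docs:
        # Outer "terms" on user-id, nested "terms" on session-id.
        per_user[doc["user-id"]][doc["session-id"]] += 1
    # Emulates the selector script "params.sessions > 1", where `sessions`
    # stands for the number of session buckets under each user bucket.
    return {
        user: dict(sessions)
        for user, sessions in per_user.items()
        if len(sessions) > 1
    }


kept = users_with_multiple_sessions(DOCS)
print(kept)
```

The user bucket for "xyz" is dropped because it contains a single session bucket, regardless of how many documents that session holds.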

#3

The selection of the bucket works, but it does not satisfy my use-case.

It returns:

"buckets": [
	{
	"key": "abc-1",
	"doc_count": 2
	},
	{
	"key": "abc-2",
	"doc_count": 1
	}
]

But I do not want the aggregation as a result (i.e. "doc_count": 2); I want all the documents:

"buckets": [
	{
	"session-id": "abc-1",
	"action": "A123"
	},
	{
	"session-id": "abc-2",
	"action": "Z789"
	},
	{
	"session-id": "abc-2",
	"action": "A123"
	}
]

Can I achieve this using multiple aggregations?


(Abdon Pijpelink) #4

Have you considered the top_hits aggregation? It can show you the documents in each bucket:

GET usagestats/test/_search
{
  "size": 0,
  "aggs": {
    "user": {
      "terms": {
        "field": "user-id.keyword"
      },
      "aggs": {
        "session": {
          "terms": {
            "field": "session-id.keyword"
          },
          "aggs": {
            "documents": {
              "top_hits": {
                "size": 10
              }
            }
          }
        },
        "session_filter": {
          "bucket_selector": {
            "buckets_path": {
              "sessions": "session._bucket_count"
            },
            "script": "params.sessions > 1"
          }
        }
      }
    }
  }
}
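
As a rough illustration of what this adds, the following Python sketch emulates the same shape in memory, assuming the five sample documents from the first post: for each user with more than one session, it lists the documents in every session bucket, capped at a maximum number of hits. The function name documents_per_session and the hits_size parameter are hypothetical, and the order of documents inside a bucket is just input order here, not a claim about how Elasticsearch sorts hits.

```python
from collections import defaultdict

# The five sample documents from the first post.
DOCS = [
    {"user-id": "abc", "session-id": "abc-1", "action": "A123"},
    {"user-id": "abc", "session-id": "abc-2", "action": "A123"},
    {"user-id": "xyz", "session-id": "xyz-1", "action": "A123"},
    {"user-id": "xyz", "session-id": "xyz-1", "action": "A123"},
    {"user-id": "abc", "session-id": "abc-2", "action": "Z789"},
]


def documents_per_session(docs, hits_size=10):
    """Return {user: {session: [docs]}} for users with more than one session."""
    per_user = defaultdict(lambda: defaultdict(list))
    for doc in docs:
        # Collect the documents themselves instead of only counting them.
        per_user[doc["user-id"]][doc["session-id"]].append(
            {"session-id": doc["session-id"], "action": doc["action"]}
        )
    return {
        user: {sid: hits[:hits_size] for sid, hits in sessions.items()}
        for user, sessions in per_user.items()
        if len(sessions) > 1  # same condition as the selector script
    }


hits = documents_per_session(DOCS)
for session_id, documents in hits["abc"].items():
    print(session_id, documents)
```

With the sample data this yields one document for session "abc-1" and two for "abc-2", which matches the list of documents asked for in the previous post.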

#5

Looks promising, indeed! Thank you very much.

