Aggregate a search result?

Hello, I have documents with fields:

  • session_id
  • user_id

Every document represents a user request in a conversation. Example:
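(The original example is missing; this is an illustrative sketch, and the field values here are assumptions.)

{
  "session_id": "s-1001",
  "user_id": "u-42"
}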

When I execute a simple aggregation to show, for each user, how many dialogs they have:

{
  "size": 0,
  "aggs": {
    "2": {
      "terms": {
        "field": "user_id.keyword"
      },
      "aggs": {
        "1": {
          "cardinality": {
            "field": "session_id.keyword"
          }
        }
      }
    }
  }
}

which shows:
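(The original results screenshot is missing; this sketch with hypothetical values shows the shape of the response: each terms bucket is one user, and the nested "1" value is that user's unique session count.)

{
  "aggregations": {
    "2": {
      "buckets": [
        { "key": "u-42", "doc_count": 7, "1": { "value": 3 } },
        { "key": "u-17", "doc_count": 2, "1": { "value": 1 } }
      ]
    }
  }
}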


But my goal is to aggregate the dialog counts in that result (the values under each key in the result of the last aggregation) to obtain this:
50

If you have many unique users and you want to do behavioural analysis on them, I'd suggest looking at entity-centric indexes.

Thank you for your reply.
So there is no way to do this directly with an ES query?

You could probably achieve this with a scripted aggregation, but not at large scale with many users.

Thank you for your reply, sir.
Since I'm new to scripted aggregations, could you help me by providing a quick implementation for my problem, please?
Thank you,

The entity-centric index concept is very useful in my case. I need to make an entity per user showing the number of sessions for each user.
But as a quick fix, I tried the scripted aggregation, and I found that it loops over all the documents, which is not a good idea at a large document scale. Is there no way to run a scripted aggregation on the search result? I don't really want to aggregate on each bucket; it's a global aggregation, but based on the search result.
Thank you.

Aggregations only execute on docs that match the search query (unless you're explicitly using a global aggregation at the root).
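For illustration, a root-level global aggregation is the exception: it ignores the query and runs over every document in the index (a minimal sketch; the term filter and aggregation names here are hypothetical):

{
  "query": { "term": { "user_id.keyword": "u-42" } },
  "size": 0,
  "aggs": {
    "all_docs": {
      "global": {},
      "aggs": {
        "all_sessions": { "cardinality": { "field": "session_id.keyword" } }
      }
    }
  }
}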

Thank you for your reply.
I mean that in my case all the documents are in the search result; I don't filter anything. What I wanted is to group the values obtained in the result:

I have an occurrence of 3 => 1 and an occurrence of 1 => 2. So the goal is to group the aggregation result based on the values obtained.
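In other words, the desired output would be something like this (an illustrative sketch of that grouping, where keys are per-user dialog counts and values are how many users have that count):

{
  "3": 1,
  "1": 2
}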

How many unique users will you have in the production system?

It's a big number; I'm sure there are more than 1000 daily users, so it's not good to loop over all the documents ...

And the JSON results you see in your last example aren't the half of it, either.

Each cardinality count of the number of unique sessions is derived, behind the scenes, from either a Map of unique session IDs or a probabilistic data structure based on hashes. It won't be cheap in terms of RAM when you multiply that by the number of users.
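For what it's worth, the memory/accuracy trade-off of the cardinality aggregation can be tuned with its precision_threshold option (a small sketch; the value 100 here is just an example, and lower values use less RAM at the cost of accuracy):

"1": {
  "cardinality": {
    "field": "session_id.keyword",
    "precision_threshold": 100
  }
}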

The entity-centric index approach is probably the best way to go for any mass behavioural analysis.
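As a sketch of how such an index could be maintained: a pivot transform can keep one summary document per user (illustrative only; the index and transform names are assumptions, and the transform API may be newer than this thread):

PUT _transform/user_sessions
{
  "source": { "index": "test" },
  "dest": { "index": "user_entities" },
  "pivot": {
    "group_by": {
      "user_id": { "terms": { "field": "user_id.keyword" } }
    },
    "aggregations": {
      "session_count": { "cardinality": { "field": "session_id.keyword" } }
    }
  }
}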

@Mark_Harwood Thank you for your support. I also think the entity-centric index approach is the best way to do this, and it will help us obtain more graphs in the future.
But to make this topic useful, I will share the scripted metric aggregation request so it can help others.

POST test/_search
{
  "size": 0,
  "aggs": {
    "colorgroups": {
      "scripted_metric": {
        "init_script": "state.transactions = [:]",
        "map_script": "state.transactions[doc['session_id.keyword'].value] = doc['user_id.keyword'].value",
        "combine_script": "return state.transactions;",
        "reduce_script": """
          def result = [:]; def output = [:];
          // Merge the per-shard session -> user maps, counting unique sessions per user.
          for (a in states) {
            for (session in a.keySet()) {
              if (result[a[session]] == null) { result[a[session]] = 0 }
              result[a[session]]++; } }
          // Group users by their session count: count -> number of users with that count.
          for (a in result.values()) {
            if (output[a.toString()] == null) { output[a.toString()] = 0 }
            output[a.toString()]++; }
          return output;
        """
      }
    }
  }
}
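With the occurrence counts from earlier in the thread, the relevant part of the response would look roughly like this (illustrative):

{
  "aggregations": {
    "colorgroups": {
      "value": {
        "3": 1,
        "1": 2
      }
    }
  }
}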


Thank you.

Glad you got something working, but watch out for slow response times and/or "circuit breaker" exceptions for RAM use if you throw this at a lot of data.
