Aggregate a search result?

Hello, I have documents with fields:

  • session_id
  • user_id

Every document represents a user request in a conversation. Example:
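(The original example is missing; this is an illustrative sketch, and the field values here are assumptions.)

{
  "session_id": "s-1001",
  "user_id": "u-42"
}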

When I execute a simple aggregation to show, for each user, how many dialogs they have:

{
  "size": 0,
  "aggs": {
    "2": {
      "terms": {
        "field": "user_id.keyword"
      },
      "aggs": {
        "1": {
          "cardinality": {
            "field": "session_id.keyword"
          }
        }
      }
    }
  }
}

which shows:
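(The original results screenshot is missing; this sketch with hypothetical values shows the shape of the response: each terms bucket is one user, and the nested "1" value is that user's unique session count.)

{
  "aggregations": {
    "2": {
      "buckets": [
        { "key": "u-42", "doc_count": 7, "1": { "value": 3 } },
        { "key": "u-17", "doc_count": 2, "1": { "value": 1 } }
      ]
    }
  }
}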


But my goal is to aggregate the dialog counts in that result (the values under each key in the result of the last aggregation) to obtain this:
50

If you have many unique users and you want to do behavioural analysis on them, I'd suggest looking at entity-centric indexes.

Thank you for your reply.
So there is no way to do this directly with an ES query?

You could probably achieve this with a scripted aggregation, but not at large scale with many users.

Thank you for your reply, sir.
Since I'm new to scripted aggregations, could you help me by providing a quick implementation for my problem, please?
Thank you,

The entity-centric index concept is very useful in my case. I need to make an entity per user showing the number of sessions for each user.
But as a quick fix, I tried the scripted aggregation, and I found that it loops over all the documents, which is not a good idea at a large document scale. Is there no way to run a scripted aggregation on the search result? I don't really want to aggregate on each bucket; it's a global aggregation, but based on the search result.
Thank you.

Aggregations only execute on docs that match the search query (unless you're explicitly using a global aggregation at the root).
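For illustration, a root-level global aggregation is the exception: it ignores the query and runs over every document in the index (a minimal sketch; the term filter and aggregation names here are hypothetical):

{
  "query": { "term": { "user_id.keyword": "u-42" } },
  "size": 0,
  "aggs": {
    "all_docs": {
      "global": {},
      "aggs": {
        "all_sessions": { "cardinality": { "field": "session_id.keyword" } }
      }
    }
  }
}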

Thank you for your reply.
I mean that in my case all the documents are in the search result; I don't filter anything. What I wanted is to group the values obtained in the result:

I have an occurrence of 3 => 1 and an occurrence of 1 => 2. So the goal is to group the aggregation result based on the values obtained.
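In other words, the desired output would be something like this (an illustrative sketch of that grouping, where keys are per-user dialog counts and values are how many users have that count):

{
  "3": 1,
  "1": 2
}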

How many unique users will you have in the production system?

It's a big number; I'm sure there are more than 1000 daily users, so it's not good to loop over all the documents ...

And the JSON results you see in your last example aren't the half of it, either.

Each cardinality count of the number of unique sessions is derived, behind the scenes, from either a Map of unique session IDs or a probabilistic data structure based on hashes. It won't be cheap in terms of RAM when you multiply that by the number of users.
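For what it's worth, the memory/accuracy trade-off of the cardinality aggregation can be tuned with its precision_threshold option (a small sketch; the value 100 here is just an example, and lower values use less RAM at the cost of accuracy):

"1": {
  "cardinality": {
    "field": "session_id.keyword",
    "precision_threshold": 100
  }
}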

The entity-centric index approach is probably the best way to go for any mass behavioural analysis.
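As a sketch of how such an index could be maintained: a pivot transform can keep one summary document per user (illustrative only; the index and transform names are assumptions, and the transform API may be newer than this thread):

PUT _transform/user_sessions
{
  "source": { "index": "test" },
  "dest": { "index": "user_entities" },
  "pivot": {
    "group_by": {
      "user_id": { "terms": { "field": "user_id.keyword" } }
    },
    "aggregations": {
      "session_count": { "cardinality": { "field": "session_id.keyword" } }
    }
  }
}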

@Mark_Harwood Thank you for your support. I also think the entity-centric index approach is the best way to do this, and it will help us obtain more graphs in the future.
But to make this topic useful, I will share the scripted metric aggregation request so it can help others.

POST test/_search
{
  "size": 0,
  "aggs": {
    "colorgroups": {
      "scripted_metric": {
        "init_script": "state.transactions = [:]",
        "map_script": "state.transactions[doc['session_id.keyword'].value] = doc['user_id.keyword'].value",
        "combine_script": "return state.transactions;",
        "reduce_script": """
          def result = [:]; def output = [:];
          // Merge the per-shard session -> user maps, counting unique sessions per user.
          for (a in states) {
            for (session in a.keySet()) {
              if (result[a[session]] == null) { result[a[session]] = 0 }
              result[a[session]]++; } }
          // Group users by their session count: count -> number of users with that count.
          for (a in result.values()) {
            if (output[a.toString()] == null) { output[a.toString()] = 0 }
            output[a.toString()]++; }
          return output;
        """
      }
    }
  }
}
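With the occurrence counts from earlier in the thread, the relevant part of the response would look roughly like this (illustrative):

{
  "aggregations": {
    "colorgroups": {
      "value": {
        "3": 1,
        "1": 2
      }
    }
  }
}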


Thank you.

Glad you got something working, but watch out for slow response times and/or "circuit breaker" exceptions for RAM use if you throw this at a lot of data.
