Thank you for your reply, sir.
Since I'm new to scripted aggregations, could you help me by providing a quick implementation for my problem, please?
Thank you,
The entity-centric index concept is very useful in my case. I need one entity per user showing that user's number of sessions.
For a quick solution, I tried a scripted aggregation, but I found that it loops over all the documents, which is not a good idea at a large document scale. Is there no way to run a scripted aggregation on the search result only? I don't want to run the aggregation on each bucket; it's a global aggregation, but based on the search result.
Thank you.
Thank you for your reply.
I mean that in my case all the documents are in the search result; I don't filter anything. What I wanted to do is group the values obtained in the result:
What you see in the JSON results of your last example isn't the half of it, either.
Each cardinality count of unique sessions is derived behind the scenes from either a map of unique session IDs or a probabilistic, hash-based data structure. That won't be cheap in terms of RAM once you multiply it by the number of users.
The entity-centric index approach is probably the best way to go for any mass behavioural analysis.
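To make the entity-centric idea concrete, here is a minimal sketch in Python of what that index would contain: raw session events are collapsed into one document per user, each carrying a precomputed session count. The field names (`user_id`, `session_id`, `session_count`) are assumptions matching the fields used in this thread, and the input is a plain list of dicts standing in for documents scrolled from the raw index; this is not an official Elasticsearch recipe, just the shape of the transformation.

```python
from collections import defaultdict

def build_entity_docs(session_events):
    """Collapse raw session events into one entity document per user.

    session_events: iterable of dicts with 'user_id' and 'session_id'
    fields (hypothetical names, matching the thread's mapping).
    Returns {user_id: entity_doc}, where each entity_doc could then be
    indexed into the entity-centric index.
    """
    sessions_per_user = defaultdict(set)
    for event in session_events:
        # A set deduplicates repeated events for the same session.
        sessions_per_user[event["user_id"]].add(event["session_id"])
    return {
        user: {"user_id": user, "session_count": len(sessions)}
        for user, sessions in sessions_per_user.items()
    }
```

Once such documents exist, the per-user session count is a stored field, so later questions (histograms, trends over time) become cheap ordinary aggregations instead of scripted ones.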
@Mark_Harwood Thank you for your support. I also think the entity-centric index approach is the best way to do this, and it will help us obtain more graphs in the future.
But to make this topic useful, I will share the scripted metric aggregation request, so that it can help others.
POST test/_search
{
  "size": 0,
  "aggs": {
    "colorgroups": {
      "scripted_metric": {
        "init_script": "state.transactions = [:]",
        "map_script": "state.transactions[doc['session_id.keyword'].value] = doc['user_id.keyword'].value",
        "combine_script": "return state.transactions;",
        "reduce_script": "def result = [:]; def output = [:]; for (a in states) { for (session in a.keySet()) { if (result[a[session]] == null) { result[a[session]] = 0 } result[a[session]]++; } } for (a in result.values()) { if (output[a.toString()] == null) { output[a.toString()] = 0 } output[a.toString()]++; } return output;"
      }
    }
  }
}
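For readers trying to follow the one-line `reduce_script`, here is a sketch in Python of the same logic. It assumes, as the Painless scripts do, that each shard returns a `{session_id: user_id}` map and that session IDs are unique across shards; the function name `reduce_states` is just for illustration.

```python
def reduce_states(states):
    """Mirror of the aggregation's reduce_script.

    states: one {session_id: user_id} dict per shard, as produced by
    the map_script/combine_script pair.
    Returns {session_count_as_string: number_of_users_with_that_count}.
    """
    # First pass: count unique sessions per user.
    result = {}
    for shard_map in states:
        for session_id, user_id in shard_map.items():
            result[user_id] = result.get(user_id, 0) + 1
    # Second pass: histogram of those counts, keyed by string
    # (Painless keys the output map with a.toString()).
    output = {}
    for count in result.values():
        key = str(count)
        output[key] = output.get(key, 0) + 1
    return output
```

So the final response answers "how many users had exactly N sessions?", e.g. two shards returning `{"s1": "u1", "s2": "u1"}` and `{"s3": "u2"}` reduce to `{"2": 1, "1": 1}`.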
Glad you got something working but watch out for slow response times and/or "circuit breaker" exceptions for RAM use if you throw this at a lot of data.