Hi Team, I have the following scenario: I am receiving logs from at most 50k users every 30 seconds for 60 minutes. I am using Elasticsearch to store the data in the following format.
[
  {
    "viewlogId": "9abb5a3a-3678-4459-a425-ccb6f957e317",
    "creationTime": 1575187230000,
    "userId": "USERID_0",
    "viewingSessionId": "2991fa12_viewingSessionId_0_1"
  },
  {
    "viewlogId": "9abb5a3a-3678-4459-a425-ccb6f957e318",
    "creationTime": 1575187230000,
    "userId": "USERID_0",
    "viewingSessionId": "2991fa12_viewingSessionId_0_1"
  },
  {
    "viewlogId": "9abb5a3a-3678-4459-a425-ccb6f957e319",
    "creationTime": 1575187230000,
    "userId": "USERID_0",
    "viewingSessionId": "2991fa12_viewingSessionId_0_1"
  },
  {
    "viewlogId": "9abb5a3a-3678-4459-a425-ccb6f957e320",
    "creationTime": 1575187290000,
    "userId": "USERID_0",
    "viewingSessionId": "2991fa12_viewingSessionId_0_1"
  },
  {
    "viewlogId": "9abb5a3a-3678-4459-a425-ccb6f957e321",
    "creationTime": 1575187290000,
    "userId": "USERID_0",
    "viewingSessionId": "2991fa12_viewingSessionId_0_1"
  }
]
This sample has data for a single user and one session, with viewingSessionId 2991fa12_viewingSessionId_0_1. The viewingSessionId is going to be unique for every user.
Now I am interested in showing a per-minute histogram of unique viewingSessionId values. For that I am using the following query.
GET <<index_name>>/_search
{
  "size": 0,
  "query": {
    "bool": {
      "adjust_pure_negative": true,
      "boost": 1
    }
  },
  "aggregations": {
    "total_views": {
      "cardinality": {
        "field": "viewingSessionId"
      }
    },
    "date_histogram_1": {
      "date_histogram": {
        "field": "creationTime",
        "fixed_interval": "1m"
      },
      "aggregations": {
        "user_counts": {
          "cardinality": {
            "field": "viewingSessionId"
          }
        }
      }
    }
  }
}
However, according to the Elastic docs here, and as I have also observed during testing, cardinality counts are approximate, with a maximum precision threshold of 40k. Since I have 50k users and 1-2 viewlogs per minute, a single bucket will hold at most 150k records, so the counts will be approximate.
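To make the expected result concrete, here is a minimal Python sketch of the exact per-minute count I would like the aggregation to match. The function name unique_sessions_per_minute and the in-memory sample_docs list are only for illustration and are not part of my Elasticsearch setup; the sketch also assumes buckets start on whole UTC minutes.

```python
from collections import defaultdict

MINUTE_MS = 60_000  # width of one histogram bucket in milliseconds


def unique_sessions_per_minute(docs):
    """Return {bucket_start_ms: exact number of distinct viewingSessionId values}."""
    sessions_by_bucket = defaultdict(set)
    for doc in docs:
        # Floor the epoch-millisecond timestamp to the start of its minute.
        bucket_start = doc["creationTime"] - doc["creationTime"] % MINUTE_MS
        sessions_by_bucket[bucket_start].add(doc["viewingSessionId"])
    return {start: len(ids) for start, ids in sorted(sessions_by_bucket.items())}


# The five sample documents from above, reduced to the two fields used here.
SESSION = "2991fa12_viewingSessionId_0_1"
sample_docs = [
    {"creationTime": 1575187230000, "viewingSessionId": SESSION},
    {"creationTime": 1575187230000, "viewingSessionId": SESSION},
    {"creationTime": 1575187230000, "viewingSessionId": SESSION},
    {"creationTime": 1575187290000, "viewingSessionId": SESSION},
    {"creationTime": 1575187290000, "viewingSessionId": SESSION},
]

print(unique_sessions_per_minute(sample_docs))
# {1575187200000: 1, 1575187260000: 1}
```

For the sample this gives one unique session in each of the two minute buckets, even though the buckets contain three and two records respectively.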
Are there any other approaches to solve this problem, either by changing the index structure or by querying differently?
Thanks
Elasticsearch version: 7.4.1