I have a transform that calculates hourly stats of users on each floor of each building:
{
"id": "building-floor",
"source": {
"index": [
"anindex"
],
"query": {
"match_all": {}
}
},
"dest": {
"index": "building-floor"
},
"sync": {
"time": {
"field": "@timestamp",
"delay": "60s"
}
},
"pivot": {
"group_by": {
"@timestamp": {
"date_histogram": {
"field": "@timestamp",
"calendar_interval": "1h"
}
},
"building.keyword": {
"terms": {
"field": "building.keyword"
}
},
"floor.keyword": {
"terms": {
"field": "floor.keyword"
}
}
},
"aggregations": {
"user_name.keyword.cardinality": {
"cardinality": {
"field": "user_name.keyword"
}
}
}
},
"description": "Stats per building per floor",
"settings": {},
"version": "7.8.0",
"create_time": 1595890114249
}
Now I have a list of building occupancy from a CSV file that has: building,floor,max_capacity
I loaded this file into the index but of course there is no timestamp for these entries:
{
"_index" : "anindex",
"_type" : "_doc",
"_id" : "aoREk3MBaS8HmMr3F3v_",
"_score" : 1.0,
"_source" : {
"name" : "Main building",
"floor" : "1",
"max_capacity" : 20,
"building" : "MAIN"
}
}
What I am trying to do is compute the occupancy percentage (basically max_capacity/user_name.keyword.cardinality
) for each floor of each building. Ideally I'd like to do that in the transform but even if there is a simple to do that on the fly in a visualizer, I'd be good with it.
I can't seem to be able to achieve what would be a very simple join in SQL because I don't have a timestamp field in my building max occupancy data. For pretty much everything else I was able to denormalize the data but here I wouldn't even know how to denormalize it if I wanted to add max_capacity to every entry in the index based on building/floor.
Any guidance on what options I should look into to solve this problem is greatly appreciated.