Computing percentages in transforms

cecchet · July 28, 2020, 1:54pm

I have a transform that calculates hourly stats of users on each floor of each building:

{
  "id": "building-floor",
  "source": {
    "index": [
      "anindex"
    ],
    "query": {
      "match_all": {}
    }
  },
  "dest": {
    "index": "building-floor"
  },
  "sync": {
    "time": {
      "field": "@timestamp",
      "delay": "60s"
    }
  },
  "pivot": {
    "group_by": {
      "@timestamp": {
        "date_histogram": {
          "field": "@timestamp",
          "calendar_interval": "1h"
        }
      },
      "building.keyword": {
        "terms": {
          "field": "building.keyword"
        }
      },
      "floor.keyword": {
        "terms": {
          "field": "floor.keyword"
        }
      }
    },
    "aggregations": {
      "user_name.keyword.cardinality": {
        "cardinality": {
          "field": "user_name.keyword"
        }
      }
    }
  },
  "description": "Stats per building per floor",
  "settings": {},
  "version": "7.8.0",
  "create_time": 1595890114249
}

Now I have a list of building occupancy from a CSV file that has: building,floor,max_capacity
I loaded this file into the index but of course there is no timestamp for these entries:

{
  "_index" : "anindex",
  "_type" : "_doc",
  "_id" : "aoREk3MBaS8HmMr3F3v_",
  "_score" : 1.0,
  "_source" : {
    "name" : "Main building",
    "floor" : "1",
    "max_capacity" : 20,
    "building" : "MAIN"
  }
}

What I am trying to do is compute the occupancy percentage (basically max_capacity/user_name.keyword.cardinality) for each floor of each building. Ideally I'd like to do that in the transform but even if there is a simple to do that on the fly in a visualizer, I'd be good with it.

I can't seem to be able to achieve what would be a very simple join in SQL because I don't have a timestamp field in my building max occupancy data. For pretty much everything else I was able to denormalize the data but here I wouldn't even know how to denormalize it if I wanted to add max_capacity to every entry in the index based on building/floor.

Any guidance on what options I should look into to solve this problem is greatly appreciated.

Hendrik_Muhs · July 28, 2020, 3:39pm

Hi,

if you want hourly statistics, you should have a timestamp field in your data, don't you? But maybe I do not understand it. Does all of your data miss the timestamp? The source you posted does not contain user_name either.

I can only guess that your real data has a timestamp and the user_name and with the CSV data you try to enrich the results?

If so, enrich is what you look for. You can feed the output of the transform into an ingest pipeline. The pipeline should have an enrich processor to match on building and floor and add the max_capacity field. Next I would use a script processor to calculate max_capacity/user_name.keyword.cardinality and write it into a new field.

cecchet · July 28, 2020, 3:54pm

You are correct, I have a timestamp field in my data, it's just the CSV with the max building occupancy info that doesn't have any timestamp (it's not temporal info).

I found the ingest/enrich example at https://www.elastic.co/blog/introducing-the-enrich-processor-for-elasticsearch-ingest-nodes, I think I can make it work. I actually had loaded the CSV using the file data visualizer and copied the data over to the main index, it'll be much cleaner if I can keep the CSV data in a separate index and just enrich the transform in my original index.

Thanks for the help, I'll report back when I get it working!

cecchet · July 28, 2020, 10:13pm

I don't seem to be able to match on multiple fields with enrich, it looks like "match_field" can only be a single field. Do I have to somehow concatenate my building and floor into a single field to be able to match?

Hendrik_Muhs · July 29, 2020, 7:24am

You could do the concatenation in the ingest pipeline by creating a temporary lookup field.

After enrich you can remove it again. That's the easiest way I can think of. The ingest pipeline would therefore be something like:

set lookup field
enrich to add max_capacity
script calculate occupancy
remove lookup field (and maybe other fields you do not want to keep)

cecchet · July 31, 2020, 9:43pm

So I was able to use enrich however it can only match one field so the set lookup on the transform fields didn't help since I didn't have a single matching field on the index used for enrich. I used another unique field to join both and make it work.
I am still fighting with script to make sure it works in all conditions even if some fields are null but I should get there!
Thanks again for the help

system · August 28, 2020, 9:43pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Calculate P95 with Transforms Elasticsearch transforms	2	753	October 16, 2020
How to aggregate data on elastic sent by logstash Elasticsearch	4	1893	December 4, 2019
Calculations between 2 Kibana hits on the @timestamp parameter Kibana transforms	12	1422	December 29, 2020
Watcher aggregation transform painless question Elasticsearch painless	1	282	November 25, 2022
Data not getting reflected in transform index Elasticsearch transforms	1	296	May 19, 2023

Computing percentages in transforms

Related topics