Computing percentages in transforms

I have a transform that calculates hourly stats of users on each floor of each building:

{
  "id": "building-floor",
  "source": {
    "index": [
      "anindex"
    ],
    "query": {
      "match_all": {}
    }
  },
  "dest": {
    "index": "building-floor"
  },
  "sync": {
    "time": {
      "field": "@timestamp",
      "delay": "60s"
    }
  },
  "pivot": {
    "group_by": {
      "@timestamp": {
        "date_histogram": {
          "field": "@timestamp",
          "calendar_interval": "1h"
        }
      },
      "building.keyword": {
        "terms": {
          "field": "building.keyword"
        }
      },
      "floor.keyword": {
        "terms": {
          "field": "floor.keyword"
        }
      }
    },
    "aggregations": {
      "user_name.keyword.cardinality": {
        "cardinality": {
          "field": "user_name.keyword"
        }
      }
    }
  },
  "description": "Stats per building per floor",
  "settings": {},
  "version": "7.8.0",
  "create_time": 1595890114249
}

Now I have a list of building occupancy from a CSV file that has: building,floor,max_capacity
I loaded this file into the index but of course there is no timestamp for these entries:

{
  "_index" : "anindex",
  "_type" : "_doc",
  "_id" : "aoREk3MBaS8HmMr3F3v_",
  "_score" : 1.0,
  "_source" : {
    "name" : "Main building",
    "floor" : "1",
    "max_capacity" : 20,
    "building" : "MAIN"
  }
}

What I am trying to do is compute the occupancy percentage (basically max_capacity/user_name.keyword.cardinality) for each floor of each building. Ideally I'd like to do that in the transform but even if there is a simple to do that on the fly in a visualizer, I'd be good with it.

I can't seem to be able to achieve what would be a very simple join in SQL because I don't have a timestamp field in my building max occupancy data. For pretty much everything else I was able to denormalize the data but here I wouldn't even know how to denormalize it if I wanted to add max_capacity to every entry in the index based on building/floor.

Any guidance on what options I should look into to solve this problem is greatly appreciated.

Hi,

if you want hourly statistics, you should have a timestamp field in your data, don't you? But maybe I do not understand it. Does all of your data miss the timestamp? The source you posted does not contain user_name either.

I can only guess that your real data has a timestamp and the user_name and with the CSV data you try to enrich the results?

If so, enrich is what you look for. You can feed the output of the transform into an ingest pipeline. The pipeline should have an enrich processor to match on building and floor and add the max_capacity field. Next I would use a script processor to calculate max_capacity/user_name.keyword.cardinality and write it into a new field.

You are correct, I have a timestamp field in my data, it's just the CSV with the max building occupancy info that doesn't have any timestamp (it's not temporal info).

I found the ingest/enrich example at https://www.elastic.co/blog/introducing-the-enrich-processor-for-elasticsearch-ingest-nodes, I think I can make it work. I actually had loaded the CSV using the file data visualizer and copied the data over to the main index, it'll be much cleaner if I can keep the CSV data in a separate index and just enrich the transform in my original index.

Thanks for the help, I'll report back when I get it working!

I don't seem to be able to match on multiple fields with enrich, it looks like "match_field" can only be a single field. Do I have to somehow concatenate my building and floor into a single field to be able to match?

You could do the concatenation in the ingest pipeline by creating a temporary lookup field.

After enrich you can remove it again. That's the easiest way I can think of. The ingest pipeline would therefore be something like:

  • set lookup field
  • enrich to add max_capacity
  • script calculate occupancy
  • remove lookup field (and maybe other fields you do not want to keep)

So I was able to use enrich however it can only match one field so the set lookup on the transform fields didn't help since I didn't have a single matching field on the index used for enrich. I used another unique field to join both and make it work.
I am still fighting with script to make sure it works in all conditions even if some fields are null but I should get there!
Thanks again for the help

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.