Using rank and partition in vega-lite

Hi all,

Can we do something like this in vega-lite or any other way in Kibana to display the count and state of the container?

Select count(*), state
from 
( 
   select container_id, state, rank() over (partition by container_id, sort by actionTime desc) as rn
where rn = 1
)
group by state

example:

container_id, container_state, timestamp
1, loaded, 2pm
1, received, 4pm

here the count of container_id 1 should be only for received state because container is no longer in loaded state

2, received, 5pm
3, loaded , 2pm
4, received, 2pm
4, loaded , 5pm
5, loaded, 3pm

container_state count
Loaded 3
Received 2

so total there are 5 containers out of which 3 are currently in loaded state and 2 are in received state.

You will probably need to express this using an Elasticsearch aggregation query, but I can't help you construct the query based on the information you have given here. Have you read the guide to Vega in Kibana?

The most important thing needed to answer your question is:

  1. Sample documents that you have stored in Elasticsearch
  2. A better description of how you want to group documents

Hi @wylie ,

This is my sample document

{
  "_index": "my_index_2020-11-08",
  "_type": "1",
  "_id": "FgU_qtrotoIo1e_dCEw6m",
  "_version": 1,
  "_score": null,
  "_source": {
    "actionTime": 1604878247872,
    "updateTime": 1604878248650,
    "currDetails_stackingFilter": "texas-N2",
    "currDetails_trailer": null,
    "currDetails_facilityId": "texas",
    "currDetails_location_id": "e2b71f1c-ca05--2af47b3ba2b8",
    "currDetails_location_label": "AB-794",
    "currDetails_location_type": "l",
    "currDetails_location_scannableIds_0": "e2b71f1c-ca05--2af47b3ba2b8",
    "currDetails_state": "Stacked",
    "currDetails_enclosingParent": null,
    "currDetails_container_scannableIds_0": "BAG_SaAPZp_Z",
    "currDetails_container_id": "b6efcef4-065a-ba1d-7c0ff28f13a1",
    "currDetails_container_label": "BAG_SaAPZp_Z",
    "currDetails_container_clientContainerId": "BAG_SaAPZp_Z",
    "currDetails_container_type": "BAG",
    "userLoginId": "hgsgho",
    "ing_timestamp": "2020-11-08T23:44:21.794435",
    "localtime": "2020-11-08T18:30:47.872000"
  },
  "fields": {
    "localtime": [
      "2020-11-08T18:30:47.872Z"
    ],
    "ing_timestamp": [
      "2020-11-08T23:44:21.794Z"
    ]
  },
  "sort": [
    1604879061794
  ]
}

so its a real time data stream so as the container state changes, we get a new document with same format, just with new container_state and new actionTime. So I want to calculate the number of containers per state , but I want to eliminate double counting a container in 2 states, so I just need the most recent state of the container.

Okay so here are your options that I would consider using:

  1. The easiest option is to change the document structure so that you can query it better. Elasticsearch has a continuous transforms feature which is often used for this type of transformation.

  2. Without changing the document structure, you will want to combine several aggregations. There are some limits on the maximum size of the response here, but this approach will work for a lot.

Specifically, here are the aggregations that I would use:

  1. Terms agg on the states
  2. Within each state, Terms agg on the container ID
  3. For the metric you can fetch only the most recent document using the top hits metric. This can get you a single document per container + state

Once you have data in this format you can combine this using Vega into a flattened structure.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.