Filter the Nth serial numbers with highest count

Hi all!

I am trying to do a scatter plot with Vega-Lite that shows the evolution of capacity of some batteries with time. The colors in the plot should correspond to the different batteries (serial number). Find below a simplified version of the code.

Since we have a large number of batteries, I would like to show only the 20 with most charges (times that they have been plugged into the charger). I only managed to filter those with less than 4 charges, but that means that some serial numbers are removed 'forever' and cannot be found even filtering in the upper bar context filter.

What transform operation could I do to count the number of charges of each battery, sort them, and keep the 20 highest?

Thanks a lot
Rober

{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {
    "url": {
      %context%: true
      "index": "statistics-battery",
      "body": {
        "size": 10000,
        "_source": ["boot_utc_date", "full_charge_capacity_mAh", "battery_serial_number", "cycle_count"]
      },
      "format": {"property": "hits.hits"}
    },
    "format": {"property": "hits.hits"}
  },
  "transform": [
    // Filter to include only batteries with more than 3 logs
    {
      "window": [{"op": "count", "as": "count"}],
      "groupby": ["_source.battery_serial_number"]
    },
    {"filter": {"field": "count", "gte": 4}},
  ],
  "layer": [
    {
    "mark": {"type": "point", "filled": true, size: 60},
    "title": "Full Charge Capacity (mAh) for each battery during time",
    "encoding": {
      "x": {
        "field": "_source.boot_utc_date",
        "type": "temporal",
        "axis": {"title": "Date"}
      },
      "y": {
        "field": "_source.full_charge_capacity_mAh",
        "type": "quantitative",
        "scale": {"domain": [2500, 4800]},
        "axis": {"title": "Full Charge Capacity (mAh)"},
  
      },
      "color": {
        "field": "_source.battery_serial_number",
        "type": "nominal",
        "scale": {"scheme": "category20"},
        "legend": {"title": "Battery Serial Number"}
      },
    }
  }
  ],
}

I think this can be also done in Lens, but if you want to continue with Vega, the sorting should be done in the query for Elasticsearch similar to this example: Vega | Kibana Guide [8.12] | Elastic
So I would not filter, i would ask for the top hits, but add pagination to the query result and you can only use the first page.

Thanks for your reply. I tried including it with the query{} keyword but since I have included %context%: true, Vega-Lite complains.

In the transform I think I am not very far from the desired behaviour with the code below. However, if I set a filter on the context bar, the transform is not recalculated it seems. In other words, it looks as if the transform (and therefore the filter for the N first serial numbers) is first applied, and then the context filter is applied, so there are for example some serial numbers that I cannot retrieve. Is this the normal behaviour?

  "transform": [
   {
      "joinaggregate": [{"op": "count", "as": "count"}],
      "groupby": ["_source.battery_serial_number"]
    },
   {
      "window": [{"op": "dense_rank", "as": "rank"}],
      "frame": [null, null],
      "sort": [{ "field": "count", "order": "descending" }]
    },
    {"filter": {"field": "rank", "lte": 20}}
  ],

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.