Vega Visualization query

I may not be using the plugin in the intended way. Essentially, I am trying to plot the data returned by an Elasticsearch query. My pet peeve has been my inability (or ignorance!!) to plot data straight from the results of an Elasticsearch query. I thought Vega could help, since I can sneak in my ES query and then plot or run stats on the results. As you can see there are no aggs involved in the ES query. Timepicker and filter context will not work as per my understanding.

Here is my Vega-Lite spec:

{
  "$schema": "https://vega.github.io/schema/vega-lite/v2.json",
  "title": "Test",
  "data": {
    "url": {
      "index": "log-demo-2019.11",
      "body": {
        "size": 1000,
        "_source": ["@timestamp", "_score", "fields.StepNumber", "fields.Reading.Value"],
        "query": {
          "bool": {
            "should": [
              {"match": {"fields.StepNumber": "PX_Run_1"}},
              {"match": {"fields.StepNumber": "PX_Run_2"}},
              {"match": {"fields.StepNumber": "PX_Run_3"}},
              {"match": {"fields.StepNumber": "PX_Run_4"}},
              {"match": {"fields.StepNumber": "PX_Run_5"}},
              {"match": {"fields.StepNumber": "PX_Run_6"}}
            ],
            "minimum_should_match": 1
          }
        }
      }
    },
    "format": {"property": "hits.hits"}
  },
  "mark": "point",
  
  "encoding": 
  {
    "tooltip": {"field": "_source.fields.Reading.Value", "type": "quantitative"},
    "x": 
      {
      "field": "_source.fields.StepNumber", 
      "type": "nominal",
      "axis": {"title": "Dev Runs"},
      }
    "y": 
      {
      aggregate":"average",
      "field": "_source.fields.Reading.Value",
      "type": "quantitative",
      "axis": {"title": "Perf"}
      }
      
  }
}

I think Vega is seeing and plotting each result individually, i.e. all 180 hits I got for this query.

I was expecting just 6 points, each denoting the average for each of the runs. But the experiment failed.
Is there any way I can make Vega see the result as a whole and plot it right?

Hi @pk.241011,

"As you can see there are no aggs involved in the ES query."

Is there a specific reason you are doing this? By using a query with a terms aggregation and a nested average aggregation, you can make Elasticsearch do the averaging. This is especially helpful if you have lots of data (billions of records). If you do the averaging in Vega, all of the raw data has to be streamed to your browser (which could result in gigabytes of data downloaded each time you view the visualization). It's always recommended to do as much of the processing as possible in Elasticsearch.

In your case it could look like this:

"aggs": {
  "step": {
    "terms": { "field": "fields.StepNumber" },
    "aggs": {
      "perf": { "avg": { "field": "fields.Reading.Value" } }
    }
  }
}
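To show how that aggregation could plug into the original spec, here is a minimal sketch. It assumes `fields.StepNumber` is aggregatable (if it is mapped as `text`, a keyword sub-field such as `fields.StepNumber.keyword` may be needed, which is an assumption about the mapping). The `step` and `perf` names come from the snippet above; the original `bool` query can be kept under `"query"` in the body to restrict the runs.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v2.json",
  "title": "Test",
  "data": {
    "url": {
      "index": "log-demo-2019.11",
      "body": {
        "size": 0,
        "aggs": {
          "step": {
            "terms": {"field": "fields.StepNumber"},
            "aggs": {
              "perf": {"avg": {"field": "fields.Reading.Value"}}
            }
          }
        }
      }
    },
    "format": {"property": "aggregations.step.buckets"}
  },
  "mark": "point",
  "encoding": {
    "x": {"field": "key", "type": "nominal", "axis": {"title": "Dev Runs"}},
    "y": {"field": "perf.value", "type": "quantitative", "axis": {"title": "Perf"}},
    "tooltip": {"field": "perf.value", "type": "quantitative"}
  }
}
```

With `"size": 0` no raw hits are returned, and Vega reads one row per bucket from `aggregations.step.buckets`, so each run becomes a single point. Note that a terms aggregation returns only the top 10 buckets unless its own `size` is raised.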

"Timepicker and filter context will not work as per my understanding."

You can make them work by including these "magic" properties in the data url:

// Apply dashboard context filters when set
"%context%": true,
// Filter the time picker (upper right corner) with this field
"%timefield%": "@timestamp"
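One caveat, as far as I know: Kibana does not accept `%context%` / `%timefield%` together with a custom `query` in the body. For a spec that keeps its own query, the Kibana Vega documentation describes placeholder clauses that can be placed inside the query instead. A sketch of the `data` property under that assumption (the list of runs is shortened here, and the available placeholders may differ between Kibana versions):

```json
{
  "url": {
    "index": "log-demo-2019.11",
    "body": {
      "size": 1000,
      "query": {
        "bool": {
          "must": [
            "%dashboard_context-must_clause%",
            {"range": {"@timestamp": {"%timefilter%": true}}}
          ],
          "must_not": ["%dashboard_context-must_not_clause%"],
          "should": [
            {"match": {"fields.StepNumber": "PX_Run_1"}},
            {"match": {"fields.StepNumber": "PX_Run_2"}}
          ],
          "minimum_should_match": 1
        }
      }
    }
  },
  "format": {"property": "hits.hits"}
}
```

Kibana substitutes the quoted placeholder strings with the dashboard's filters and the time picker range before sending the request.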

All of that being said, if you still want to do the averaging in Vega, it looks like you are just missing a quote in front of the aggregate key:

"y": 
      {
      "aggregate":"average",
      "field": "_source.fields.Reading.Value",
      "type": "quantitative",
      "axis": {"title": "Perf"}
      }

should work as expected.
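For reference, the complete `encoding` block with the quote restored might look like the sketch below. It is written as strict JSON, so it also adds the comma between `x` and `y` and drops the trailing comma after the x axis; nothing else is changed from the original spec.

```json
{
  "encoding": {
    "tooltip": {"field": "_source.fields.Reading.Value", "type": "quantitative"},
    "x": {
      "field": "_source.fields.StepNumber",
      "type": "nominal",
      "axis": {"title": "Dev Runs"}
    },
    "y": {
      "aggregate": "average",
      "field": "_source.fields.Reading.Value",
      "type": "quantitative",
      "axis": {"title": "Perf"}
    }
  }
}
```

Vega-Lite groups by the non-aggregated fields in the encoding. Here the tooltip also encodes the raw reading, so for exactly one point per run it may be necessary to give the tooltip the same `"aggregate"` (I have not verified this against your data). Other Vega-Lite aggregate operations, such as `"median"`, can be swapped in for `"average"` in the same way.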

Thanks for the reply! The original reason was to get the median, which I thought ES did not support natively. But then I found out that ES does provide the percentiles aggregation, and the 50th percentile is (almost) the same as the median. That solved my case. I will check the solution you provided to see if it works in Vega. It will be an interesting thing to know and keep at hand in case some unusual stats are requested that ES does not support yet.
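For anyone following along, the percentiles approach could be sketched like this, nested under the same per-run terms aggregation. The aggregation name `median_perf` is made up for illustration.

```json
{
  "aggs": {
    "step": {
      "terms": {"field": "fields.StepNumber"},
      "aggs": {
        "median_perf": {
          "percentiles": {"field": "fields.Reading.Value", "percents": [50]}
        }
      }
    }
  }
}
```

The percentiles aggregation is approximate by design, which is why the result is only "almost" the median on larger data sets.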

You can also look into the scripted metric aggregation. It allows you to specify map/reduce scripts in Painless, which run in parallel across your cluster.
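As a rough, untested sketch of what that could look like, the aggregation below computes a root mean square of the readings per run. The statistic and the name `perf_rms` are purely illustrative, and the `state` / `states` variables assume the Elasticsearch 7.x script syntax.

```json
{
  "aggs": {
    "step": {
      "terms": {"field": "fields.StepNumber"},
      "aggs": {
        "perf_rms": {
          "scripted_metric": {
            "init_script": "state.sumSq = 0.0; state.n = 0L",
            "map_script": "if (doc['fields.Reading.Value'].size() != 0) { double v = doc['fields.Reading.Value'].value; state.sumSq += v * v; state.n += 1 }",
            "combine_script": "return state",
            "reduce_script": "double s = 0; long n = 0; for (st in states) { s += st.sumSq; n += st.n } return n == 0 ? 0.0 : Math.sqrt(s / n)"
          }
        }
      }
    }
  }
}
```

The map script runs once per matching document on each shard, the combine script returns each shard's partial state, and the reduce script merges those partial states on the coordinating node.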

Doing the calculation on the client doesn't scale, so it will only work for small data sets; that's important to keep in mind. There are limits on how many documents Vega can fetch at once (I think it's 10,000 documents with the default settings, which matches the default of Elasticsearch's index.max_result_window).