Bar length in horizontal bar graph differs based on color condition

This is the code and the result without color conditioning:

{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
  "width": 806
  data: {
    url: {
      %context%: true
      index: index_ssd_2
      body: {
        size: 10000
        _source: ["mediatype", "pred_comp_sem"]
      }
    }
    format: {property: "hits.hits"}
  },
  "transform": [
    {"filter": "datum._source.mediatype == 'Valid Text'"},
    {"filter": "datum._source.pred_comp_sem != 'Subject Line'"},
    {
      "joinaggregate": [{
        "op": "count",
        "field": "_source.pred_comp_sem",
        "as": "a_count"
      }],
      "groupby": ["_source.pred_comp_sem"]
    }
  ]
  "encoding": {
    "y": {"field": "_source.pred_comp_sem", "type": "nominal", "sort": "-x", "title": null},
    "x": {"field": "a_count", "type": "quantitative", "title": null},
  }
  "layer": [
    {
      "mark": "bar",
    },
    {
      "mark": {
        "type": "text",
        "fontsize": 40,
        "align": "left",
        "dx": 5,
        "aria": false
      },
      "encoding": {
        "text": {"field": "a_count", "type": "quantitative"},
        "color": {
          "condition": {
            "test": {"field": "a_count", "gt": 20},
            "value": "white"
          },
          "value": "black"
        }
      }
    }
  ]
}

This is the output:

Below is the code with color conditioning on the bars:

{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
  "width": 806
  data: {
    url: {
      %context%: true
      index: index_ssd_2
      body: {
        size: 10000
        _source: ["mediatype", "pred_comp_sem"]
      }
    }
    format: {property: "hits.hits"}
  },
  "transform": [
    {"filter": "datum._source.mediatype == 'Valid Text'"},
    {"filter": "datum._source.pred_comp_sem != 'Subject Line'"},
    {
      "joinaggregate": [{
        "op": "count",
        "field": "_source.pred_comp_sem",
        "as": "a_count"
      }],
      "groupby": ["_source.pred_comp_sem"]
    }
  ]
  "encoding": {
    "y": {"field": "_source.pred_comp_sem", "type": "nominal", "sort": "-x", "title": null},
    "x": {"field": "a_count", "type": "quantitative", "title": null},
  }
  "layer": [
    {
      "mark": "bar",
      "encoding": {
        "color": {
          "field": "a_count"
          "type": "quantitative",
          "legend": null,
        }
      }
    },
    {
      "mark": {
        "type": "text",
        "fontsize": 40,
        "align": "left",
        "dx": 5,
        "aria": false
      },
      "encoding": {
        "text": {"field": "a_count", "type": "quantitative"},
        "color": {
          "condition": {
            "test": {"field": "a_count", "gt": 20},
            "value": "white"
          },
          "value": "black"
        }
      }
    }
  ]
}


This is the output:

I don't understand why the bar lengths differ in this case; the only thing I added is a color encoding for the bars.

I'm not an expert in Vega, but from what I can see the second graph is not a simple bar chart but a stacked one, where each bar is stacked on top of itself a_count times.

I've reproduced the same problem in plain Vega-Lite, outside Kibana, here: https://vega.github.io/editor/#/url/vega-lite/N4IgJAzgxgFgpgWwIYgFwhgF0wBwqgegIDc4BzJAOjIEtMYBXAI0poHsDp5kTykBaADZ04JACyUAVhDYA7EABoQAEySYUqUMSSCGcCGgDaoWUgRw0IFEqhsGszGgBMARgC+Ck2YvomikLb2jqguAMweXuaWUP6BDmgAHBEgplHoyrF28agArMmpPiAWNlnB4Z4p3pYAZplBzvlV6GR12e4VBZYwrcHuALoeRbK2yjSyLZogAJ5ooNU0cIIZ6J1KmFM4hbJsCGM6-jIATsEg-AAe-ph0gj6yDIKCgxeT84vLAaWXG4UAjgxIDjoahopEu11u90eg0ESCmcEORlAyEOAGtLEwkAilHBhmxRuNZh9BGwES8Fktop81t9LH8AVd1FdQUobmQccs7g83NyKsi0ZN1ptLJg4GdHEpqnJMABlGgALx8LgADEodDQyPJ0DdquKVM8cqrDjQNNUdBA4IMcSMxhNQCKxYTXhT0HFdYLfv9AYyQRZBrZiaTQLZZKMrnJCSKIME5uT3q7-GRgk4lYNtLpCgB3GAiECpnR6dEwqBo7luAZAA

This example works fine with the data table as provided, but as soon as you add multiple documents with the same name, like:

 "data": {
    "values": [
      {"name": "a", "count": 21},
      {"name": "b", "count": 13},
      {"name": "c", "count": 8},
      {"name": "d", "count": 5},
      {"name": "e", "count": 3},
      {"name": "f", "count": 2},
      {"name": "g", "count": 1},
      {"name": "h", "count": 1},
      {"name": "a", "count": 21},
      {"name": "b", "count": 13},
      {"name": "c", "count": 8},
      {"name": "d", "count": 5},
      {"name": "e", "count": 3},
      {"name": "f", "count": 2},
      {"name": "g", "count": 1},
      {"name": "h", "count": 1}
    ]
  },

you will start noticing the same "stacking" behaviour. The best way to debug this issue is to first inspect the transformed table that you compute inside Vega and check that it contains what you expect. I suspect it doesn't.

I think I've found the problem: you are using the joinaggregate count operation. This operation correctly aggregates your data by _source.pred_comp_sem, but it also joins the aggregated value back onto the original dataset. The resulting table therefore has the same rows as the ES query response, with an extra a_count field on every document. The result is basically something like:
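One way to inspect that table is a minimal self-contained spec pasted into the online Vega editor, whose Data Viewer tab lists each dataset after the transforms have run. The sketch below uses a shortened version of the duplicated values above and mirrors the original `joinaggregate`; the output field name `n` is a hypothetical choice for illustration, and the `description` property records that.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
  "description": "Minimal repro with shortened, duplicated sample rows; 'n' is a hypothetical field name. Open the Data Viewer tab to see the transformed table.",
  "data": {
    "values": [
      {"name": "a", "count": 21},
      {"name": "b", "count": 13},
      {"name": "a", "count": 21},
      {"name": "b", "count": 13}
    ]
  },
  "transform": [
    {
      "joinaggregate": [{"op": "count", "field": "name", "as": "n"}],
      "groupby": ["name"]
    }
  ],
  "mark": "bar",
  "encoding": {
    "y": {"field": "name", "type": "nominal"},
    "x": {"field": "n", "type": "quantitative"}
  }
}
```

In the transformed dataset you should see all four input rows still present, each one carrying `n = 2`, rather than one row per name.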

{id: 1, pred_comp_sem:'a', ....., a_count: 3},
{id: 2, pred_comp_sem:'a', ....., a_count: 3},
{id: 3, pred_comp_sem:'a', ....., a_count: 3},
{id: 4, pred_comp_sem:'b', ....., a_count: 2},
{id: 5, pred_comp_sem:'b', ....., a_count: 2},
{id: 6, pred_comp_sem:'c', ....., a_count: 1},

and not the aggregated table you are looking for:

{pred_comp_sem:'a', ....., a_count: 3},
{pred_comp_sem:'b', ....., a_count: 2},
{pred_comp_sem:'c', ....., a_count: 1},

I'm pretty sure that if you change the operation from joinaggregate to aggregate it will fix the issue.
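A sketch of the second spec with that change applied, untested against the original index. Assumptions: the Kibana data block is carried over unchanged but written as strict JSON (Kibana's HJSON parser also accepts plain JSON); a `calculate` step copies the nested `_source.pred_comp_sem` into a top-level field, here given the hypothetical name `category`, so that the group-by field and the encoded field are plain names; `fontsize` is dropped because Vega-Lite spells that mark property `fontSize`, so the original key had no effect.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
  "description": "Sketch: joinaggregate replaced by aggregate; 'category' is a hypothetical helper field copied from _source.pred_comp_sem",
  "width": 806,
  "data": {
    "url": {
      "%context%": true,
      "index": "index_ssd_2",
      "body": {
        "size": 10000,
        "_source": ["mediatype", "pred_comp_sem"]
      }
    },
    "format": {"property": "hits.hits"}
  },
  "transform": [
    {"filter": "datum._source.mediatype == 'Valid Text'"},
    {"filter": "datum._source.pred_comp_sem != 'Subject Line'"},
    {"calculate": "datum._source.pred_comp_sem", "as": "category"},
    {
      "aggregate": [{"op": "count", "as": "a_count"}],
      "groupby": ["category"]
    }
  ],
  "encoding": {
    "y": {"field": "category", "type": "nominal", "sort": "-x", "title": null},
    "x": {"field": "a_count", "type": "quantitative", "title": null}
  },
  "layer": [
    {
      "mark": "bar",
      "encoding": {
        "color": {"field": "a_count", "type": "quantitative", "legend": null}
      }
    },
    {
      "mark": {"type": "text", "align": "left", "dx": 5, "aria": false},
      "encoding": {
        "text": {"field": "a_count", "type": "quantitative"},
        "color": {
          "condition": {"test": {"field": "a_count", "gt": 20}, "value": "white"},
          "value": "black"
        }
      }
    }
  ]
}
```

Because `aggregate` emits one row per `category`, each bar is drawn from a single row, so adding or removing the bar color should no longer change its length. If larger labels are wanted, `"fontSize": 40` can be added to the text mark.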


Thanks a lot @markov00, it's working.
