Vega: Access nested fields of a JSON file

Consider the below response for an Elasticsearch query.

"aggregations" : {
    "terms" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "1",
          "doc_count" : 3,
          "top_status_hits" : {
            "hits" : {
              "total" : {
                "value" : 3,
                "relation" : "eq"
              },
              "max_score" : null,
              "hits" : [
                {
                  "_index" : "demo-index",
                  "_type" : "_doc",
                  "_id" : "2L2JR3YBsgdxC8Q2whgP",
                  "_score" : null,
                  "_source" : {
                    "Status" : "end",
                    "ID" : "1"
                  },
                  "sort" : [
                    "end"
                  ]
                }
              ]
            }
          }
        }
      ]
    }
}

How can I access the Status and ID fields in a Vega code? I tried doing it like below without any success.

  "format": {"property": "aggregations.terms.buckets"},
  "transform": [
    {
      "type": "flatten",
      "fields": ["top_status_hits.hits.hits"],
      "as": ["data"]
    },
    {"type": "pie", "field": "data._source.Status"}
  ]

Any suggestions?

Untested. Try

"format": {"property": "aggregations.terms.buckets. top_status_hits.hit.hits"},

Then use the below to access them

_source.Status
_source.ID

Hi @aaron-nimocks, thanks for the reply.

I tried what you said, but, it gives the following error.

hits.hits perhaps

Nope, not working. :frowning_face:

This one is special because you have an array within an array. Not sure I've done that before.

Using the Vega Editor I was getting close but couldn't get inside that 2nd array yet. If I have time I will come back to it later today.

Possible to change your query to only return a single array or to narrow down the results you need?

{
  "$schema": "https://vega.github.io/schema/vega/v5.json",
  "data": [
    {
      "name": "table",
      "values": {
        "aggregations": {
          "terms": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "1",
                "doc_count": 3,
                "top_status_hits": {
                  "hits": {
                    "total": {"value": 3, "relation": "eq"},
                    "max_score": null,
                    "hits": [
                      {
                        "_index": "demo-index",
                        "_type": "_doc",
                        "_id": "2L2JR3YBsgdxC8Q2whgP",
                        "_score": null,
                        "_source": {"Status": "end", "ID": "1"},
                        "sort": ["end"]
                      }
                    ]
                  }
                }
              }
            ]
          }
        }
      },
      "format": {"property": "aggregations.terms.buckets"},
      "transform": [
        {
          "type": "formula",
          "as": "data",
          "expr": "datum.top_status_hits.hits.hits"
        }
      ]
    }
  ],
  "marks": [
    {
      "type": "text",
      "from": {"data": "table"},
      "encode": {
        "enter": {
          "fill": {"value": "#000"},
          "text": {"field": "doc_count"},
          "x": {"value": 10},
          "y": {"value": 10}
        }
      }
    }
  ]
}

Sure @aaron-nimocks. It'll be really helpful if you could find a solution. Meanwhile, I'll show what I wanted to achieve.

What I did to get this result was by using a top_hits metric aggregator on the index. Here's the data I used.

1,start
1,pending
1,end
2,start
2,end
3,start
3,pending
3,end
4,start
4,pending
5,start
5,pending
5,end
6,start
7,start
8,start

And here's the Elasticsearch API request I used for achieving the response I mentioned earlier.

GET /demo*/_search
{
  "aggs": {
    "terms": {
      "terms": {
        "field": "ID.keyword",
        "size": 10
      },
      "aggs": {
        "top_status_hits": {
          "top_hits": {
            "sort": [
              {
                "Status.keyword": {
                  "order": "asc"
                }
              }
            ],
            "_source": {
              "includes": [
                "ID",
                "Status"
              ]
            },
            "size": 1
          }
        }
      }
    }
  }
}

I wanted to get the last status for each ID from the index.

Try

"format": {"property": "aggregations.terms.buckets[0].top_status_hits.hits.hits"}

and for displaying in marks use

"text": {"field": "_source.Status"},

2 Likes

Woah, it's working!!! :tada: :tada: :tada:

Thank you so, so much @aaron-nimocks!!! I was so lost for whole 2 days with this. Finally it's solved! What a relief! :heart_eyes:

Thanks once again for showing the path! :grin:

1 Like

Awesome!

At first I was trying to dynamically parse the buckets array. After thinking about it there should only always be just 1 with that query.

Adding the [0] in aggregations.terms.buckets[0].top_status_hits.hits.hits just means use the first result and only first result. So really you just walk down the JSON path in order to get to the data you want and end at an array which is the 2nd hits.

Just adding more detail in case someone else runs across this and needs it.

1 Like

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.