Accessing raw object fields in Vega-Lite

As the title suggests, I’m having a difficult time accessing data from object fields stored in arrays of objects. In queries they can be addressed as objectField.id.keyword and objectField.count, for example.

Here is my mapping:

"mappings": {
    "properties": {
        "objectField": {
            "properties": {
                "count": { "type": "long" },
                "id": { 
                    "type": "text", 
                    "fields": {
                        "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                        }
                    }
                }
            }
        },
        "floatField": { "type": "float" },
        "dateField": { "type": "date" }

Here is example data from the Vega Debug response, which I’m able to see:

"hits": {
    "total": 6,
    "max_score": 1,
    "hits": [
        {
            "_index": "myIndex",
            "_id": "STdC05gBJF3_NGAQp56N",
            "_score": 1,
            "_source": {
              "objectField": [
                {
                  "count": 8442,
                  "id": "867aee015698-0"
                },
                {
                  "count": 4132,
                  "id": "50b91251e329-0"
                },
                {
                  "count": 2672,
                  "id": "9d577be75ea8-0"
                }
              ]

My goal is just to practice accessing the data fields, but I can’t even perform a simple scatterplot as I get “Infinite extent for field” responses.

{
  $schema: https://vega.github.io/schema/vega-lite/v5.json
  title: ObjectSizes

  // Define the data source
  data: {
    url: {
      index: myIndex
      "body": {
        "size": 10, "_source": ["objectField"]
      }
    },
    "format": {"property": "hits.hits"}
  }


  "mark": "point"
  "encoding": {
    "x": {
      "field": "_source['objectField.id']", 
      "type": "ordinal", 
      "axis": {"title": false}
    },
    "y": {
      "field": "_source['objectField.count']",
      "type": "quantitative",
      "axis": {"title": false}
    }
  }
}

I’ve also tried various syntaxes such as _source.objectField.id, the datum variant, and others.

Is there a transform I’m supposed to use? I don’t think any of the fields need to be flattened or projected.

I think the issue here is that I’m not formatting the data properly. This works for the object fields alone:

"format": {"property": "hits.hits[0]._source.objectField"}

Is there a way to do this with a transform so that I can have other fields too?

In other words, something like "format": {"property": "hits.hits[0]._source"}

I found the solution; I had been looking at the wrong docs. Keeping "format": {"property": "hits.hits"} and adding this transform works:

  transform: [
    // expose the _source fields
    {
      "calculate": "datum._source.dateField",
      "as": "dateField"
    },
    {
      "calculate": "datum._source.objectField",
      "as": "objectField"
    },
    // flatten the objectField array -> overwrite datum.objectField
    {
      "flatten": ["objectField"],
      "as": ["objectField"]
    },
    // expose the objectField fields
    {
      "calculate": "datum.objectField.id",
      "as": "id"
    },
    {
      "calculate": "datum.objectField.count",
      "as": "count"
    },
  ]
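
For anyone landing here later, this is a rough sketch of how the whole spec might look once the transform is in place. It is not the exact spec from the thread: it assumes the index and field names from the mapping above, and it assumes dateField is added to the "_source" list in the query, since the original query only requested objectField and datum._source.dateField would otherwise be undefined. The encoding points at the plain id and count fields created by the transform. The original attempt most likely failed because "_source['objectField.id']" looks for a single key literally named objectField.id, which does not exist, so every value is undefined and the scale reports an infinite extent.

```hjson
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "title": "ObjectSizes",

  "data": {
    "url": {
      "index": "myIndex",
      "body": {
        "size": 10,
        // assumption: dateField is requested too, so the calculate below has a value
        "_source": ["objectField", "dateField"]
      }
    },
    // one row per hit; each row still wraps its fields in _source
    "format": {"property": "hits.hits"}
  },

  "transform": [
    // lift the _source fields to the top level of each row
    {"calculate": "datum._source.dateField", "as": "dateField"},
    {"calculate": "datum._source.objectField", "as": "objectField"},
    // one row per array element; objectField is now a single object
    {"flatten": ["objectField"], "as": ["objectField"]},
    // lift the nested values so the encoding can use simple field names
    {"calculate": "datum.objectField.id", "as": "id"},
    {"calculate": "datum.objectField.count", "as": "count"}
  ],

  "mark": "point",
  "encoding": {
    "x": {"field": "id", "type": "ordinal", "axis": {"title": false}},
    "y": {"field": "count", "type": "quantitative", "axis": {"title": false}}
  }
}
```

With the sample hit shown earlier, the flatten step should turn that one document into three rows, one per element of objectField (for example id 867aee015698-0 with count 8442), and each row keeps the same dateField value from its parent document.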

Thanks for sharing your solution, @DOH-moodys