Getting nested data into proper form for Vega

I'm hoping to make charts using Vega but am struggling to get the data formatted properly. The data in question is PostgreSQL usage statistics. A specific example:

    "postgresql" : {
        "database" : {
          "rows" : {
            "inserted" : 83977835
          }
        }
      },
      "beat" : {
        "name" : "some_host"
      }

I found this time-format snippet:

transform: [ {
  calculate: "toDate(datum._source['@timestamp'])"
  as: "time"
} ]

In my attempt at a Vega vis I have this for the "url" param:

  index: postgresql*
  body: {
    size: 10000
    _source: ["@timestamp", "beat.name", "postgresql.database.rows.inserted"]
  }

'x' would appear to be

x: {
  field: time
  type: temporal
  axis: {title: false} 
}

It's the 'encoding' section where things go awry. What values should I use for 'y'? I've tried various field paths under _source, to no avail.

Your "transform" snippet already shows an example of how you can access the data from ES.

The calculate transform assigns the value of toDate(datum._source['@timestamp']) to a field (or column header) called time.

Let's break this down. toDate() is just a function that creates a date value from whatever datum._source['@timestamp'] happens to be.

This datum is a special Vega term. It is the data that you defined in data. In your case, it's the JSON object that Elasticsearch returns from your url parameter.
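
For context, in a Kibana Vega spec that data definition might look something like this. This is a minimal sketch assembled from the index and body you posted, plus the format property discussed further below:

data: {
  url: {
    index: postgresql*
    body: {
      size: 10000
      _source: ["@timestamp", "beat.name", "postgresql.database.rows.inserted"]
    }
  }
  format: {property: "hits.hits"}
}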

If you actually run the content of the url parameter, in for example DevTools, you'll get to see what ES returns.

GET postgresql*/_search
{
   "size": 10000,
   "_source": ["@timestamp", "beat.name", "postgresql.database.rows.inserted"]
}

The response should look like this:

{
  "took" :  ...,
  "timed_out" : false,
  "_shards" : { ... },
  "hits" : {
    "total" : ...,
    "max_score" : 1.0,
    "hits" : [ 
       {
           "_index": "postgresql",
           "_type": "_doc",
           "_id": "....",
           "_score": 1.0,
           "_source": {
              "@timestamp": "2019-05-29T04:12:53.318Z",
              "postgresql" : {
                 "database" : {
                    "rows" : {
                       "inserted" : 83977835
                    }
                 }
              },
              "beat" : {
                 "name" : "some_host"
              }
           }
       },
       {
           "_index": "postgresql",
           "_type": "_doc",
           "_id": "....",
           "_score": 1.0,
           "_source": {
              "@timestamp": "2019-05-29T05:20:12.148Z",
              "postgresql" : {
                 "database" : {
                    "rows" : {
                       "inserted" : 12
                    }
                 }
              },
              "beat" : {
                 "name" : "some_other_host"
              }
           }
       },
       ...
    ]
  }
}

In your data definition, you probably also have format: {property: "hits.hits"} somewhere in there (as in the sketch above). Vega then treats each element of hits.hits as one datum, so taken together your data is exactly the array of your hits.hits:

   [ 
       {
           "_index": "postgresql",
           "_type": "_doc",
           "_id": "....",
           "_score": 1.0,
           "_source": {
              "@timestamp": "2019-05-29T04:12:53.318Z",
              "postgresql" : {
                 "database" : {
                    "rows" : {
                       "inserted" : 83977835
                    }
                 }
              },
              "beat" : {
                 "name" : "some_host"
              }
           }
       },
       {
           "_index": "postgresql",
           "_type": "_doc",
           "_id": "....",
           "_score": 1.0,
           "_source": {
              "@timestamp": "2019-05-29T05:20:12.148Z",
              "postgresql" : {
                 "database" : {
                    "rows" : {
                       "inserted" : 12
                    }
                 }
              },
              "beat" : {
                 "name" : "some_other_host"
              }
           }
       },
       ...
    ]

By saying datum._source you narrow that down to:

[ 
   {
      "@timestamp": "2019-05-29T04:12:53.318Z",
      "postgresql" : {
         "database" : {
            "rows" : {
               "inserted" : 83977835
            }
         }
      },
      "beat" : {
         "name" : "some_host"
      }
   },
   {
      "@timestamp": "2019-05-29T05:20:12.148Z",
      "postgresql" : {
         "database" : {
            "rows" : {
               "inserted" : 12
            }
         }
      },
      "beat" : {
         "name" : "some_other_host"
      }
   },
   ...
]

and further, datum._source['@timestamp'] gives us this array of date strings:

[ 
   "2019-05-29T04:12:53.318Z",
   "2019-05-29T05:20:12.148Z",
   ...
]

At this point, imagine that Vega has created a table. The table has one column with a column header called time (which is the name you gave it via as in your calculate transform). The rows hold the values toDate("2019-05-29T04:12:53.318Z"), then toDate("2019-05-29T05:20:12.148Z"), and so on.
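
So conceptually, the table now looks like this:

time
----------------------------------
toDate("2019-05-29T04:12:53.318Z")
toDate("2019-05-29T05:20:12.148Z")
...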

You refer to this column in your encoding by saying that you want your x-axis to use the values from the time column.

You can create another column by adding more to your transform.

transform: [ 
  {
    calculate: "toDate(datum._source['@timestamp'])"
    as: "time"
  },
  {
    calculate: "datum._source.postgresql.database.rows.inserted"
    as: "no_of_rows"
  }
]

This gives you two columns: time and no_of_rows. By the way, the bracket syntax _source['...'] was used instead of the usual dot notation only because the Vega expression parser doesn't accept the @ character in dot notation. The workaround is to use ['...'].
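
To illustrate the two notations side by side (illustrative only, not a runnable expression on its own):

datum._source.@timestamp      // fails to parse: @ is not valid in dot notation
datum._source['@timestamp']   // works: bracket notation accepts any key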

So now you can write your encoding using two sets of values.

encoding: {
    x: {
      field: time
      type: temporal
      axis: {title: "Date"}
    }
    y: {
      field: no_of_rows
      type: quantitative
      axis: {title: "Number of rows inserted"}
    }
  }
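
Putting it all together, the whole spec might look roughly like this. Treat it as an untested sketch: the mark type (line) and the $schema URL are my assumptions, so adjust them to your preferred chart type and your Kibana version.

{
  $schema: "https://vega.github.io/schema/vega-lite/v2.json"
  data: {
    url: {
      index: postgresql*
      body: {
        size: 10000
        _source: ["@timestamp", "beat.name", "postgresql.database.rows.inserted"]
      }
    }
    format: {property: "hits.hits"}
  }
  transform: [
    {
      calculate: "toDate(datum._source['@timestamp'])"
      as: "time"
    }
    {
      calculate: "datum._source.postgresql.database.rows.inserted"
      as: "no_of_rows"
    }
  ]
  mark: line
  encoding: {
    x: {
      field: time
      type: temporal
      axis: {title: "Date"}
    }
    y: {
      field: no_of_rows
      type: quantitative
      axis: {title: "Number of rows inserted"}
    }
  }
}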

This is an amazing writeup. Thank you so much! I will spend time digesting this and see how far I get.

I believe I have this setup in Vega properly but I'm getting this error:

Cannot read property 'database' of undefined

When I run this query:

GET postgresql*/_search
{
  "size": 10,
  "_source": ["@timestamp", "beat.name","postgresql.database.rows.inserted"]
}

I get data like this:

"hits" : {
"total" : 13763253,
"max_score" : 1.0,
"hits" : [
  {
    "_index" : "postgresql-2019.05.19",
    "_type" : "doc",
    "_id" : "uldjzWoB9FCFqJTdlzd1",
    "_score" : 1.0,
    "_source" : {
      "@timestamp" : "2019-05-19T00:00:16.176Z",
      "beat" : {
        "name" : "postgresql"
      }
    }
  },
  {
    "_index" : "postgresql-2019.05.19",
    "_type" : "doc",
    "_id" : "vldjzWoB9FCFqJTdlzd1",
    "_score" : 1.0,
    "_source" : {
      "@timestamp" : "2019-05-19T00:00:16.176Z",
      "beat" : {
        "name" : "postgresql"
      }
    }
  },
  {
    "_index" : "postgresql-2019.05.19",
    "_type" : "doc",
    "_id" : "qmBjzWoBlRof6N68l855",
    "_score" : 1.0,
    "_source" : {
      "@timestamp" : "2019-05-19T00:00:16.174Z",
      "postgresql" : {
        "database" : {
          "rows" : {
            "inserted" : 0
          }
        }
      },
      "beat" : {
        "name" : "postgresql"
      }
    }
  },

Is it a problem that there appear to be several entries for "beat.name" for one "postgresql..." entry?

The error says Cannot read property 'database', which means it could not find a field called database, i.e. the database in postgresql.database.

The first document returned by your GET command happens to have no postgresql field at all, so datum._source.postgresql is undefined and reading database from it fails. That's probably the issue.

One thing you can do is to clean up the data you're sending to Vega by searching for only those documents which have values in that field.

Try this in your url body:

body: {
   size: 10000
   _source: ["@timestamp", "beat.name", "postgresql.database.rows.inserted"]
   query: {
      exists: {
         field: postgresql.database.rows.inserted
      }
   }
}
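
If you want to check what that returns before wiring it into Vega, the equivalent query in DevTools would be:

GET postgresql*/_search
{
  "size": 10000,
  "_source": ["@timestamp", "beat.name", "postgresql.database.rows.inserted"],
  "query": {
    "exists": {
      "field": "postgresql.database.rows.inserted"
    }
  }
}

Only documents that actually contain postgresql.database.rows.inserted should come back.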
