Vega scatter plot based on array of points in document

I have been working through simple scatter plots and boxplots in Kibana with Vega. Now I'm getting to what I actually need to render.

Each document from elastic has an array of points. From a query, I will likely have up to 10 documents. Ideally what I'm after is an x,y scatter plot for each document and then layer those on top of each other.

Some of the features / docs I have been looking into:

  • Layering
  • Repeating - although this looks like it's one chart per PROPERTY on a document
  • Flatten transform - but this seems to work on single element arrays and makes objects out of them. I already have objects with X, Y in each.

I'm struggling to come up with an approach, or to know if Kibana + Vega can do this.

One layer per document.
Each layer is a scatter plot (data series based on x,y object array in the document)

This is a crude rendering from Excel that I hope gets the point across (one series for each of the 3 documents in the sample data below)

image

Here is an example of my data

{
    "values" : [
        {
            "_id" : "3gvR138B1LivDftAs_NA",
            "_score" : 1.0,
            "_source" : {
                "date" : "2022-01-31T18:26:27",
                "points" : [
                    { 
                        "x" : 0,
                        "y" : 100
                    },
                    { 
                        "x" : 1,
                        "y" : 120
                    },
                    { 
                        "x" : 2,
                        "y" : 105
                    },
                    { 
                        "x" : 3,
                        "y" : 108
                    },
                    { 
                        "x" : 4,
                        "y" : 117
                    }
                ]
            }
        },
        {
            "_id" : "3wvR138B1LivDftAs_NA",
            "_score" : 1.0,
            "_source" : {
                "date" : "2022-01-31T18:26:27",
                "points" : [
                    { 
                        "x" : 0,
                        "y" : 98
                    },
                    { 
                        "x" : 1,
                        "y" : 105
                    },
                    { 
                        "x" : 2,
                        "y" : 110
                    },
                    { 
                        "x" : 3,
                        "y" : 115
                    },
                    { 
                        "x" : 4,
                        "y" : 113
                    }
                ]
            }
        },
        {
            "_id" : "4AvR138B1LivDftAs_NA",
            "_score" : 1.0,
            "_source" : {
                "date" : "2022-01-31T18:26:27",
                "points" : [
                    { 
                        "x" : 0,
                        "y" : 115
                    },
                    { 
                        "x" : 1,
                        "y" : 120
                    },
                    { 
                        "x" : 2,
                        "y" : 113
                    },
                    { 
                        "x" : 3,
                        "y" : 122
                    },
                    { 
                        "x" : 4,
                        "y" : 130
                    }
                ]
            }
        }
    ]
}

I am able to transform the JSON into some other format if that would make it easier to plot in Vega(-Lite) as desired.

Clear as mud? Thank you in advance for any brainstorming thoughts.

I think you need the Flatten transform. This is possible with Kibana + Vega, and you can use Vega-Lite if you prefer.

With the flatten transform, you can reshape your data into one row per point:

"_id": "3w..", "x": 0, "y": 98
"_id": "3w..", "x": 1, "y": 105
"_id": "3w..", "x": 2, "y": 110
...
"_id": "3g..", "x": 0, "y": 100
"_id": "3g..", "x": 1, "y": 120
...

After transforming your data, the Colored Scatterplot example may help you write your spec.
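As a rough illustration of that idea, here is a minimal Vega-Lite sketch that flattens the points array and colors the points by `_id`. It uses only the first two documents and first two points from the sample data in the question, and the output name `point` is just a name I picked for the flattened field. Treat it as a starting point, not a finished spec.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch only: trimmed sample data; 'point' is an arbitrary name for the flattened field.",
  "data": {
    "values": [
      {
        "_id": "3gvR138B1LivDftAs_NA",
        "_source": {"points": [{"x": 0, "y": 100}, {"x": 1, "y": 120}]}
      },
      {
        "_id": "3wvR138B1LivDftAs_NA",
        "_source": {"points": [{"x": 0, "y": 98}, {"x": 1, "y": 105}]}
      }
    ]
  },
  "transform": [
    {"flatten": ["_source.points"], "as": ["point"]}
  ],
  "mark": "point",
  "encoding": {
    "x": {"field": "point.x", "type": "quantitative"},
    "y": {"field": "point.y", "type": "quantitative"},
    "color": {"field": "_id", "type": "nominal"}
  }
}
```

Changing `"mark": "point"` to `"mark": "line"` should give one line per document instead of a scatter plot.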

(The plot you showed is not a scatter plot. Isn't it a line plot? Which one do you need?)

Thank you for the help @Tomo_M.

I dug a little deeper into the Flatten transform. The docs only show a simple array of integers, so I wasn't sure what it would do with objects that have x and y properties. It seems to work great! As you illustrated, I get a repeated copy of the original document for each of the array elements. That allows me to plot the X,Y data and then use nominal encoding to color by document / id.

You are right, I was fluctuating between line chart and scatter plots in my samples. Sorry for the confusion.

Here is my spec in case it helps others (works in online vega editor).

{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",

  "width": 600,
  "height" : 200,

  "data": {
    "name": "theData",
    
    "values" : [
        {
            "_id" : "3gvR138B1LivDftAs_NA",
            "_score" : 1.0,
            "_source" : {
                "date" : "2022-01-29T18:26:27",
                "id" : 111,
                "points" : [
                    { 
                        "x" : 0,
                        "y" : 100
                    },
                    { 
                        "x" : 1,
                        "y" : 60
                    },
                    { 
                        "x" : 2,
                        "y" : 105
                    },
                    { 
                        "x" : 3,
                        "y" : 138
                    },
                    { 
                        "x" : 4,
                        "y" : 117
                    }
                ]
            }
        },
        {
            "_id" : "3wvR138B1LivDftAs_NA",
            "_score" : 1.0,
            "_source" : {
                "date" : "2022-01-30T18:26:27",
                "id" : 222,
                "points" : [
                    { 
                        "x" : 0,
                        "y" : 98
                    },
                    { 
                        "x" : 1,
                        "y" : 135
                    },
                    { 
                        "x" : 2,
                        "y" : 110
                    },
                    { 
                        "x" : 3,
                        "y" : 90
                    },
                    { 
                        "x" : 4,
                        "y" : 113
                    }
                ]
            }
        },
        {
            "_id" : "4AvR138B1LivDftAs_NA",
            "_score" : 1.0,
            "_source" : {
                "date" : "2022-01-31T18:26:27",
                "id" : 333,
                "points" : [
                    { 
                        "x" : 0,
                        "y" : 115
                    },
                    { 
                        "x" : 1,
                        "y" : 95
                    },
                    { 
                        "x" : 2,
                        "y" : 110
                    },
                    { 
                        "x" : 3,
                        "y" : 130
                    },
                    { 
                        "x" : 4,
                        "y" : 130
                    }
                ]
            }
        }

    ]

  },
 
  "transform": [
    {
      "flatten": [
        "_source.points"
      ], "as" : [
        "points"
      ]
    }
  ],
  
  "mark": {
    "type": "line",
    "interpolate": "basis"
  },
  
  "encoding": {
  
    "x": {
      "field": "points.x",
      "type": "quantitative",
      "title" : "index"
    },
    
    "y": {
      "field": "points.y", 
      "type": "quantitative",
      "title" : "value"
    },

    "color" : {
      "field" : "_source.id",
      "type" : "nominal",
      "title" : "device id"
    }

  }
}

Here is the output

Layering
I have pretty much given up on the layering idea. I don't think it's possible to create a dynamic layer per document, but with the nominal encoding coloring each document, I don't think I need layering. I will probably add a filtering mechanism in the Kibana dashboard.

Thanks again for the help. I will continue pushing along...

Thank you for sharing your specification!

To test my scenario more accurately, I have updated my test data to better match my real data in the Elastic index. My data has two levels of arrays, and I got it all working with "double flatten" transformations: one flatten for each level of the arrays, which generates another document at the root for each of the lines and for each of the points in the lines. I thought I would post the working spec in case it helps someone else with this data layout.

I'm using inline data for now, to have this working in the Vega Editor.
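For anyone moving from inline data to a live index, this is roughly what the data block might look like in Kibana's Vega editor. This is a sketch under assumptions: the index name `my-index` is a placeholder, `date` is taken from the sample documents as the time field, and the `size` of 10 just matches the "up to 10 documents" mentioned earlier. The `format.property` of `hits.hits` hands Vega the array of hits, so the `_source.…` paths used in the transforms stay the same.

```json
{
  "data": {
    "url": {
      "%context%": true,
      "%timefield%": "date",
      "index": "my-index",
      "body": {
        "size": 10,
        "_source": ["id", "points"]
      }
    },
    "format": {"property": "hits.hits"}
  }
}
```

With `%context%` enabled, the dashboard's query, filters, and time range should be applied to the request, which is one way to get the filtering mentioned above.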

The outer array represents the line series. The inner array represents the points of each line. I use a calculate transform to compute a line ID, then use that ID to group and color the lines with the nominal encoding type.
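A cut-down sketch of the double-flatten approach described above, for illustration. This is not the exact spec from my setup: the field names `lines` and `name`, the ids, and all the values here are made up, and it assumes each line object carries something (here `name`) that can be combined with the document id to form a unique line ID.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch only: 'lines', 'name', the ids and all values are hypothetical.",
  "width": 600,
  "height": 200,
  "data": {
    "values": [
      {
        "_id": "doc-1",
        "_source": {
          "id": 111,
          "lines": [
            {"name": "a", "points": [{"x": 0, "y": 100}, {"x": 1, "y": 60}, {"x": 2, "y": 105}]},
            {"name": "b", "points": [{"x": 0, "y": 98}, {"x": 1, "y": 135}, {"x": 2, "y": 110}]}
          ]
        }
      },
      {
        "_id": "doc-2",
        "_source": {
          "id": 222,
          "lines": [
            {"name": "a", "points": [{"x": 0, "y": 115}, {"x": 1, "y": 95}, {"x": 2, "y": 110}]}
          ]
        }
      }
    ]
  },
  "transform": [
    {"flatten": ["_source.lines"], "as": ["line"]},
    {"flatten": ["line.points"], "as": ["point"]},
    {"calculate": "datum._source.id + '-' + datum.line.name", "as": "lineId"}
  ],
  "mark": "line",
  "encoding": {
    "x": {"field": "point.x", "type": "quantitative", "title": "index"},
    "y": {"field": "point.y", "type": "quantitative", "title": "value"},
    "color": {"field": "lineId", "type": "nominal", "title": "line id"}
  }
}
```

The first flatten produces one row per line, the second produces one row per point within each line, and the calculated `lineId` (for example `111-a`) keeps points from different lines apart when coloring.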

Resulting graph is shown here:

Hope that helps someone out there...
