How the split function works in Timelion charts

I am trying to understand how the split function in Timelion chart works.

.es(....,split=rows:256,...)

For example, I have 10K rows. If I split by 10K, Kibana times out right away. When I split by 256 or fewer, it works fine. I am OK with not viewing every point.

So my question is: when we split by a number smaller than the total number of documents in a Timelion chart, does each point on the chart reflect an average of a group of actual points (256 groups of 10K/256 points each)? Or are the 256 points just random points picked from the 10K total rows? Thanks.

Hi, the split= operation is a terms aggregation: you define the field to use and the size, i.e. how many buckets should be returned.
In your case, it looks like you want to split by a field called rows and get the top 256 terms on that field. For each term, Timelion will create a separate time series, which is probably not what you are looking for.
If you don't specify the split parameter, Timelion aggregates your 10K points according to your time range and date histogram interval.
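To make the two steps concrete, here is a minimal Python sketch that imitates what a terms split followed by a date histogram does. It is a simulation, not Elasticsearch or Timelion code: the function name `split_terms`, the sample documents, and the field names `ID`, `time`, and `volume` are illustrative assumptions, and the tie-breaking rule (ascending term when document counts are equal) is also an assumption about the default ordering.

```python
from collections import Counter


def split_terms(docs, field, size, interval, metric_field):
    """Imitate split=<field>:<size> combined with metric=max:<metric_field>.

    docs: list of dicts, each with a numeric "time" key (seconds).
    Returns {term: {bucket_start: max_value}} for the top `size` terms.
    """
    # Step 1 (terms aggregation): count documents per distinct term value.
    counts = Counter(doc[field] for doc in docs)
    # Keep the `size` terms with the most documents.
    # Assumption: ties are broken by ascending term value.
    top_terms = sorted(counts, key=lambda t: (-counts[t], t))[:size]
    keep = set(top_terms)

    # Step 2 (date histogram + max): one time series per kept term.
    series = {term: {} for term in top_terms}
    for doc in docs:
        term = doc[field]
        if term not in keep:
            continue  # documents belonging to other terms are dropped
        bucket = doc["time"] - doc["time"] % interval
        value = doc[metric_field]
        previous = series[term].get(bucket)
        series[term][bucket] = value if previous is None else max(previous, value)
    return series


# Hypothetical data: 10 IDs with one document each.
docs = [{"ID": i, "time": 60 * i, "volume": 100 + i} for i in range(1, 11)]
result = split_terms(docs, field="ID", size=3, interval=3600, metric_field="volume")
print(sorted(result))  # the IDs that survived the split
```

In this simulation only IDs 1, 2, and 3 survive a split of size 3. The remaining documents are neither averaged into those buckets nor sampled at random; they are simply left out.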

Hi Marco, thanks for the reply. When I don't specify the split parameter, the values in rows do not show up. When I set split to 256, it shows 256 points distributed across the chart. As you mentioned, these are 256 buckets. I don't understand how each bucket relates to the original 10K data points. Are the original 10K data points first evenly divided into 256 buckets? Then, within each bucket, is the mean value calculated or the top value picked?

Hi, can you please share the timelion script you are using and a sample of your document so I can understand what is going on?

Hi Marco, sorry for the slow response. The Timelion script I use looks as follows:

.es(index="index_10k",timefield="time",metric="max:volume",split=ID:256).label().color("rgb(128,128,128)").points(show=true,fill=10,fillColor=gray)

I am trying to plot each value in the volume field individually as a point. I have another field called ID, which ranges from 1 to 10,000. I used ID to split the values so that I can see multiple points. If I don't split, I only see the max volume value. I am using 256 in split because anything more than that causes a time-out error. I am trying to understand whether these 256 points are a reasonable representation of my whole data set's distribution. How were those 256 points picked?

I hope this makes sense. In data frame terms, I have two columns, ID and volume, with 10K rows. Thanks for the help.

Actually, Kibana and most visualizations are not meant to display single data points, but aggregations of them.
Instead of Timelion, I suggest using Vega: it lets you fetch the raw data directly, without applying a terms aggregation on a field with such high cardinality.
Rendering a scatterplot there is relatively easy if you follow the current Vega guide (Vega | Kibana Guide [7.12] | Elastic) and the Vega-Lite scatterplot example (Scatterplot | Vega-Lite).
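As a rough starting point, the Python sketch below builds a Vega-Lite scatterplot spec as a dictionary and prints it as JSON that could be pasted into Kibana's Vega editor. The helper name `build_scatter_spec` is hypothetical; the index and field names come from the Timelion script earlier in the thread. The schema version, the `size` of 10000, and the `_source` filter are assumptions that may need adjusting for your Kibana version and index settings.

```python
import json


def build_scatter_spec(index, time_field, value_field, max_docs=10000):
    """Return a Vega-Lite spec (as a dict) that plots raw hits as points."""
    return {
        # Assumption: pick the schema version your Kibana release supports.
        "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
        "data": {
            "url": {
                # Apply the dashboard time range and filters to the query.
                "%context%": True,
                "%timefield%": time_field,
                "index": index,
                # Fetch raw documents instead of aggregating them.
                "body": {"size": max_docs, "_source": [time_field, value_field]},
            },
            # Read the array of documents from the search response.
            "format": {"property": "hits.hits"},
        },
        "transform": [
            # Convert the stored time value into a date for the x axis.
            {"calculate": f"toDate(datum._source['{time_field}'])", "as": "t"}
        ],
        "mark": "point",
        "encoding": {
            "x": {"field": "t", "type": "temporal"},
            "y": {"field": f"_source.{value_field}", "type": "quantitative"},
        },
    }


spec = build_scatter_spec("index_10k", "time", "volume")
print(json.dumps(spec, indent=2))
```

Because the query asks for documents rather than buckets, every row becomes one point, so there is no question of which 256 terms were kept.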

Thanks Marco. It makes sense to me now. I am fine for my use case, but I will take a look at Vega as well.
