Data Table adding certain field causes strange behavior

davidv · December 31, 2017, 12:52pm

Hi,

I have this table:

When I split the rows again by layers.elem.bandwidth, every layer receives every bandwidth (instead of just gaining a bandwidth column), and the number of rows is multiplied by the number of layers. See partial image:

Here is an example data document:

> {
>   "_index": "device_profile",
>   "_type": "doc",
>   "_id": "s4x0nGAB9EBpl5JA7kg2",
>   "_version": 1,
>   "_score": 3.7429986,
>   "_source": {
>     "layers": {
>       "elem": [
>         {
>           "bandwidth": "1992294",
>           "-id": "0",
>           "enc_id": "131332",
>           "name": "1.9"
>         },
>         {
>           "bandwidth": "1677721",
>           "-id": "1",
>           "enc_id": "108209",
>           "name": "1.6"
>         },
>         {
>           "bandwidth": "2097152",
>           "-id": "2",
>           "enc_id": "131333",
>           "name": "2.0"
>         },
>         {
>           "bandwidth": "2097152",
>           "-id": "3",
>           "enc_id": "5",
>           "name": "2.1"
>         },
>         {
>           "bandwidth": "2516582",
>           "-id": "4",
>           "enc_id": "132434",
>           "name": "2.4"
>         },
>         {
>           "bandwidth": "2998927",
>           "-id": "5",
>           "enc_id": "1",
>           "name": "3.0"
>         },
>         {
>           "bandwidth": "3565158",
>           "-id": "6",
>           "enc_id": "132433",
>           "name": "3.4"
>         },
>         {
>           "bandwidth": "3670016",
>           "-id": "7",
>           "enc_id": "6",
>           "name": "3.5"
>         }
>       ],
>       "size": "8"
>     },
>     "name": "HLS2",
>     "audio_profile": {
>       "bandwidth": "0",
>       "name": "AUDIO_PASSTHROUGH"
>     },
>     "fragment_length": "2",
>     "live_sliding_window": "3"
>   }
> }

Thanks!

Joe_Fleming · January 2, 2018, 5:20pm

This behavior is actually expected. It has to do with how you are asking Elasticsearch to aggregate the data you have. The order of the aggregations matters, as the data is split into buckets in that order.

As an example, if you're splitting on "Layer Name", and then splitting again on "Layer Bandwidth", you'll see many rows with the same "Layer Name".
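One likely reason the rows multiply, assuming layers.elem uses Elasticsearch's default object mapping (not the nested type): arrays of objects are flattened, so the document only keeps a list of all layer names and a list of all bandwidths, not which bandwidth belonged to which layer. A terms bucket on the layer name split again by a terms bucket on the bandwidth therefore pairs every name with every bandwidth in the document. The Python below is a minimal simulation of that behavior, not Elasticsearch code; the helper names are hypothetical.

```python
from itertools import product

# layers.elem from the example document (only the two fields used here).
layers = [
    {"name": "1.9", "bandwidth": "1992294"},
    {"name": "1.6", "bandwidth": "1677721"},
    {"name": "2.0", "bandwidth": "2097152"},
    {"name": "2.1", "bandwidth": "2097152"},
    {"name": "2.4", "bandwidth": "2516582"},
    {"name": "3.0", "bandwidth": "2998927"},
    {"name": "3.4", "bandwidth": "3565158"},
    {"name": "3.5", "bandwidth": "3670016"},
]


def flatten_object_array(elems):
    """Mimic a default (non-nested) object mapping: each sub-field becomes an
    independent multi-valued field, so the name/bandwidth pairing is lost."""
    return {
        "layers.elem.name": sorted({e["name"] for e in elems}),
        "layers.elem.bandwidth": sorted({e["bandwidth"] for e in elems}),
    }


def nested_terms_rows(flat_doc, outer, inner):
    """Simulate a terms bucket on `outer` split again by a terms bucket on
    `inner` for a single document: each outer value is paired with every
    inner value the document has, whether or not they belonged together."""
    return list(product(flat_doc[outer], flat_doc[inner]))


flat = flatten_object_array(layers)
rows = nested_terms_rows(flat, "layers.elem.name", "layers.elem.bandwidth")
print(len(rows))  # 8 layer names x 7 distinct bandwidths -> 56
```

Pairs such as ("1.9", "3670016") show up even though no layer has that combination, which is the "all layers receive all bandwidths" effect from the first post.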

I suspect what you're actually looking for here is a "rolled up" value for the bandwidth, like the average or sum. In that case, you'll want to add that field as a metric, not as a bucket aggregation. What you asked for in your example is exactly what you got: every individual bandwidth value represented as its own row in the table.

Instead, remove that bucket aggregation and add a new metric, for example the average of the "Layer Bandwidth" field (or whatever calculation you actually want to see; you can add several, such as sum, average, min, and max).
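Roughly, the table suggested here corresponds to an aggregation request like the one below, built as a Python dict and serialized to JSON. This is a sketch, not necessarily the exact request Kibana sends: the field names name.keyword and layers.elem.name.keyword are assumptions about the index mapping, and the avg/min/max/sum metrics only work if layers.elem.bandwidth is mapped as a numeric field (in the example document it is stored as a string).

```python
import json

# Hypothetical field names: adjust them to match the actual index mapping.
NAME_FIELD = "name.keyword"
LAYER_FIELD = "layers.elem.name.keyword"
BANDWIDTH_FIELD = "layers.elem.bandwidth"  # must be mapped as a number


def metric_aggs(field):
    """Several roll-up metrics for the same field, one entry per metric."""
    return {
        "avg_bandwidth": {"avg": {"field": field}},
        "min_bandwidth": {"min": {"field": field}},
        "max_bandwidth": {"max": {"field": field}},
        "sum_bandwidth": {"sum": {"field": field}},
    }


# Buckets: split by name, then by layer name. Bandwidth is no longer a
# bucket; it only appears as metrics inside the innermost bucket.
request_body = {
    "size": 0,
    "aggs": {
        "by_name": {
            "terms": {"field": NAME_FIELD},
            "aggs": {
                "by_layer": {
                    "terms": {"field": LAYER_FIELD},
                    "aggs": metric_aggs(BANDWIDTH_FIELD),
                }
            },
        }
    },
}

print(json.dumps(request_body, indent=2))
```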

davidv · January 3, 2018, 8:35am

I would like to display each layer with its bandwidth, not an average of all layers or any other metric.
How can I do that without getting all the sparse values as shown in the image?

David

Joe_Fleming · January 3, 2018, 4:06pm

> I would like to display each layer with its bandwidth

If you want one value per layer, you have to roll up that value. Otherwise, what would you show as a single value for a given layer?

Maybe this is a communication breakdown though... can you give me an example of what you were hoping to see?

davidv · January 3, 2018, 4:18pm

Hi @Joe_Fleming ,

Attached. I guess I need to roll up the value; what would be the best way to do it?

Appreciate it.

Joe_Fleming · January 3, 2018, 5:16pm

> I guess I need to roll up the value; what would be the best way to do it?

It depends on what you want to see. You're going to take a collection of values and distill them into a single value. Which value makes sense depends on your use case, but if I had to guess, average or max would probably be useful for you. Keep in mind you can have multiple metrics, so you could show min, max, and average, for example. But it's your data, and you know what you want to see better than I do. The key is to remove the bandwidth aggregation ("Bucket" in Kibana lingo) and add one or more "Metric" values for the bandwidth field.

davidv · January 4, 2018, 7:36am

Hi @Joe_Fleming,

Sorry for not being clear enough.
Each name has a few layers, and each layer has its own bw (bandwidth). I don't want to display a metric of the bw value; I just want to display each layer with its bw. Please see the example doc attached in the original post.
Name 1:N Layer 1:1 BW

Thank you.

Joe_Fleming · January 9, 2018, 10:57pm

You want to show some "bw" value for each unique "layer" value, right? If I understand that correctly, the instructions I gave you before still apply. Do everything you originally did except for the bandwidth aggregation/bucket, and add one or more "Metric" values for the bandwidth field. In other words, that very first table you posted here was correct; it just needs the bandwidth added as a metric.

davidv · January 10, 2018, 7:45am

I tried adding "bw" as a metric instead of a bucket, but then I get the same value (the metric; the attached image uses avg, for example) for all layers, whereas I wanted each layer to display its original "bw" value. Every layer name has a unique layer bw.

Joe_Fleming · January 10, 2018, 6:49pm

Can you expand all the options in the visualization sidebar in Kibana and post some screenshots here? I'd like to see how you're building this... I wonder if it's an order issue. Those values should be different, unless their average value is actually the same (which seems unlikely).

davidv · January 11, 2018, 7:22am

Attached:
Metrics

Buckets

It's an average; wouldn't it be the same for all?

Joe_Fleming · January 11, 2018, 4:18pm

> It's an average; wouldn't it be the same for all?

No, it should show the average for each bucket. What you've done is aggregate your data, splitting it up by name and then by layer (I'm assuming; the image doesn't show past the first bucket there), so now you have a row for each individual pair of name and layer. For each row, you're asking for the average value of layers.elem.bandwidth, so it will give you the average over all the documents that fall into that row's name/layer bucket.
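The per-bucket description holds, but it also explains the identical values once you assume the index contains only the single example document (as comes out later in the thread). That one document falls into every name/layer bucket, and an avg metric on a multi-valued field averages all of the document's values, so every row gets the average of all eight bandwidths. A minimal Python simulation of that bucket logic (the helper name is hypothetical; this is not Elasticsearch code):

```python
from statistics import mean

# The single example document, reduced to the fields used here.
docs = [
    {
        "name": "HLS2",
        "layers": {"elem": [
            {"name": "1.9", "bandwidth": "1992294"},
            {"name": "1.6", "bandwidth": "1677721"},
            {"name": "2.0", "bandwidth": "2097152"},
            {"name": "2.1", "bandwidth": "2097152"},
            {"name": "2.4", "bandwidth": "2516582"},
            {"name": "3.0", "bandwidth": "2998927"},
            {"name": "3.4", "bandwidth": "3565158"},
            {"name": "3.5", "bandwidth": "3670016"},
        ]},
    }
]


def bucket_averages(docs):
    """Simulate terms(name) -> terms(layer name) -> avg(bandwidth).

    A document falls into a layer bucket if any of its layer names matches,
    and avg then uses every bandwidth value of every document in the bucket,
    not only the value that sat next to the matching layer name."""
    buckets = {}
    for i, doc in enumerate(docs):
        for elem in doc["layers"]["elem"]:
            buckets.setdefault((doc["name"], elem["name"]), set()).add(i)
    return {
        key: mean([float(e["bandwidth"]) for i in members for e in docs[i]["layers"]["elem"]])
        for key, members in buckets.items()
    }


averages = bucket_averages(docs)
print(len(averages), set(averages.values()))  # 8 {2576875.25}
```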

What I can see in that image looks right, but can you show me all the bucket configurations under that first name.keyword one?

davidv · January 14, 2018, 1:46pm

Hi @Joe_Fleming.
The image shows all the data; I deleted everything else to focus on this only.
I attached the document in the original post; it is only one document.
I think I've included the whole bucket configuration (maybe you didn't click on the image to see the full content?).

Joe_Fleming · January 18, 2018, 10:22pm

> maybe you didn't click on the image to see the full content

Indeed, I didn't realize the image was truncated. Thanks for pointing that out.

Here's an example of the same idea, albeit with different data. As you can see, each collection, grouped by user agent and then by file extension, shows an average byte value. This should be what you see as well.

I missed earlier that you are dealing with a single document. That's actually not how you'll want to index your data in Elasticsearch: aggregations work on whole documents, so the values inside a single document are never broken up per layer, and every bucket the document falls into sees all of its bandwidth values. The easiest solution is to index each one of the items in layers.elem as a separate document, denormalizing the data so that each one also includes the information from the main document, like name, audio_profile, etc.
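A minimal Python sketch of that denormalization, assuming the _source layout of the example document from the first post. Each entry of layers.elem becomes its own document carrying the profile-level fields, with the layer's name stored as el_name (matching the example documents further down in this reply). The bulk_body helper and the device_profile_layers index name are hypothetical; the helper only formats the newline-delimited body that the Elasticsearch _bulk endpoint accepts, and whether a _type is needed in the action line depends on your Elasticsearch version (the original index uses "doc").

```python
import json

# _source of the example document from the first post.
source = {
    "layers": {
        "elem": [
            {"bandwidth": "1992294", "-id": "0", "enc_id": "131332", "name": "1.9"},
            {"bandwidth": "1677721", "-id": "1", "enc_id": "108209", "name": "1.6"},
            {"bandwidth": "2097152", "-id": "2", "enc_id": "131333", "name": "2.0"},
            {"bandwidth": "2097152", "-id": "3", "enc_id": "5", "name": "2.1"},
            {"bandwidth": "2516582", "-id": "4", "enc_id": "132434", "name": "2.4"},
            {"bandwidth": "2998927", "-id": "5", "enc_id": "1", "name": "3.0"},
            {"bandwidth": "3565158", "-id": "6", "enc_id": "132433", "name": "3.4"},
            {"bandwidth": "3670016", "-id": "7", "enc_id": "6", "name": "3.5"},
        ],
        "size": "8",
    },
    "name": "HLS2",
    "audio_profile": {"bandwidth": "0", "name": "AUDIO_PASSTHROUGH"},
    "fragment_length": "2",
    "live_sliding_window": "3",
}


def denormalize(src):
    """Return one document per layers.elem entry.

    Every top-level field except "layers" is copied onto each layer document.
    The layer's own "name" is stored as "el_name" so it does not overwrite
    the profile's "name"."""
    shared = {k: v for k, v in src.items() if k != "layers"}
    docs = []
    for elem in src["layers"]["elem"]:
        doc = dict(shared)
        doc.update({k: v for k, v in elem.items() if k != "name"})
        doc["el_name"] = elem["name"]
        docs.append(doc)
    return docs


def bulk_body(docs, index, doc_type=None):
    """Format documents as a _bulk request body: an action line followed by
    the source line for each document, newline-delimited, ending in a newline."""
    action = {"index": {"_index": index}}
    if doc_type is not None:
        action["index"]["_type"] = doc_type
    lines = []
    for doc in docs:
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


layer_docs = denormalize(source)
print(bulk_body(layer_docs, "device_profile_layers", doc_type="doc"))
```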

If you're trying to track these values over time, say as part of a compilation process, you'll also want to add a time field to the individual documents so that they can be grouped by time. If this isn't time-series data, you'll want some other unique value to group on. Maybe name is that field, maybe not; I don't know your data well enough to say.

Your data might end up looking like this when you index it, each item being a new document:

{
  "name": "HLS2",
  "audio_profile": {
    "bandwidth": "0",
    "name": "AUDIO_PASSTHROUGH"
  },
  "fragment_length": "2",
  "live_sliding_window": "3",
  "bandwidth": "1992294",
  "-id": "0",
  "enc_id": "131332",
  "el_name": "1.9"
}

{
  "name": "HLS2",
  "audio_profile": {
    "bandwidth": "0",
    "name": "AUDIO_PASSTHROUGH"
  },
  "fragment_length": "2",
  "live_sliding_window": "3",
  "bandwidth": "1677721",
  "-id": "1",
  "enc_id": "108209",
  "el_name": "1.6"
}

...

This will allow you to group by name, then el_name, and then see the average bandwidth for each group.
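A small Python simulation of that grouping, using per-layer documents like the ones above (only name, el_name and bandwidth are kept). Because each name/el_name bucket now contains exactly one document, the average is simply that layer's own bandwidth, which is the one-value-per-layer table asked for earlier. This mimics the bucket logic only; the helper name is hypothetical and it is not Elasticsearch code.

```python
from collections import defaultdict
from statistics import mean

# One denormalized document per layer (only the fields used here).
layer_docs = [
    {"name": "HLS2", "el_name": "1.9", "bandwidth": "1992294"},
    {"name": "HLS2", "el_name": "1.6", "bandwidth": "1677721"},
    {"name": "HLS2", "el_name": "2.0", "bandwidth": "2097152"},
    {"name": "HLS2", "el_name": "2.1", "bandwidth": "2097152"},
    {"name": "HLS2", "el_name": "2.4", "bandwidth": "2516582"},
    {"name": "HLS2", "el_name": "3.0", "bandwidth": "2998927"},
    {"name": "HLS2", "el_name": "3.4", "bandwidth": "3565158"},
    {"name": "HLS2", "el_name": "3.5", "bandwidth": "3670016"},
]


def avg_by_name_and_layer(docs):
    """Simulate terms(name) -> terms(el_name) -> avg(bandwidth)."""
    groups = defaultdict(list)
    for doc in docs:
        groups[(doc["name"], doc["el_name"])].append(float(doc["bandwidth"]))
    return {key: mean(values) for key, values in groups.items()}


for (name, layer), avg in sorted(avg_by_name_and_layer(layer_docs).items()):
    print(name, layer, avg)
```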

davidv · January 21, 2018, 12:29pm

Thanks @Joe_Fleming, I guess this option fits best. That's what I'll do.

system · February 18, 2018, 12:29pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.