Noob help with Kibana, Mappings & Nested Objects in Arrays


I'm investigating using Kibana and giving this to our customers as an analytics tool for the data our app produces. Currently I have data that looks like this:

Github GIST - data.json

I am using the NEST .NET Client to put the data in ES and I create a Mapping like so:

Github GIST - C# Mapping

The raw data from /stats/_mappings is:
Github GIST - Mapping JSON

Whilst in Kibana I am trying to do a simple bar graph over time that does a sum of mcus.status.audioBitRateIncoming. As you'll see, mcus is an array of objects, and status is an object property under each mcu.


Essentially, when I create the graph nothing is plotted. Is this because I have my mapping wrong, or is it simply that Kibana cannot handle these nested objects?


In the index mapping,

          "mcus": {
            "type": "nested",
            "properties": {
              "alarms": {
                "type": "object"

That is the right mapping to use from the Elasticsearch perspective. Unfortunately, Kibana does not currently support the nested data type.


Would an alternative approach be to denormalize, and for each mcu object create a document with the same timestamp but with the individual property values set? That way I can aggregate on them and do filtering, but obviously that introduces duplication of the parent data.

Generally, denormalizing is the way to go, and how you denormalize depends on what kind of analysis you want to do. It would work to create multiple documents for each MCU/Stat and it is up to you how many fields you would want to duplicate from the original stat document. Internally, Elasticsearch uses compression when storing the data, and duplicated data compresses well.
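For illustration (field names taken from the thread's data, values invented), one stat document containing three mcus could be flattened into one document per mcu, each carrying a copy of the parent stat fields:

```json
{ "timestamp": "2017-01-01T00:00:00Z", "statId": 42,
  "mcu_name": "mcu1", "audioBitRateIncoming": 64000 }
{ "timestamp": "2017-01-01T00:00:00Z", "statId": 42,
  "mcu_name": "mcu2", "audioBitRateIncoming": 48000 }
{ "timestamp": "2017-01-01T00:00:00Z", "statId": 42,
  "mcu_name": "mcu3", "audioBitRateIncoming": 32000 }
```

With that shape, a sum of audioBitRateIncoming over time works with plain (non-nested) fields, and you can filter by mcu_name.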

A method that has worked for me before was to denormalize nested data by having an array field in my "parent" document for each field in a nested type. For example,

I had an orders.product_details field which was a nested type, and that object had product_name (string), product_aisle (string), and product_reordered (boolean), which were details for each product in the order. I indexed the data with extra fields: orders.product_names (string array), orders.product_aisles (string array), and orders.reorder_count (number). That allowed me to do some simple analysis on the data.

In my case, it was acceptable to lose the linking between some of the inner data fields. If I wanted to answer a question like, "how many times do people reorder yogurt?" I would have to calculate a field for yogurt reorders when I re-index the data.
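As a sketch (illustrative values, not the real data), an order document denormalized this way might look like:

```json
{
  "order_id": 1001,
  "product_details": [
    { "product_name": "yogurt",  "product_aisle": "dairy",  "product_reordered": true  },
    { "product_name": "granola", "product_aisle": "cereal", "product_reordered": false }
  ],
  "product_names": ["yogurt", "granola"],
  "product_aisles": ["dairy", "cereal"],
  "reorder_count": 1
}
```

The top-level arrays and the count are aggregatable in Kibana, but, as noted, the link between a particular product and its reorder flag is lost at that level.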

Hope that helps!

My only concern with what I have at the moment, which is a complete duplication of the stats data with the mcu data on each document, is that if someone does a sum of X on the stats data it will, in this case, obviously be multiplied by 3 (because I have 3 mcus in my nested data).

In your product example, how did you tie up the reorder count for product A, for example? If you had an array of reorder counts, shouldn't that be tied to a product name or id somehow?

UPDATE: Ah I see you said you lost that linked data and you would have to then change the data if you wanted the id linking. Guess I need to predict what analysis customers might want to do with our MCU data.

That is pretty much true; your best bet would be to talk to the customers and find out what they're interested in.

In this case, if that sum is important, then you'll want to add a field for X at a higher level that encompasses the relevant X values, and the customers can do a sum aggregation on that field. That would have to be handled at index time in the data pipeline.

I'm a bit confused on this one. If I have a doc with X = 1 containing MCU1's denormalized data, then a 2nd doc with X = 1, and a third doc with X = 1 (assuming all docs have the same timestamp, as that's what we've derived), then summing X will give 3 for that timestamp, but actually it's only 1 because we have duplicated data. Or, when denormalizing, do we try to keep one document for that timestamp?

Thanks for the help so far

Yes, when denormalizing, probably the easiest way to go would be to keep a single document, and add additional calculated fields. When analyzing the data, the calculated fields will be used, not the inner nested fields.
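Concretely (field names assumed from the thread, values invented), a single document per timestamp could keep the nested mcus for reference and add pre-computed top-level fields for Kibana to aggregate on:

```json
{
  "timestamp": "2017-01-01T00:00:00Z",
  "mcus": [
    { "name": "mcu1", "status": { "audioBitRateIncoming": 64000 } },
    { "name": "mcu2", "status": { "audioBitRateIncoming": 48000 } },
    { "name": "mcu3", "status": { "audioBitRateIncoming": 32000 } }
  ],
  "mcuCount": 3,
  "audioBitRateIncomingTotal": 144000,
  "audioBitRateIncomingMax": 64000
}
```

Since there is only one document per timestamp, summing a parent field like X is no longer inflated by the number of mcus.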

OK, that makes a bit more sense, I guess. So roughly, do we want to get to something like this?


So we move the values we are interested in to top-level properties that are created somehow, possibly by the .NET client or some other ES magic, and then create aggregated values as well? Or does Kibana aggregate over multiple fields?

Thanks again and sorry for the noob questions

Right. If it is simple to handle the data manipulation in the .NET client, then you could do it that way. Otherwise, you can look into Ingest Node, which lets you define a data pipeline in Elasticsearch that does some preprocessing on documents before they are indexed.
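As a minimal sketch (the pipeline name and the field it sets are invented for illustration), you define a pipeline once and then reference it when indexing:

```json
PUT _ingest/pipeline/stats-preprocess
{
  "description": "preprocess stat documents before indexing",
  "processors": [
    { "set": { "field": "ingested_by", "value": "stats-preprocess" } }
  ]
}

PUT stats/stat/1?pipeline=stats-preprocess
{ "timestamp": "2017-01-01T00:00:00Z", "mcus": [ ... ] }
```

Any processors in the pipeline (set, foreach, script, etc.) run against each document before it is stored.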


Can Kibana aggregate over multiple fields, i.e. sum(mcu1videotx, mcu2videotx, mcu3videotx), or should I get the client/ingest node to create that as a field at index time?

I would go with creating that as a field at index time. You can sum multiple fields with a scripted field in Kibana, but you can't search or aggregate on a scripted field.

Also, if you are able to loop through all the MCUs that are related to a document before indexing, then it doesn't matter how many MCUs there are in the relationship; you don't have to assume there will always be 3.

Yup, I was just using 3 as an example. My only concern with adding the useful properties and the sum, for example, is that I will have to add other aggregations like count, avg, min, and max, and kind of guess what users might want to do with the data.

That is generally correct. It would probably benefit you to run some interviews with the users and find out what they're most interested in.


I've been playing with POST _ingest/pipeline/_simulate; however, I can't seem to create unique fields on the parent object. For example, I can use the foreach processor and the set processor, but I obviously can't create fields that are unique per iteration of the foreach statement, e.g. I can't create fields mcu1, mcu2, etc. The below probably gets executed 3 times, but I end up with 1 property:

"foreach": {
          "field": "mcus",
          "processor": {

            "set": {
              "field": "how_can_this_be_unique_per_iteration",
              "value": "_value"

Even trying to put the mcu name into an array using the append processor just ends up creating 3 items in the field whose values are not the actual mcu name value.

"foreach": {
          "field": "mcus",
          "processor": {
            "append": {
              "field": "field23",
              "value": ""

I would probably look into using the script processor for this. If you have more questions about setting up this pipeline, opening a new Discuss thread under the Elasticsearch topic would probably get you the best help :slight_smile:
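A rough, untested sketch of what the script processor could look like for this (field names assumed from the thread; note that on recent Elasticsearch versions the script body goes under "source" rather than "inline"):

```json
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "script": {
          "lang": "painless",
          "inline": "def total = 0; for (def mcu : ctx.mcus) { ctx[mcu.name + '_audioBitRateIncoming'] = mcu.status.audioBitRateIncoming; total += mcu.status.audioBitRateIncoming; } ctx.audioBitRateIncomingTotal = total;"
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "mcus": [
          { "name": "mcu1", "status": { "audioBitRateIncoming": 64000 } },
          { "name": "mcu2", "status": { "audioBitRateIncoming": 48000 } }
        ]
      }
    }
  ]
}
```

Because the script can build field names from data (mcu.name here), it gets around the limitation of foreach + set, producing fields like mcu1_audioBitRateIncoming plus a running total, however many mcus there are.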

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.