Getting around Kibana's missing support for Nested Objects & Parent/Child


(3ygun) #1

Here is an example of my issue:

I’m a kitchen supplier who wants to visualize the makeup of kitchen's I've sold to customers. I have data in elasticsearch the form of kitchen index which has a type (modern, new, old, etc.) and chairs which is a list of models (a, b,c, d, etc.) & counts (# of each model in the kitchen).

kitchen
|- type
\- chair
   |- model
   \- count

Theoretical data:

kitchen/1 >> { "type": "old", "chair": [ { "model": "a", "count": 4 }, { "model": "b", "count": 2 } ] }
kitchen/2 >> { "type": "old", "chair": [ { "model": "b", "count": 2 }, { "model": "c", "count": 2 } ] }
kitchen/3 >> { "type": "new", "chair": [ { "model": "c", "count": 10 } ] }
kitchen/4 >> { "type": "new", "chair": [ { "model": "a", "count": 1 }, { "model": "b", "count": 1 }, { "model": "c", "count": 1 } ] }

Notice this doesn't have to be how the data is structured

I want to make the Pie Chart of “For kitchen model [old] what are the top 10 chair models listed by
the sum of their counts?”. How would you do this in Kibana?

I'm aware, after much headache, of Kibana's current missing support for nested objects and parent/child.

I've tried:

  • Changing chair to "type": "nested", "include_in_parent": true
    • This results in a incorrect sum when multiple chairs models are present in a single kitchen
  • Tried Parent/Child
    • Cannot include a visualization of both the parent and child documents (The above desired visualization)
  • The Kibana branch with "Full Nested Support"
    • Cannot install: npm install fails as everything is deprecated

Other visualizations that need to work:

  • "Top 5 Kitchen types based on how the sum of many chairs they contain?"
  • "For chair model [a] how many chairs are sitting in kitchens?"

I'm looking for any help as to how to restructure the data to accomplish this. I.E. How would you denormalize this as that seems to be what Kibana wants?

Thank you in advance.


(Matt Bargar) #2

You should be able to create all of those visualizations if you simply pull each chair model out into its own document per kitchen. Like this:

{ "type": "old", "chairModel": "a", "count": 4 }
{ "type": "old", "chairModel": "b", "count": 2 }


(3ygun) #3

Thanks! Three followup questions to make sure my thinking is correct:

  1. Using the denormalized form you listed above I could no longer get a correct graph of the ratio of kitchen types

    • As a single Kitchen of "type": "old" with 2 types of chairs would count twice?
  2. Multiple nesting levels i.e. Chair now has parts with a "chair": [{ ..., "parts": [{"piece":"legs", "color": "green"}, {"piece": "seat", "color": "red"}] }] I would get the following structure

    • { "type": "old", "chairModel": "a", "chairCount": 4, "partPiece": "legs", "partColor": "green" }
      { "type": "old", "chairModel": "a", "chairCount": 4, "partPiece": "seat", "partColor": "red" }
      { "type": "old", "chairModel": "b", "chairCount": 2, "partPiece": "legs", "partColor": "yellow" }
      { "type": "old", "chairModel": "b", "chairCount": 2, "partPiece": "seat", "partColor": "purple" }
  3. Kitchen with 2 nested fields:

    kitchen 
       |- type
       |- chair
       |   |- model
       |   \- count
       \- appliance
           |- model
           \- cost
    

Becomes something like:
{ "type": "old", "chairModel": "a", "chairCount": 4, "applianceModel": "fridge", "applianceCost": 1000 }
{ "type": "old", "chairModel": "a", "chairCount": 4, "applianceModel": "sink", "applianceCost": 100 }
{ "type": "old", "chairModel": "b", "chairCount": 2, "applianceModel": "fridge", "applianceCost": 1000 }
{ "type": "old", "chairModel": "b", "chairCount": 2, "applianceModel": "sink", "applianceCost": 100 }

Basically, this becomes a problem when I look at my real data, a config file, with >1000 fields and 7+ nested things :cry:


(Matt Bargar) #4

1: If I understand correctly, you're talking about a graph with a simple count of each kitchen type? If you have some sort of kitchen ID, you could do a terms aggregation on kitchenType with a Unique Count metric on kitchen ID, so that you essentially get a count of the unique IDs for each kitchen type.

2 & 3: These are difficult to answer without knowing what kind of questions you want to answer about your data. It's possible that you might want multiple indices, depending on how you want to crunch the data, or you might want to throw even more fields into a single document.


(3ygun) #5

Question regarding multiple indices: Does Kibana allow you to create and then aggregate across multiple indices?

Ex: I have 2 indices chairs & appliances within a Kitchen index on Kibana (Sense searches would be kitchen/chairs & kitchen/appliance though process based on similar pattern). I add kitchen ID then have something akin to the following data in ES:

Index chair: { "type": "old", "id": 123, "chairModle": "a", "chairCount": 4 }
Index appliance: { "type": "old", "id": 123, "applianceModel": "sink", "applianceCost": -100 }

From this I would like to produce a Kibana dashboard with the following visualizations:

  • Pie Chart of Kitchen types
    • Ratios based on how many of each type are present in the data (using the unique ID count)
  • Pie Chart of chair types per kitchen type
    • Ratios based on sum of chair counts per chair type for a split chart of Top 5 kitchen types
  • Metrics of average appliance cost for all kitchens

All sample data:
{ "type": "old", "id": 123, "chairModle": "a", "chairCount": 4 }
{ "type": "old", "id": 123, "chairModle": "b", "chairCount": 2 }
{ "type": "new", "id": 245, "chairModle": "b", "chairCount": 10 }
{ "type": "new", "id": 245, "applianceModel": "fridge", "applianceCost": 1000000 }
{ "type": "old", "id": 123, "applianceModel": "sink", "applianceCost": -100 }

Assuming a user then selects to limit the query by selecting "chairModel": "a" in the Chair types per Kitchen type Pie Chart. The Metric of Average Appliance cost fails although the "type": "old" should tell it what to do.

Dashboard views:


Mapping:

Data Upload:


Is there anyway around this or would that require the single index? Thanks for all the help thus far!


(3ygun) #6

The Visualizations Kibana JSON Export:

        [
          {
            "_id": "Kitchen-Average-Appliance-Cost",
            "_type": "visualization",
            "_source": {
              "title": "Kitchen - Average Appliance Cost",
              "visState": "{\"title\":\"New Visualization\",\"type\":\"metric\",\"params\":{\"handleNoResults\":true,\"fontSize\":60},\"aggs\":[{\"id\":\"1\",\"type\":\"avg\",\"schema\":\"metric\",\"params\":{\"field\":\"applianceCost\"}}],\"listeners\":{}}",
              "uiStateJSON": "{}",
              "description": "",
              "version": 1,
              "kibanaSavedObjectMeta": {
                "searchSourceJSON": "{\"index\":\"kitchen\",\"query\":{\"query_string\":{\"query\":\"*\",\"analyze_wildcard\":true}},\"filter\":[]}"
              }
            }
          },
          {
            "_id": "Kitchen-Chair-Type-per-Kitchen-Type",
            "_type": "visualization",
            "_source": {
              "title": "Kitchen - Chair Type per Kitchen Type",
              "visState": "{\"title\":\"New Visualization\",\"type\":\"pie\",\"params\":{\"shareYAxis\":true,\"addTooltip\":true,\"addLegend\":true,\"isDonut\":false},\"aggs\":[{\"id\":\"1\",\"type\":\"sum\",\"schema\":\"metric\",\"params\":{\"field\":\"chairCount\"}},{\"id\":\"2\",\"type\":\"terms\",\"schema\":\"split\",\"params\":{\"field\":\"type\",\"size\":5,\"order\":\"desc\",\"orderBy\":\"1\",\"row\":false}},{\"id\":\"3\",\"type\":\"terms\",\"schema\":\"segment\",\"params\":{\"field\":\"chairModel\",\"size\":5,\"order\":\"desc\",\"orderBy\":\"1\"}}],\"listeners\":{}}",
              "uiStateJSON": "{}",
              "description": "",
              "version": 1,
              "kibanaSavedObjectMeta": {
                "searchSourceJSON": "{\"index\":\"kitchen\",\"query\":{\"query_string\":{\"query\":\"*\",\"analyze_wildcard\":true}},\"filter\":[]}"
              }
            }
          },
          {
            "_id": "Kitchen-Kitchen-Type-by-Chair-Sum",
            "_type": "visualization",
            "_source": {
              "title": "Kitchen - Kitchen Type by Chair Sum",
              "visState": "{\"title\":\"New Visualization\",\"type\":\"pie\",\"params\":{\"shareYAxis\":true,\"addTooltip\":true,\"addLegend\":true,\"isDonut\":false},\"aggs\":[{\"id\":\"1\",\"type\":\"sum\",\"schema\":\"metric\",\"params\":{\"field\":\"chairCount\"}},{\"id\":\"2\",\"type\":\"terms\",\"schema\":\"segment\",\"params\":{\"field\":\"type\",\"size\":5,\"order\":\"desc\",\"orderBy\":\"1\"}}],\"listeners\":{}}",
              "uiStateJSON": "{}",
              "description": "",
              "version": 1,
              "kibanaSavedObjectMeta": {
                "searchSourceJSON": "{\"index\":\"kitchen\",\"query\":{\"query_string\":{\"query\":\"*\",\"analyze_wildcard\":true}},\"filter\":[]}"
              }
            }
          }
        ]

(Matt Bargar) #7

Queries and aggregations across indices are definitely possible. Just make sure fields with the same name have the same mapping across indices/types. A single index can also support quite a few fields.

The problem in this case is that the entire dashboard is being filtered by chairModel:A and none of the docs with appliance costs have chairs, so they're filtered out.

Do you want to be able to correlate chair type with average appliance cost? Or do you want the appliance cost metric to remain the same when filtering on chairs?


(3ygun) #8

I would like to correlate chair type with average appliance cost >> which as you stated requires a single field. Therefore that's what I'll do!

However, as for:

Is there a way to do this or is that not supported? Just interested?

Thanks for all of the help today!


(Matt Bargar) #9

The only way to accomplish that would be to save the filter along with the visualizations that need it, which would lose the interactivity on the dashboard unfortunately. There's a ticket for filters per viz on dashboards here: https://github.com/elastic/kibana/issues/5647


(Nathaniel Anderson) #10

This discussion is very helpful in learning about nesting, Kibana's capabilities, and just Elasticsearch techniques/best practices. Thank you both. I have a few questions:

I'm not sure I understand what you mean here: Why doesn't "id": 123 "tell it what to do"? (In this case, I think "id" and "type" would both tell it to do the same thing because 123 (id) is the only old (type) kitchen (i.e. 1:1 relationship), but that may not always be the case. Do I understand?)

Can you explain how you did this in terms of your mapping or a few sample docs? Do you mean you applied a "chairModle" property to the documents that had "applianceCosts"? If you did, wouldn't that property need to contain multiple values; for example the "sink" (applianceModel) in kitchen 123 (id) is related to two chairModle's; a, and b (?)

I see how you addressed your questions 1 and 3, but this question 2 -- did you address it? Currently I think your model has only 2 levels, kitchen (level 1), appliance and chair (level 2). How would you go deeper; like the chairs 1:* parts (i.e. chairs have many parts) relationship? I think that would be 3 levels; kitchen 1:* chairs 1:* parts.

Thanks again


(3ygun) #11

Hello Nathaniel that's what I tried to make the thread for! Glad it helped someone. To answer your questions:

Question 1:

Although it is true that "id" & "type" would both do the same thing for a single kitchen. If I were to expand the data set to include 1000+ "type": "old" kitchens each with a different "id" I would want to filter by the "type".

Additionally, what I meant by:

Was that because appliance didn't have a mapping for "chairModel" I'd have expected Kibana to do nothing with the filter instead of apply it like:

Question 2 & 3:

Notice I'm still trouble shooting these issues on our data set but I'll attempt to sum up the current plan of attack and will reply later with the final solution I implemented:

Okay, to start off the single field failed horribly as:

  1. Our data-set wouldn't scale when combining every combination (e.g. each data point would have been expanded to >1GB times the number of datapoints :frowning: )
  2. It didn't allow for the metrics &/or dashboard filtering we wanted (e.g. All graphs & charts had the wrong data)

What appears to be the solution:

  • Make an index per nested object and a metrics index. Why?
    • The index per nested object can have each iteration of objects properties. For questions like:
      • PieChart of the Top 5 chairModels with slice size based on the sum of there count
      • BarChart of Average Price vs Appliance Model
    • The metrics index which contains 1 entry per kitchen "id" allows for proper metrics analysis. For questions like:
      • How many kitchens are in our data set? (e.g. Count the number of "id"s)
      • Total money spent on appliances

Example:

If you are parsing in logstash use the split filter to get multiple indexes & entries. And mutate's split for the arrays.

Indexes:

  1. /kitchen-metrics

     kitchen 
        |- id >> Unique id per kitchen
        |- type >> Type of the kitchen
        |- chair
        |   |- model >> [...] Array of all chair.models present in the kitchen
        |   \- count >> [...] Array of all chair.counts present in the kitchen
        \- appliance
            |- model >> [...] Array of all appliance.models present in the kitchen
            \- cost >> [...] Array of all appliance.costs present in the kitchen
    
  2. /kitchen-chair


    kitchen
    |- id >> Unique id per kitchen
    |- type >> Type of the kitchen
    |- chair
    | |- model >> Single model Ex: "A"
    | - count >> Single count Ex: 3
    - appliance
    |- model >> [...] Array of all appliance.models present in the kitchen
    - cost >> [...] Array of all appliance.costs present in the kitchen

  3. /kitchen-appliance


    kitchen
    |- id >> Unique id per kitchen
    |- type >> Type of the kitchen
    |- chair
    | |- model >> [...] Array of all chair.models present in the kitchen
    | - count >> [...] Array of all chair.counts present in the kitchen
    - appliance
    |- model >> Single model Ex: "sink"
    - cost >> Single cost Ex: 1000

This should allow for each graph/chart/metric to be created using its proper index to product the proper chart and allow full dashboard filtering.


(3ygun) #12

Sorry character limit reached...

Question 3:

Still figuring this one out but the expected solution is the solution I've found to Question 2

Conclusion

Is this perfect? Hell no! But it's what we've got to deal with until @Elastic & @Kibana add nested objects


(Nathaniel Anderson) #13

Thanks very much for your reply. I need a few more days to understand it, and I'll ask if questions. Thank you!


(Giovanni Tummarello) #14

If you use the Kibi kept/in/sync Kibana fork you'll be able to structure your information as it is natural: a Kitchen table, a Chair Table , A Customer table if you want. Kibi performs real joins across ES indexes, no need for nested or materialized data. Hope this helps.


(Shay Banon) #15

Please note, myself and we at Elastic don't consider the Join support built on top of Matt Webber pull request (which ended up being copied by Kibi) to be correct implementation, which is the reason we did not pull it in at the end. The set of limitations it imposes ends up contradicting one of the most fundamental concepts we have in Elasticsearch, which is building features that scale.

@jccq I will mention this position in other threads, since there seem to be a flux of comments on several other GitHub issues and threads here, and I would appreciate slowing down on those.


(Fabien Baligand) #16

@3ygun
To answer your first need, you could do that with Elastic Stack, using Parent/Child model.
In Kibana, when you create your visualization, you can copy that in "search bar" :

{
	"has_parent": {
	   "parent_type": "kitchen",
	   "query": {
	      "query_string": {
	         "query": "type:old"
	      }
	   }
	}
} 

And then make a term aggregation based on "model" field, then order by "custom metric", which is sum of 'count' field.
This works for me in Kibana 4.6.3 and Kibana 5.0.1


Display nested rows in Kibana
(Giovanni Tummarello) #17

The original Matt Weber implementation is acknowledged as original inspiration but has since been reimplemented,

Current 4.x version, uses circuit breakers and behaves pretty well also under high load, with the worse thing that can happen is that one gets a * next to a join filter indicating the join was cut at some point.

Typically there is no need for a cut up to 10m join entity sets or so, but if one gets a * then one sees that and can use some more selective filter to get again a complete join. We understand this might not be for everyone of course, therefore understand Elastic/Elasticsearch choice.


(system) #18