Kibana visualization question: nested documents won't aggregate

Why can't I aggregate the studies in this data? I'm omitting other data that aggregates fine, but something in this dict/list can't be aggregated. I can search through the data with KQL (kquery), but I can't create charts etc.

For example: how many studies are there per account?

If your mapping uses the nested type, this behavior is expected: most Kibana visualizations don't support aggregating over nested fields. In your case (if you want aggregations that treat each nested object as a separate document), it might make sense to denormalize your data and store the individual studies as individual documents in a separate index.
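As a minimal sketch of that denormalization, using a trimmed copy of the sample structure posted in this thread (studies inside `comments`), the Python below turns one source document into one document per study and serializes the result in the Elasticsearch `_bulk` NDJSON format. The index name `studies` and the copied parent fields (`title`, `comment_name`, `comment_date`) are hypothetical choices for illustration, not part of the original data.

```python
import json

def explode_studies(doc):
    """Yield one flat document per study found in doc["comments"][*]["studies"]."""
    for comment in doc.get("comments", []):
        for study in comment.get("studies", []):
            flat = dict(study)  # study fields become top-level fields
            # Hypothetical parent fields carried along so they stay usable in the same chart
            flat["title"] = doc.get("title")
            flat["comment_name"] = comment.get("name")
            flat["comment_date"] = comment.get("date")
            yield flat

def to_bulk_body(docs, index="studies"):
    """Serialize docs in _bulk NDJSON format: an action line, then the source line."""
    lines = []
    for d in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(d))
    return "\n".join(lines) + "\n"  # _bulk requires the body to end with a newline

article = {  # trimmed copy of the sample structure from this thread
    "title": "Nest eggs",
    "comments": [
        {"name": "John Smith", "date": "2014-09-01",
         "studies": [{"account_id": 6125, "study_id": 6572436, "num_instances": 55}]},
        {"name": "Alice White", "date": "2014-10-22"},
    ],
}
studies = list(explode_studies(article))
print(to_bulk_body(studies))
```

The resulting body could be sent to the `_bulk` endpoint (with `Content-Type: application/x-ndjson`); charts would then be built on the hypothetical `studies` index, where every study is a top-level document.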

If I pushed the 'account' data to one index and the 'studies' to another, how would I aggregate / search across both in one visualization?

For most visualization types you couldn't, but you could inline the data of the associated account into each study document.
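A minimal sketch of that inlining, assuming account records can be looked up by `account_id` (the `accounts` dictionary and its `name` and `plan` fields are hypothetical): each study document gets its own copy of the account it belongs to, so one document is still exactly one study.

```python
def inline_account(studies, accounts):
    """Return new study documents that each embed a copy of their account's fields."""
    enriched = []
    for study in studies:
        doc = dict(study)
        # A single embedded object (not an array), so its fields aggregate like any other field
        doc["account"] = dict(accounts.get(study["account_id"], {}))
        enriched.append(doc)
    return enriched

accounts = {6125: {"name": "Example Clinic", "plan": "enterprise"}}  # hypothetical account data
studies = [
    {"study_id": 6572436, "account_id": 6125},
    {"study_id": 6572437, "account_id": 6125},
]
for doc in inline_account(studies, accounts):
    print(doc)
```

The trade-off is duplication: an account's fields are repeated in every one of its studies, so changing an account means reindexing its studies.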

What's the difference between inlining account information into a study and inlining the study into the doc.account?

Due to data privacy I can't share the actual document, but its structure is closer to this:

{
  "title": "Nest eggs",
  "body": "Making your money work...",
  "tags": [
    "cash",
    "shares"
  ],
  "comments": [
    {
      "name": "John Smith",
      "comment": "Great article",
      "age": 28,
      "stars": 4,
      "date": "2014-09-01",
      "studies": [
        {
          "account_id": 6125,
          "study_id": 6572436,
          "num_instances": 55,
          "file_size_in_bytes": 3948805,
          "created_at": "2019-12-03T11:47:17.694Z",
          "customer_id": 2465,
          "uplink_device_id": 57098,
          "num_cines": 0
        }
      ]
    },
    {
      "name": "Alice White",
      "comment": "More like this please",
      "age": 31,
      "stars": 5,
      "date": "2014-10-22"
    }
  ]
}

Using KQL I can query the data with comments.studies.study_id:6572436 (it returns a result). I can't aggregate on the studies fields, but I can aggregate on comments.date etc.

What's the difference between inlining account information into a study and inlining the study into the doc.account?

The difference is that a study document then contains exactly one account. To aggregate correctly in Kibana, each thing you want to aggregate over has to be its own document rather than an element of an array.
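With one study per document, "how many studies per account" becomes a plain terms aggregation on `account_id`. The sketch below shows that request body as a Python dict (the aggregation name `studies_per_account` is arbitrary) and, for comparison, computes the same counts locally with `collections.Counter`; the sample documents are made up.

```python
from collections import Counter

# Elasticsearch search body: no hits, one terms aggregation bucketing documents by account_id
agg_request = {
    "size": 0,
    "aggs": {
        "studies_per_account": {"terms": {"field": "account_id"}}
    },
}

def studies_per_account(study_docs):
    """Count documents per account_id, as the terms aggregation's doc_count would."""
    return Counter(doc["account_id"] for doc in study_docs)

study_docs = [  # hypothetical flattened study documents
    {"study_id": 1, "account_id": 6125},
    {"study_id": 2, "account_id": 6125},
    {"study_id": 3, "account_id": 7001},
]
print(studies_per_account(study_docs))  # Counter({6125: 2, 7001: 1})
```

Note that the terms aggregation returns only the top 10 buckets unless its `size` parameter is raised.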

Currently your data looks like this:

{
  "key1": 1,
  "nested": [
    { "nestedKey1": 2, "nestedKey2": 3 },
    { "nestedKey1": 4, "nestedKey2": 5 },
    { "nestedKey1": 6, "nestedKey2": 7 }
  ]
}

If you want to aggregate on the nested keys, you should instead create three documents like this:

{ "key1": 1, "nestedKey1": 2, "nestedKey2": 3 }
{ "key1": 1, "nestedKey1": 4, "nestedKey2": 5 }
{ "key1": 1, "nestedKey1": 6, "nestedKey2": 7 }
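The split itself is mechanical. A minimal Python sketch, assuming the array field is called `nested` as in the example above (`split_nested` is a hypothetical helper name) and that element keys don't collide with the parent's keys; on a collision, the element's value wins here:

```python
def split_nested(doc, field):
    """Return one document per element of doc[field], merged with the parent's other keys."""
    parent = {k: v for k, v in doc.items() if k != field}
    # Element keys are applied last, so they override parent keys of the same name
    return [{**parent, **element} for element in doc.get(field, [])]

doc = {
    "key1": 1,
    "nested": [
        {"nestedKey1": 2, "nestedKey2": 3},
        {"nestedKey1": 4, "nestedKey2": 5},
        {"nestedKey1": 6, "nestedKey2": 7},
    ],
}
for d in split_nested(doc, "nested"):
    print(d)
# {'key1': 1, 'nestedKey1': 2, 'nestedKey2': 3}
# {'key1': 1, 'nestedKey1': 4, 'nestedKey2': 5}
# {'key1': 1, 'nestedKey1': 6, 'nestedKey2': 7}
```

A parent with an empty or missing array produces no documents at all, so decide separately whether such parents need a document of their own.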

Using KQL I can query the data with comments.studies.study_id:6572436 (it returns a result). I can't aggregate on the studies fields, but I can aggregate on comments.date etc.

If you look at your mapping, "comments" is probably not using the nested type, but "studies" is. You could change your mapping so that "studies" is not nested, but that is probably not what you want: Elasticsearch then flattens the data in the array, and the association between the values within each object is lost. For example,

{
  "key1": 1,
  "nested": [
    { "nestedKey1": 2, "nestedKey2": 3 },
    { "nestedKey1": 4, "nestedKey2": 5 },
    { "nestedKey1": 6, "nestedKey2": 7 }
  ]
}

becomes

{
  "key1": 1,
  "nested.nestedKey1": [2, 4, 6],
  "nested.nestedKey2": [3, 5, 7]
}

and Elasticsearch no longer knows that nestedKey1: 2 appeared together with nestedKey2: 3 in the same object. If you instead split the document into three documents, you don't have that problem.
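A small Python model makes this concrete. It is a simplification for illustration only (the helpers `flatten_object_array` and `matches` are hypothetical and are not how Elasticsearch is implemented): the first function mimics the flattened view of a non-nested object array shown above, and the second checks "field equals value" criteria against a document. Asking for `nestedKey1: 2` together with `nestedKey2: 5`, a pair that no single object contains, matches the flattened document but none of the split documents.

```python
def flatten_object_array(doc, field):
    """Collapse an array of objects into dotted keys that each hold a list of values."""
    flat = {k: v for k, v in doc.items() if k != field}
    for element in doc.get(field, []):
        for key, value in element.items():
            # Values for the same key from *all* elements land in one shared list
            flat.setdefault(f"{field}.{key}", []).append(value)
    return flat

def matches(doc, criteria):
    """True if every criterion's value equals the field's value (or any value in its list)."""
    for field, wanted in criteria.items():
        value = doc.get(field)
        values = value if isinstance(value, list) else [value]
        if wanted not in values:
            return False
    return True

doc = {
    "key1": 1,
    "nested": [
        {"nestedKey1": 2, "nestedKey2": 3},
        {"nestedKey1": 4, "nestedKey2": 5},
        {"nestedKey1": 6, "nestedKey2": 7},
    ],
}
flattened = flatten_object_array(doc, "nested")
# {'key1': 1, 'nested.nestedKey1': [2, 4, 6], 'nested.nestedKey2': [3, 5, 7]}

# Ask for nestedKey1 == 2 AND nestedKey2 == 5, a pair no single object contains
print(matches(flattened, {"nested.nestedKey1": 2, "nested.nestedKey2": 5}))  # True (false positive)

split_docs = [{"key1": 1, **element} for element in doc["nested"]]
print(any(matches(d, {"nestedKey1": 2, "nestedKey2": 5}) for d in split_docs))  # False
```

In Elasticsearch itself, mapping the array as `nested` (and querying it with a `nested` query) is what avoids this cross-object match, at the cost of the aggregation limitation discussed in this thread.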
