How to aggregate & visualize data from specific fields in index elements? (mobile game)

Hi there,

There is an index, each element of which contains:

  1. general information about a player's session in tutorial
  2. a few arrays with information for each tutorial stage

It looks like this:
> id:206 duration:35 created_at:Oct 23, 2019 @ 16:55:11.000 build:0.2.3.889 player_id:450 is_tester:true stages:{ "id": 1475, "stage_id": 0, "duration": 11, "completed": true }, { "id": 1476, "stage_id": 1, "duration": 6, "completed": true }, { "id": 1477, "stage_id": 2, "duration": 2, "completed": true }, { "id": 1478, "stage_id": 3, "duration": 2, "completed": true }, { "id": 1479, "stage_id": 4, "duration": 3, "completed": true }, { "id": 1480, "stage_id": 5, "duration": 3, "completed": true }, { "id": 1481, "stage_id": 6, "duration": 4, "completed": true } _id:Hdrq-G0BSOdvtuXl5PqJ _type:data _index:tutorial_data _score: -

(Or same in JSON)
{
  "_index": "tutorial_data",
  "_type": "data",
  "_id": "Sdoi-W0BSOdvtuXlEfvi",
  "_version": 1,
  "_score": null,
  "_source": {
    "id": 207,
    "duration": 28,
    "created_at": "2019-10-23T14:26:28.000+00:00",
    "build": "0.2.3.889",
    "player_id": 450,
    "is_tester": true,
    "stages": [
      {
        "id": 1482,
        "stage_id": 0,
        "duration": 4,
        "completed": true
      },
      {
        "id": 1483,
        "stage_id": 1,
        "duration": 5,
        "completed": true
      },
      {
        "id": 1484,
        "stage_id": 2,
        "duration": 2,
        "completed": true
      },
      {
        "id": 1485,
        "stage_id": 3,
        "duration": 7,
        "completed": true
      },
      {
        "id": 1486,
        "stage_id": 4,
        "duration": 3,
        "completed": true
      },
      {
        "id": 1487,
        "stage_id": 5,
        "duration": 6,
        "completed": true
      }
    ]
  },
  "fields": {
    "created_at": [
      "2019-10-23T14:26:28.000Z"
    ]
  },
  "sort": [
    1571840788000
  ]
}

The task is to create a table/ a diagram of an average time spent on each stage of the tutorial.
When I aggregate through the "stages.duration" field (not just the "duration" !) and split the table by "stages.stage_id", it calculates an average time basing on ALL stages in sessions that include a record about the given stages.stages_id (e.g., a stage №2).

This is my request
{
  "aggs": {
"3": {
  "terms": {
    "field": "stages.stage_id",
    "order": {
      "_key": "desc"
    },
    "size": 9
  },
  "aggs": {
    "1": {
      "avg": {
        "field": "stages.duration"
      }
    }
  }
}
  },
  "size": 0,
  "_source": {
"excludes": []
  },
  "stored_fields": [
"*"
  ],
  "script_fields": {},
  "docvalue_fields": [
{
  "field": "created_at",
  "format": "date_time"
}
  ],
  "query": {
"bool": {
  "must": [
    {
      "range": {
        "created_at": {
          "format": "strict_date_optional_time",
          "gte": "2019-10-11T20:30:00.000Z",
          "lte": "2019-10-24T08:10:54.837Z"
        }
      }
    }
  ],
  "filter": [
    {
      "match_all": {}
    }
  ],
  "should": [],
  "must_not": []
}
  }
}
Response
{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 203,
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "3": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "1": {
            "value": 8.141304347826088
          },
          "key": 8,
          "doc_count": 147
        },
        {
          "1": {
            "value": 8.164634146341463
          },
          "key": 7,
          "doc_count": 150
        },
        {
          "1": {
            "value": 8.108289768483944
          },
          "key": 6,
          "doc_count": 154
        },
        {
          "1": {
            "value": 8.036390101892286
          },
          "key": 5,
          "doc_count": 160
        },
        {
          "1": {
            "value": 8.02030456852792
          },
          "key": 4,
          "doc_count": 161
        },
        {
          "1": {
            "value": 7.935256861365236
          },
          "key": 3,
          "doc_count": 172
        },
        {
          "1": {
            "value": 8.189151599443672
          },
          "key": 2,
          "doc_count": 179
        },
        {
          "1": {
            "value": 8.189151599443672
          },
          "key": 1,
          "doc_count": 179
        },
        {
          "1": {
            "value": 9.656871218668972
          },
          "key": 0,
          "doc_count": 160
        }
      ]
    }
  },
  "status": 200
}

So how can I "pick out" and aggregate through the data, e.g. just for the 2nd stage in all appropriate index elements?

Hi @Anne_Kim,

unfortunately this is currently not possible in Kibana because nested fields are not supported: https://github.com/elastic/kibana/issues/1084

For Kibana, stages.duration and stages.stage_id look like two separate arrays that don't have any relationship, so it can't associate one value from one array with a single value in the other array.

For your use case it makes sense to denormalize and ingest one document per individual stage - then you can create the diagram exactly like you described it.

Ok, thank you!

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.