Aggregating fields within a document

Hi, we have a mechanism which handles a queue of database operations (INSERT, UPDATE, and DELETE) for various tables. I am now working on creating a Kibana dashboard with some metrics about this queue, but am hitting a wall.

I currently have an application which gets the queue size every 10 minutes and indexes it into Elasticsearch as this type of document:

{
  "timeStamp": "2021-05-12T08:30:14.1324986+02:00",
  "tableA": {
    "insert": 2,
    "update": 3,
    "delete": 1
  },
  "tableB": {
    "insert": 0,
    "update": 2,
    "delete": 0
  }
}

I would like to start with the simplest dashboard item: a total counter of all pending operations across all tables. With the above example, that would be 8.
From what I read, I think I would have to use a scripted metric; however, I don't see a way in the docs to sum all subfields by wildcard. Is there any way to accomplish this?
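
For illustration, the closest I could find is a scripted_metric aggregation with every table name hardcoded, which is exactly what I'd like to avoid (the index name queue-metrics is just an example):

POST queue-metrics/_search
{
  "size": 0,
  "aggs": {
    "pending_total": {
      "scripted_metric": {
        "init_script": "state.total = 0L",
        "map_script": """
          // every table has to be listed explicitly; there is no wildcard
          for (t in ['tableA', 'tableB']) {
            state.total += doc[t + '.insert'].value + doc[t + '.update'].value + doc[t + '.delete'].value;
          }
        """,
        "combine_script": "return state.total",
        "reduce_script": "long total = 0; for (s in states) { total += s } return total"
      }
    }
  }
}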

Later on, I would also like a total metric for all operations per table. And finally, I'd like to create a graph showing these numbers over time, based on the timeStamp field.

I would also be open to structuring this data in another way, if that would help with aggregating it.

It might be easier to change the structure of your document a little and use an ingest pipeline like this:

PUT test/_doc/1
{
  "timestamp": "2021-05-12T08:30:14.1324986+02:00",
  "tables": [
    {
      "name": "tableA",
      "insert": 2,
      "update": 3,
      "delete": 1
    },
    {
      "name": "tableB",
      "insert": 0,
      "update": 2,
      "delete": 0
    }
  ]
}

POST _ingest/pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "timestamp": "2021-05-12T08:30:14.1324986+02:00",
        "tables": [
          {
            "name": "tableA",
            "insert": 2,
            "update": 3,
            "delete": 1
          },
          {
            "name": "tableB",
            "insert": 0,
            "update": 2,
            "delete": 0
          }
        ]
      }
    }
  ],
  "pipeline": {
    "processors": [
      {
        "script": {
          "source": """
          ctx.total = 0;
          ctx.total += ctx.tables.stream().mapToInt(v -> v.insert+ v.update + v.delete).sum();

          """
        }
      }
    ]
  }
}
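
Once that pipeline is stored and attached to your index (the simulate call above only previews it), the dashboard counter becomes a plain sum aggregation over total, e.g.:

POST test/_search
{
  "size": 0,
  "aggs": {
    "pending_total": { "sum": { "field": "total" } }
  }
}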

Hope that helps as a start.

Welcome!

I'd probably create documents like this:

POST /tables/_doc
{
  "@timestamp": "2021-05-12T08:30:14.1324986+02:00",
  "name": "tableA",
  "current": 6,
  "insert": 2,
  "update": 3,
  "delete": 1
}
POST /tables/_doc
{
  "@timestamp": "2021-05-12T08:30:14.1324986+02:00",
  "name": "tableB",
  "current": 2,
  "update": 2
}

Then I'd run a sum agg on current.
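
Something like this, for example:

POST /tables/_search
{
  "size": 0,
  "aggs": {
    "total_pending": { "sum": { "field": "current" } }
  }
}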

Also note that I changed timeStamp to @timestamp, which is kind of a standard field name.

Thanks for both your suggestions. I went with @dadoonet's suggestion, which does work fine for creating some charts and counters.
However, if I aggregate using sum and enlarge my timespan (I'm indexing all pending operations, for all tables, every 10 minutes) to e.g. 24h, the aggregate function starts summing up different 'snapshots', causing the numbers to appear twice as large.
Is there a way to tell Kibana to only group by a specific timestamp? Alternatively, I could add a GUID to each record, which would be the same for all records belonging to one snapshot, and then group by that GUID.

I'm not sure I understood what the problem is, but you can first aggregate with a date_histogram and then do a sum. It will sum per time period.
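
For example, with the interval matched to your 10-minute snapshot cadence so each bucket covers a single snapshot:

POST /tables/_search
{
  "size": 0,
  "aggs": {
    "over_time": {
      "date_histogram": {
        "field": "@timestamp",
        "fixed_interval": "10m"
      },
      "aggs": {
        "pending": { "sum": { "field": "current" } }
      }
    }
  }
}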

So I'm trying to get a graph over time in Kibana, using the Lens visualisation. This works, but when I set the time period too big (e.g. 24h), the graph groups together the values per hour or so, causing totals of 2 'snapshots' to be summed up together, making the graph appear as if there were double the amount of operations pending.

This is the data shown over the last 2 hours, which is fine (although I'd like to get rid of the 0 values in between):

But this is the same data over the last 24h:

In the latter graph, it summed up two 'snapshots' of data, showing a total of 90k instead of 2x 45k.

So you want to aggregate by name first using a terms agg?
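
Something like this (name.keyword assumes the default dynamic mapping; in Lens this corresponds to a breakdown by top values of name.keyword):

POST /tables/_search
{
  "size": 0,
  "aggs": {
    "per_table": {
      "terms": { "field": "name.keyword" },
      "aggs": {
        "over_time": {
          "date_histogram": { "field": "@timestamp", "fixed_interval": "10m" },
          "aggs": { "pending": { "sum": { "field": "current" } } }
        }
      }
    }
  }
}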
