Sum of document fields with same timestamp

irfancnk · January 3, 2020, 2:37pm

Hi, I have a bunch of network throughput document looking exactly like the following.

{timestamp: 1578038384.011689, ip1: 192.168.56.1, port1: 5353, ip2: 224.0.0.251, port2: 5353,  recv: 14.0, sent: 0.2, total: 14.2}

{timestamp: 1578038384.011689, ip1: 192.168.56.1, port1: 57621, ip2: 192.168.56.255, port2: 57621, recv: 2.4, sent: 10.0, total: 12.4}

{timestamp: 1578038384.013684, ip1: 192.168.0.106, port1: 57621, ip2: 172.217.23.110, port2: 443,  recv: 0.08, sent: 0.06, total: 0.15}

You can think that each JSON is a document lies in elastichsearch index. I want to show my user a kibana vega graph which visualize network throughput by entered filter. User can enter any of the ip1, port1, ip2, port2 (even multiple) fields as filter.

For example, if user enters only ip1 field as 192.168.56.1, I must take summation of first 2 documents and merge them in one timestamp. Hence, when I show the total field in vega graph these will be shown in one point in graph. If user enters port1 = 57621 as filter, then I must take only the last 2 documents but since they are different in timestamps I must show them as 2 point in my graph. Is there any better way than filtering all documents from elasticsearch summing them up in my server side, indexing into completely different shard and creating graph in kibana? Or is there an API for that?

William_Brafford · January 3, 2020, 5:16pm

Irfan,

The Elasticsearch query language can definitely express the question you are asking, but exactly how to do it depends on what your index's mappings are. Here's a query I came up with based on default mappings:

Create test index and index sample docs

PUT throughput
POST throughput/_doc
{
  "timestamp": 1578038384.011689,
  "ip1": "192.168.56.1",
  "port1": 5353,
  "ip2": "224.0.0.251",
  "port2": 5353,
  "recv": 14.0,
  "sent": 0.2,
  "total": 14.2
}

POST throughput/_doc
{
  "timestamp": 1578038384.011689,
  "ip1": "192.168.56.1",
  "port1": 57621,
  "ip2": "192.168.56.255",
  "port2": 57621,
  "recv": 2.4,
  "sent": 10.0,
  "total": 12.4
}

POST throughput/_doc
{
  "timestamp": 1578038384.013684,
  "ip1": "192.168.0.106",
  "port1": 57621,
  "ip2": "172.217.23.110",
  "port2": 443,
  "recv": 0.08,
  "sent": 0.06,
  "total": 0.15
}

GET throughput/_search
{
  "size": 0, 
  "query": {
    "match": {
      "ip1": "192.168.56.1"
    }
  },
  "aggs": {
    "time_buckets": {
      "histogram": {
        "field": "timestamp",
        "interval": 10
      },
      "aggs": {
        "recv_sum": {
          "sum": { "field": "recv" }
        },
        "sent_sum": {
          "sum": { "field": "sent" }
        },
        "total_sum": {
          "sum": { "field": "total" }
        }
      }
    }
  }
}

In that query, I'm "bucketing" documents by their timestamp and taking sums over those buckets, which reduces two data points to one. So, in theory, you shouldn't have to reindex data.

Unfortunately, I don't know very much about the Vega graph language. Perhaps if you have a working query, someone on the Kibana forum could help you work out the Kibana side of things?

-William

system · January 31, 2020, 5:17pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.