Return a set of values only on the first occurrence of one of them


#1

Hi all,

I use the latest ES version. My data set consists of network flows (see an example below). Each time a host starts a new flow with another host, a new, unique flow_id is created. I would like to write an Elasticsearch request that returns, for instance, the dest_ip and the dest_port only from the moment the flow_id first appears in time. So my final result would be, for instance:
flow_id1 => dest_ip1 => dest_port1
flow_id2 => dest_ip2 => dest_port2
but not
flow_id1 => dest_ip1 => dest_port1
flow_id2 => dest_ip2 => dest_port2
flow_id1 => dest_ip1 => dest_port2
How could I manage this? I need to get the first set of values for each flow_id, and not the ones appearing later.

In SQL I would be doing SELECT dest_ip, dest_port, distinct flow_id FROM index ORDER BY timestamp.
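To make the intended result concrete, here is a minimal Python sketch of the "first occurrence per flow_id" logic, outside Elasticsearch. The function name `first_per_flow` is hypothetical, and it assumes each event is a dict whose `timestamp` values sort in chronological order.

```python
from typing import Any, Dict, Iterable, Tuple


def first_per_flow(events: Iterable[Dict[str, Any]]) -> Dict[Any, Tuple[Any, Any]]:
    """Map each flow_id to the (dest_ip, dest_port) of its earliest event."""
    first: Dict[Any, Tuple[Any, Any]] = {}
    # Walk the events oldest first; assumes timestamps are directly comparable.
    for event in sorted(events, key=lambda e: e["timestamp"]):
        # setdefault keeps the value already stored, so later events
        # of the same flow_id are ignored.
        first.setdefault(event["flow_id"], (event["dest_ip"], event["dest_port"]))
    return first
```

With the three example rows above, only the first flow_id1 row and the flow_id2 row would be kept.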

So far I have the following, but I'm not sure it's the best way. How can I return other fields (dest_ip, dest_port) along with flow_id?

GET index-*/_search
{
  "size": 0,
  "aggs": {
    "2": {
      "terms": {
        "field": "flow_id",
        "size": 150,
        "order": {
          "_key": "asc"
        }
      },
      "aggs": {
        "1": {
          "top_hits": {
            "docvalue_fields": [
              "flow_id"
            ],
            "_source": "error",
            "size": 1,
            "sort": [
              {
                "timestamp": {
                  "order": "asc"
                }
              }
            ]
          }
        }
      }
    }
  }
}

Example document:

{
  "_index": "index-2018-11-17",
  "_type": "doc",
  "_id": "h3qeIGcBeUNSQc4lIwrI",
  "_version": 1,
  "_score": null,
  "_source": {
    "proto": "UDP",
    "@version": "1",
    "dest_ip": "192.168.0.15",
    "@timestamp": "2018-11-17T07:41:33.923Z",
    "dest_port": 49328,
    "in_iface": "wlp2s0",
    "timestamp": "2018-11-17T08:41:33.778296+0100",
    "flow_id": 1026861285856324,
    "event_type": "dns",
    "dns": {
      "type": "answer",
      "rrtype": "SOA",
      "id": 56358,
      "rcode": "NOERROR",
      "rrname": "elastic.co",
      "ttl": 10183
    },
    "host": "xxx",
    "src_ip": "XX.2.0.1",
    "src_port": 53
  },
  "fields": {
    "@timestamp": [
      "2018-11-17T07:41:33.923Z"
    ],
    "timestamp": [
      "2018-11-17T07:41:33.778Z"
    ]
  },
  "sort": [
    1542440493923
  ]
}

#2

Finally got it working by adding the extra fields to docvalue_fields:

GET index-*/_search
{
  "size": 0,
  "aggs": {
    "2": {
      "terms": {
        "field": "flow_id",
        "size": 150,
        "order": {
          "_key": "asc"
        }
      },
      "aggs": {
        "1": {
          "top_hits": {
            "docvalue_fields": [
              "flow_id",
              "src_ip.keyword",
              "dest_ip.keyword"
            ],
            "_source": "error",
            "size": 1,
            "sort": [
              {
                "timestamp": {
                  "order": "asc"
                }
              }
            ]
          }
        }
      }
    }
  }
}
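
For anyone consuming this from a script, here is a rough Python sketch of the same terms + top_hits idea, plus a helper that flattens the response into one entry per flow_id. It is only a sketch under assumptions: the aggregation names `flows` and `first_seen` and both function names are made up for illustration, `dest_ip.keyword` is assumed to exist in the mapping (like `src_ip.keyword` above), and `dest_port` is assumed to be a numeric field with doc values. It requests dest_ip and dest_port, as asked in the original question, and sends no request itself; the body can be passed to whatever client you use.

```python
from typing import Any, Dict


def build_first_seen_query(max_flows: int = 150) -> Dict[str, Any]:
    """Request body: one bucket per flow_id, keeping only its earliest document."""
    return {
        "size": 0,
        "aggs": {
            "flows": {  # hypothetical aggregation name
                "terms": {
                    "field": "flow_id",
                    "size": max_flows,  # at most this many flow_ids come back
                    "order": {"_key": "asc"},
                },
                "aggs": {
                    "first_seen": {  # hypothetical aggregation name
                        "top_hits": {
                            "size": 1,
                            "_source": False,
                            # Assumed field names, see the note above.
                            "docvalue_fields": ["dest_ip.keyword", "dest_port"],
                            "sort": [{"timestamp": {"order": "asc"}}],
                        }
                    }
                },
            }
        },
    }


def first_seen_by_flow(response: Dict[str, Any]) -> Dict[Any, Dict[str, Any]]:
    """Flatten the aggregation response into {flow_id: {field: value}}."""
    result: Dict[Any, Dict[str, Any]] = {}
    for bucket in response["aggregations"]["flows"]["buckets"]:
        hits = bucket["first_seen"]["hits"]["hits"]
        if not hits:
            continue
        # docvalue_fields come back under "fields", each as a list of values.
        fields = hits[0].get("fields", {})
        result[bucket["key"]] = {
            name: values[0] for name, values in fields.items() if values
        }
    return result
```

Note that the terms aggregation only returns `size` buckets (150 here), so with more distinct flow_ids than that, some flows will be missing from the result.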