Hi all,
I use the latest ES version. My data set consists of network flows (see an example below). Each time a host starts a flow with another host, a new, unique flow_id is created. I would like to write an Elasticsearch request that returns, for instance, the dest_ip and the dest_port only from the document where each flow_id first appears in time. So my final result would be, for instance:
flow_id1 => dest_ip1 => dest_port1
flow_id2 => dest_ip2 => dest_port2
but not
flow_id1 => dest_ip1 => dest_port1
flow_id2 => dest_ip2 => dest_port2
flow_id1 => dest_ip1 => dest_port2
How could I manage this? I need to get the first set of values for each flow_id, and not the ones appearing later.
In SQL I would do something like SELECT dest_ip, dest_port, distinct flow_id FROM index ORDER BY timestamp.
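To make the intended result concrete, here is a minimal sketch in plain Python on in-memory records. The function name `first_seen` and the sample values are made up for illustration; only the field names come from the data set. It assumes all timestamps share one format and time zone, so that they sort correctly as strings.

```python
from typing import Dict, Iterable, Tuple


def first_seen(records: Iterable[dict]) -> Dict[int, Tuple[str, int]]:
    """Map each flow_id to the (dest_ip, dest_port) of its earliest record."""
    result: Dict[int, Tuple[str, int]] = {}
    # Walk the records oldest first; setdefault keeps only the first
    # value stored for a flow_id and ignores later ones.
    for rec in sorted(records, key=lambda r: r["timestamp"]):
        result.setdefault(rec["flow_id"], (rec["dest_ip"], rec["dest_port"]))
    return result


# Hypothetical sample: flow 1 appears twice with different ports.
records = [
    {"flow_id": 1, "dest_ip": "10.0.0.1", "dest_port": 53,
     "timestamp": "2018-11-17T08:41:33"},
    {"flow_id": 2, "dest_ip": "10.0.0.2", "dest_port": 80,
     "timestamp": "2018-11-17T08:41:34"},
    {"flow_id": 1, "dest_ip": "10.0.0.1", "dest_port": 443,
     "timestamp": "2018-11-17T08:41:35"},
]

if __name__ == "__main__":
    print(first_seen(records))
```

Only the earliest record of flow 1 contributes to the result; the later one with port 443 is dropped.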
So far I have the following, but I'm not sure it's the best way. How can I return other fields (dest_ip, dest_port) along with flow_id?
GET index-*/_search
{
  "size": 0,
  "aggs": {
    "2": {
      "terms": {
        "field": "flow_id",
        "size": 150,
        "order": {
          "_key": "asc"
        }
      },
      "aggs": {
        "1": {
          "top_hits": {
            "docvalue_fields": [
              "flow_id"
            ],
            "_source": "error",
            "size": 1,
            "sort": [
              {
                "timestamp": {
                  "order": "asc"
                }
              }
            ]
          }
        }
      }
    }
  }
}
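One possible direction, sketched in Python as a plain request body: the `_source` option of `top_hits` accepts a list of field names, so listing dest_ip and dest_port there should return them alongside flow_id. In the request above, `"_source": "error"` appears to limit the returned source to a field named error, which the example document does not contain. The aggregation names `flows` and `first_event` and both helper functions are hypothetical, and the sketch does not call any Elasticsearch client; the resulting dict would be sent as the search body with whatever client is in use.

```python
import json


def build_query(fields, max_flows=150):
    """Build a search body: one bucket per flow_id, earliest hit in each."""
    return {
        "size": 0,  # no top-level hits, only aggregation results
        "aggs": {
            "flows": {
                "terms": {
                    "field": "flow_id",
                    "size": max_flows,
                    "order": {"_key": "asc"},
                },
                "aggs": {
                    "first_event": {
                        "top_hits": {
                            # Source filtering: return only these fields.
                            "_source": list(fields),
                            "size": 1,
                            "sort": [{"timestamp": {"order": "asc"}}],
                        }
                    }
                },
            }
        },
    }


def extract_first_events(response):
    """Collect the _source of the single top hit from every flow bucket."""
    events = []
    for bucket in response["aggregations"]["flows"]["buckets"]:
        hits = bucket["first_event"]["hits"]["hits"]
        if hits:
            events.append(hits[0]["_source"])
    return events


if __name__ == "__main__":
    body = build_query(["flow_id", "dest_ip", "dest_port"])
    print(json.dumps(body, indent=2))
```

As in the original request, the terms aggregation only returns the first 150 flow_id buckets, so larger data sets would need a bigger `size` or a paginated approach.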
Example document from the index:
{
  "_index": "index-2018-11-17",
  "_type": "doc",
  "_id": "h3qeIGcBeUNSQc4lIwrI",
  "_version": 1,
  "_score": null,
  "_source": {
    "proto": "UDP",
    "@version": "1",
    "dest_ip": "192.168.0.15",
    "@timestamp": "2018-11-17T07:41:33.923Z",
    "dest_port": 49328,
    "in_iface": "wlp2s0",
    "timestamp": "2018-11-17T08:41:33.778296+0100",
    "flow_id": 1026861285856324,
    "event_type": "dns",
    "dns": {
      "type": "answer",
      "rrtype": "SOA",
      "id": 56358,
      "rcode": "NOERROR",
      "rrname": "elastic.co",
      "ttl": 10183
    },
    "host": "xxx",
    "src_ip": "XX.2.0.1",
    "src_port": 53
  },
  "fields": {
    "@timestamp": [
      "2018-11-17T07:41:33.923Z"
    ],
    "timestamp": [
      "2018-11-17T07:41:33.778Z"
    ]
  },
  "sort": [
    1542440493923
  ]
}