Hello @greg.melasecca
I think the following example should do the trick.
Demo data
PUT demodata/_doc/1
{
"@timestamp": "2020-05-12T09:10:40.828Z",
"field_a": "location_1"
}
PUT demodata/_doc/2
{
"@timestamp": "2020-05-12T09:10:41.828Z",
"field_a": "location_3"
}
PUT demodata/_doc/3
{
"@timestamp": "2020-05-12T09:10:42.828Z",
"field_a": "location_1"
}
PUT demodata/_doc/4
{
"@timestamp": "2020-05-12T09:10:43.828Z",
"field_a": "location_6"
}
PUT demodata/_doc/5
{
"@timestamp": "2020-05-12T09:10:44.828Z",
"field_a": "location_3"
}
PUT demodata/_doc/6
{
"@timestamp": "2020-05-12T09:10:45.828Z",
"field_a": "location_3"
}
Watch
You can use a terms
aggregation with a bucket_selector
sub-aggregation to keep only the terms whose doc_count
is > 1.
It's still worth mentioning that terms
aggregations are approximate, and this can be heavy on global ordinals
(as field_a
is a keyword
) if the cardinality of field_a
is really high (e.g. 10,000+ or so).
You'll have to adjust the range
filter to consider just the last 5 minutes.
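For instance, the range filter for a 5-minute window would look like this (the full watch below uses now-1d instead, so you can see some matching buckets with the demo data):

```json
{
  "range": {
    "@timestamp": {
      "from": "now-5m",
      "to": "now"
    }
  }
}
```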
POST _watcher/watch/_execute
{
"watch": {
"trigger": {
"schedule": {
"interval": "30m"
}
},
"input": {
"search": {
"request": {
"indices": [
"demodata"
],
"body": {
"query": {
"bool": {
"filter": [
{
"range": {
"@timestamp": {
"from": "now-1d",
"to": "now"
}
}
}
]
}
},
"size": 0,
"aggs": {
"field_a_groups": {
"terms": {
"field": "field_a",
"size": 100
},
"aggs": {
"filter_groups": {
"bucket_selector": {
"buckets_path": {
"count": "_count"
},
"script": "params.count > 1"
}
}
}
}
}
}
}
}
},
"condition": {
"script": {
"source": "ctx.payload.aggregations.field_a_groups.buckets.size() > 0"
}
},
"actions": {
"log": {
"logging": {
"text": "We have the following values:\n{{#ctx.payload.aggregations.field_a_groups.buckets}}{{key}}({{doc_count}})\n{{/ctx.payload.aggregations.field_a_groups.buckets}}"
}
}
}
}
}
Maybe transforms?
Transform jobs can be used to build an entity-centric index.
POST _transform/_preview
{
"id": "demotransform",
"source": {
"index": [
"demodata"
],
"query": {
"match_all": {}
}
},
"dest": {
"index": "transformeddemodata"
},
"sync": {
"time": {
"field": "@timestamp",
"delay": "15m"
}
},
"pivot": {
"group_by": {
"time": {
"date_histogram": {
"field": "@timestamp",
"fixed_interval": "5m"
}
},
"field_a_groups": {
"terms": {
"field": "field_a"
}
}
},
"aggregations": {
"count": {
"value_count": {
"field": "field_a"
}
}
}
}
}
Result:
{
"preview" : [
{
"field_a_groups" : "location_1",
"count" : 2.0,
"time" : 1589274600000
},
{
"field_a_groups" : "location_3",
"count" : 3.0,
"time" : 1589274600000
},
{
"field_a_groups" : "location_6",
"count" : 1.0,
"time" : 1589274600000
}
],
"mappings" : {
"properties" : {
"field_a_groups" : {
"type" : "keyword"
},
"count" : {
"type" : "long"
},
"time" : {
"type" : "date"
}
}
}
}
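If the preview looks good, the same config (without the id field in the body) can be used to actually create the transform, and the sync block makes it run continuously once started. A sketch, assuming a 7.x cluster with the _transform APIs:

```json
PUT _transform/demotransform
{
  "source": {
    "index": [
      "demodata"
    ],
    "query": {
      "match_all": {}
    }
  },
  "dest": {
    "index": "transformeddemodata"
  },
  "sync": {
    "time": {
      "field": "@timestamp",
      "delay": "15m"
    }
  },
  "pivot": {
    "group_by": {
      "time": {
        "date_histogram": {
          "field": "@timestamp",
          "fixed_interval": "5m"
        }
      },
      "field_a_groups": {
        "terms": {
          "field": "field_a"
        }
      }
    },
    "aggregations": {
      "count": {
        "value_count": {
          "field": "field_a"
        }
      }
    }
  }
}

POST _transform/demotransform/_start
```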
Based on that, it would then be possible to run simple queries against the transformed index.
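For example, a plain range query on the count field produced by the transform (field names as in the preview output above):

```json
GET transformeddemodata/_search
{
  "query": {
    "range": {
      "count": {
        "gt": 1
      }
    }
  }
}
```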