How to list one record when there are 4 similar records with no unique field in ELK

Hi,
We have an alerting mechanism in place using Watcher, with a complex filter query and an attachment from Kibana saved objects. Our index contains multiple records with very similar field values and no unique field, because the same record gets indexed from multiple sources. Our requirement is to pick the latest record and share it with the end users in the form of a spreadsheet. With our current implementation using the saved-objects attachment, duplicates are not getting removed; however, when we apply the same search query with an aggregation, the results come out as expected. Could you please help me with any other solution where I can exclude all the similar/duplicate records and display just one single record out of each group of similar ones?
Note: we don't have Logstash in place; the data is coming from a Solace queue.
Any help would be greatly appreciated.
Thanks!

Welcome to our community! :smiley:

We'd need to see a lot more detail to help here; examples of your queries, your data, and your watches would be very helpful.

Thank you for responding

Data is coming from a Solace queue, IDs are automatically generated by Elasticsearch, and most of the fields are mapped as wildcard or date, with a few fields as text/keyword.

PS: the query part of the watcher is working as expected (removing the similar records and displaying the latest record), but the results are fetched from the attachment that comes from saved objects (we are not able to apply aggregations in the Discover tab).
PUT _xpack/watcher/watch/log_error_watch
{
  "trigger": {
    "schedule": {
      "interval": "10m"
    }
  },
  "input": {
    "search": {
      "request": {
        "indices": ["logs"],
        "body": {
          "query": {
            "match": {
              "message": "error"
              //multiple match queries
            }
          },
          "aggs": {
            "dedup": {
              "term": {
                "field": "example"
              },
              "aggs": {
                "latestdoc": {
                  "hits": {
                    "size": 1,
                    "sort": ["@timestamp"]
                  }
                }
              }
            }
          }
        }
      }
    }
  },
  "condition": {
    "compare": {
      "ctx.payload.hits.total": {
        "gt": 0
      }
    }
  },
  "actions": {
    "log": {
      "logging": {
        "text": "We got the expected error",
        "attachment": "" //we have an attachment csv url here from saved objects tab
      }
    }
  }
}

There is no term aggregation type so I assume you mean terms.
There is no hits aggregation type so I assume you mean top_hits.

If you want to find the latest doc using top_hits it would need to sort in descending time order:

                  "aggs": {
                    "latestdoc": {
                      "top_hits": {
                        "size": 1,
                        "sort": [
                          {
                            "timestamp": {
                              "order": "desc"
                            }
                          }
                          ]
                      }
                    }
                  }

So, clearly, you already have a few mistakes in this that will prevent it from working as you expect.
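Conceptually, the `terms` + `top_hits` (size 1, descending sort) pair performs a "group by key, keep only the newest document per group" operation. Here is a minimal Python sketch of that logic, purely illustrative — the field names `example` and `@timestamp` and the sample documents are hypothetical, not from your index:

```python
from datetime import datetime

def dedup_latest(docs, key_field, ts_field):
    """Group docs by key_field and keep only the newest doc per group,
    mirroring what a terms aggregation with a top_hits(size=1,
    sort: timestamp desc) sub-aggregation returns."""
    latest = {}
    for doc in docs:
        key = doc[key_field]
        ts = datetime.fromisoformat(doc[ts_field])
        if key not in latest or ts > datetime.fromisoformat(latest[key][ts_field]):
            latest[key] = doc
    return list(latest.values())

# Hypothetical documents with duplicate keys:
docs = [
    {"example": "A", "@timestamp": "2023-02-21T10:00:00", "message": "error 1"},
    {"example": "A", "@timestamp": "2023-02-22T10:00:00", "message": "error 2"},
    {"example": "B", "@timestamp": "2023-02-20T09:00:00", "message": "error 3"},
]

print(dedup_latest(docs, "example", "@timestamp"))
```

The key point is the descending comparison: only a newer timestamp replaces the stored document, which is why the `top_hits` sort must be `"order": "desc"` to get the latest record rather than the oldest.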

Now, once you get the aggregations correct, you can use the payload of the resulting aggregations to report out just the information you want. Below is an example using the built-in Kibana Sample Web Logs demo data set:

POST _watcher/watch/_execute
{
  "watch": {
    "trigger": {
      "schedule": {
        "interval": "10m"
      }
    },
    "input": {
      "search": {
        "request": {
          "indices": [
            "kibana_sample_data_logs"
          ],
          "body": {
            "size": 0,
            "query": {
              "bool": {
                "filter": [
                  {
                    "term": {
                      "response.keyword": "503"
                    }
                  }
                ]
              }
            },
            "aggs": {
              "dedup": {
                "terms": {
                  "field": "geo.dest",
                  "size": 1000
                },
                "aggs": {
                  "latestdoc": {
                    "top_hits": {
                      "size": 1,
                      "sort": [
                        {
                          "timestamp": {
                            "order": "desc"
                          }
                        }
                      ]
                    }
                  }
                }
              }
            }
          }
        }
      }
    },
    "condition": {
      "compare": {
        "ctx.payload.hits.total": {
          "gt": 0
        }
      }
    },
    "actions": {
      "log": {
        "transform": {
          "script": "return ctx.payload.aggregations.dedup.buckets.stream().map(p -> ['geo.dest':p.key,'time':p.latestdoc.hits.hits.0._source.timestamp]).collect(Collectors.toList());"
        },
        "logging": {
          "text": """
          We got the expected error for country codes
          {{#ctx.payload._value}}
          {{geo.dest}} at {{time}}
          {{/ctx.payload._value}}
          """
        }
      }
    }
  }
}

The output looks like:

          We got the expected error for country codes
          CN at 2023-02-22T08:20:29.957Z
          IN at 2023-02-22T15:39:31.289Z
          US at 2023-02-23T10:08:39.040Z
          ID at 2023-02-21T08:06:12.151Z
          BR at 2023-02-22T13:06:45.025Z
          ... 

... and so on
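Since the original goal was a spreadsheet rather than a log line, the same flattened bucket data can be written out as CSV instead of interpolated into logging text. A hedged sketch in Python, assuming the watch's aggregation response has already been parsed into dicts of the shape shown above (the sample bucket values here are illustrative, not real query results):

```python
import csv
import io

def buckets_to_csv(buckets):
    """Flatten terms/top_hits buckets into CSV rows (key, timestamp),
    extracting the same fields as the Painless transform in the watch."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["geo.dest", "time"])
    for b in buckets:
        # top_hits with size 1 yields exactly one hit per bucket
        hit = b["latestdoc"]["hits"]["hits"][0]["_source"]
        writer.writerow([b["key"], hit["timestamp"]])
    return out.getvalue()

# Shape mimics the watch's aggregation response (values are made up):
buckets = [
    {"key": "CN", "latestdoc": {"hits": {"hits": [
        {"_source": {"timestamp": "2023-02-22T08:20:29.957Z"}}]}}},
    {"key": "IN", "latestdoc": {"hits": {"hits": [
        {"_source": {"timestamp": "2023-02-22T15:39:31.289Z"}}]}}},
]

print(buckets_to_csv(buckets))
```

This way the deduplicated, latest-per-key rows come straight from the aggregation payload, with no dependency on the saved-objects attachment (which runs the plain search and therefore still contains the duplicates).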