Duplicated Kubernetes Events

Hi,

We're using the Kubernetes event metricset to ingest Kubernetes events into Elasticsearch. However, we've observed cases where the exact same event is emitted to Elasticsearch multiple times, which causes confusion when looking through events.

I understand Kubernetes aggregates events, so I could imagine the same event appearing multiple times if its count is increasing; however, in these instances it's the exact same event:

[Screenshot: two identical event documents, indexed roughly 30 minutes apart]

In the example above you can see the same event was emitted twice, about 30 minutes apart. I checked and Metricbeat had been running for a number of days (it hadn't restarted, for example).

My question is whether there's any sort of registry, similar to Filebeat's, that Metricbeat can use to track which events have already been processed?

Cheers

Hi @Evesy,

This is actually unexpected. It could be caused by some retry, but it is strange for that to happen after half an hour; this could be a bug.
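If the duplicates turn out to come from retried bulk requests, one possible workaround (not a fix for any watcher bug) is to give each event a deterministic document ID, so a retried event replaces or collides with the earlier copy instead of producing a second document. A minimal sketch, assuming a Beats version that ships the fingerprint processor and an Elasticsearch output that honours @metadata._id; the field list is only an example, adjust it to what uniquely identifies an event in your documents:

processors:
  # Hash fields that identify a Kubernetes event and use the result as the
  # Elasticsearch document ID (example fields, not a recommendation).
  - fingerprint:
      fields:
        - kubernetes.event.metadata.uid
        - kubernetes.event.count
      target_field: "@metadata._id"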

Did you see any error in metricbeat logs during this time?

What is the output configured in metricbeat?

How often does it happen?

Hi @jsoriano,

Thanks for reaching out!

Our log profile is fairly consistent, with entries like the ones below being common:

2019-04-23T14:20:42.127Z	ERROR	pipeline/output.go:121	Failed to publish events: temporary bulk send failure
2019-04-23T15:42:05.372Z	ERROR	pipeline/output.go:121	Failed to publish events: 500 Internal Server Error: {"took":7,"ignored":false,"errors":true,"error":{"type":"export_exception","reason":"Exception when closing export bulk","caused_by":{"type":"export_exception","reason":"failed to flush export bulks","caused_by":{"type":"export_exception","reason":"bulk [default_local] reports failures when exporting documents"

2019-04-23T15:42:14.273Z	INFO	pipeline/output.go:95	Connecting to backoff(publish(elasticsearch(https://<HOST>:443)))

^^ We have a pretty busy cluster, and occasionally the bulk queue fills up, so sending applications back off & retry.
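(If it's relevant, these are the output knobs I believe we could tune to ease that pressure; just a sketch with illustrative values, assuming they behave the same in our version:)

output.elasticsearch:
  hosts: ["<REDACTED>:443"]
  # Smaller bulk requests put less pressure on the cluster's bulk queue
  bulk_max_size: 50
  # Back off for longer between retries when the cluster is struggling
  backoff.init: 5s
  backoff.max: 120s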

2019-04-23T15:46:17.217Z	ERROR	kubernetes/watcher.go:254	kubernetes: Watching API error EOF
2019-04-23T15:46:17.217Z	INFO	kubernetes/watcher.go:238	kubernetes: Watching API for resource events

^^ Not entirely sure about these, but we are running on GKE, so occasionally the master will be unavailable for resizing, upgrades, etc.

Below is our metricbeat config:

metricbeat.config.modules:
  # Mounted `metricbeat-daemonset-modules` configmap:
  path: ${path.config}/modules.d/*.yml
  # Reload module configs as they change:
  reload.enabled: false

processors:
  - add_cloud_metadata:

output.elasticsearch:
  hosts: ["<REDACTED>:443"]
  protocol: 'https'
  username: '<REDACTED>'
  password: "${ELASTICSEARCH_PASSWORD}"
  index: "kubernetes-%{+yyyy.MM.dd}"

setup.template.enabled: false

xpack.monitoring.enabled: true

And the only file in modules.d:

- module: kubernetes
  metricsets:
    - event

It's happening pretty frequently; I can fairly easily find examples manually at any given time. I haven't been able to create a Kibana query for documents that are identical (except for the timestamp), though, which would give me an exact figure.
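In case it's useful, the nearest I've got is an aggregation in Dev Tools rather than a plain Kibana search. This is only a sketch; it assumes the kubernetes.event.metadata.uid and kubernetes.event.count fields exist in our mapping and are aggregatable:

GET kubernetes-*/_search
{
  "size": 0,
  "aggs": {
    "per_event": {
      "terms": { "field": "kubernetes.event.metadata.uid", "size": 100 },
      "aggs": {
        "duplicate_counts": {
          "terms": { "field": "kubernetes.event.count", "min_doc_count": 2, "size": 10 }
        }
      }
    }
  }
}

Any uid bucket with a non-empty duplicate_counts list should be an event that was indexed more than once with the same count, i.e. an exact duplicate rather than a legitimate count update.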

Cheers,
Mike

We are investigating an issue with reconnections when watching for Kubernetes events; this could cause your duplicated events. I have opened an issue to keep track of it: https://github.com/elastic/beats/issues/11917


This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.