Hi @jsoriano,
Thanks for reaching out!
Our log profile is pretty consistent, with the entries below being the most common:
2019-04-23T14:20:42.127Z ERROR pipeline/output.go:121 Failed to publish events: temporary bulk send failure
2019-04-23T15:42:05.372Z ERROR pipeline/output.go:121 Failed to publish events: 500 Internal Server Error: {"took":7,"ignored":false,"errors":true,"error":{"type":"export_exception","reason":"Exception when closing export bulk","caused_by":{"type":"export_exception","reason":"failed to flush export bulks","caused_by":{"type":"export_exception","reason":"bulk [default_local] reports failures when exporting documents"
2019-04-23T15:42:14.273Z INFO pipeline/output.go:95 Connecting to backoff(publish(elasticsearch(https://<HOST>:443)))
^^ We have a pretty busy cluster, and occasionally the bulk queue fills up and the sending applications back off & retry.
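For what it's worth, we haven't tuned the output's retry behaviour at all, so as far as I can tell it's running with something like the documented defaults (values below are my understanding of the defaults, not anything we've set explicitly):
output.elasticsearch:
  # Not set in our config; my reading of the defaults:
  bulk_max_size: 50   # events per bulk request
  backoff.init: 1s    # initial wait before retrying a failed publish
  backoff.max: 60s    # upper bound on the exponential backoff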
2019-04-23T15:46:17.217Z ERROR kubernetes/watcher.go:254 kubernetes: Watching API error EOF
2019-04-23T15:46:17.217Z INFO kubernetes/watcher.go:238 kubernetes: Watching API for resource events
^^ Not entirely sure about these, but we are running on GKE, so occasionally the master will be unavailable for resizing, upgrades, etc.
Below is our metricbeat config:
metricbeat.config.modules:
  # Mounted `metricbeat-daemonset-modules` configmap:
  path: ${path.config}/modules.d/*.yml
  # Reload module configs as they change:
  reload.enabled: false
processors:
  - add_cloud_metadata:
output.elasticsearch:
  hosts: ["<REDACTED>:443"]
  protocol: 'https'
  username: '<REDACTED>'
  password: "${ELASTICSEARCH_PASSWORD}"
  index: "kubernetes-%{+yyyy.MM.dd}"
setup.template.enabled: false
xpack.monitoring.enabled: true
And the only file in modules.d:
- module: kubernetes
  metricsets:
    - event
It's happening pretty frequently; I can fairly easily find examples by hand at any given time. I haven't managed to build a Kibana query for documents that are identical apart from the timestamp, though, which would give me an exact figure.
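Something along the lines of the aggregation below might get close (grouping by event UID and resource version, and flagging combinations that appear more than once), but I haven't validated it, and the field names are assumptions based on the kubernetes event metricset rather than checked against our mapping (we have setup.template.enabled: false, so our mappings come from our own template):
GET kubernetes-*/_search
{
  "size": 0,
  "query": {
    "range": { "@timestamp": { "gte": "now-1h" } }
  },
  "aggs": {
    "by_event_uid": {
      "terms": {
        "field": "kubernetes.event.metadata.uid",
        "size": 100
      },
      "aggs": {
        "by_resource_version": {
          "terms": {
            "field": "kubernetes.event.metadata.resource_version",
            "min_doc_count": 2,
            "size": 10
          }
        }
      }
    }
  }
}
If that's roughly the right approach, any bucket with the same UID and resource version containing two or more documents should be a genuine duplicate rather than a legitimate re-occurrence of the same event.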
Cheers,
Mike