OpenShift Elasticsearch Logging Spam (1b+ logs a month)

The Elasticsearch instance inside OpenShift Origin is generating 25 million hits per day in the operations area (42 million overall), resulting in a high volume of logs in both Kibana and Splunk.

When did you start to notice this issue? Since implementing OpenShift.
What versions of the products are you using? OCP 3.5.5.31, Kibana 4.6.4, Splunk 6.6.1/7.0.0
Has anything changed since this issue started? We were originally on OpenShift 3.4 when we first noticed the issue and have upgraded since then.
What impact does this issue have on you, your team, or your department? It overloads our Splunk instances and impacts the delivery of functionality requested by the business.

Figure out why we are getting so many logs
This is causing issues sending logs to Splunk from the Virginia Dev Cluster, and the volume of logs generated also seems unnecessary.
We need to work out why this is the case, identify potential solutions, and apply a fix.
Adding to this, it seems to be the .ops space, not the containers within projects, that is causing the issue.
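
One way to confirm that the .ops space is the source (a rough sketch, not anything OpenShift ships) is to compare document counts in the .operations.* indices against the project.* indices. The endpoint below is a placeholder, auth is omitted (the OpenShift logging Elasticsearch usually sits behind client-certificate auth), and the index prefixes assume the usual OpenShift 3.x aggregated-logging naming, so adjust all of these to your environment:

import json
import urllib.request

# Placeholder endpoint for the ops logging Elasticsearch; adjust host, port and
# auth for your environment.
ES_URL = "https://logging-es-ops.example.com:9200"

def fetch_indices(es_url):
    # _cat/indices returns one row per index; ask only for the name and doc count.
    url = es_url + "/_cat/indices?format=json&h=index,docs.count"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

ops_docs = 0
project_docs = 0
for row in fetch_indices(ES_URL):
    count = int(row.get("docs.count") or 0)
    if row["index"].startswith(".operations."):   # assumed ops index prefix
        ops_docs += count
    elif row["index"].startswith("project."):      # assumed project index prefix
        project_docs += count

print("docs in .operations.* indices:", ops_docs)
print("docs in project.* indices:    ", project_docs)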

Most common logs (a sketch of how such a tally can be produced follows the samples):

Nov 8 07:28:14 worker10 atomic-openshift-node: E1108 07:28:14.677607 11689 kubelet_volumes.go:114] Orphaned pod "785a8fd4-93aa-11e7-be9e-0eb2d31a2c34" found, but volume paths are still present on disk.
Nov 8 07:28:14 worker10 atomic-openshift-node: E1108 07:28:14.677680 11689 kubelet_volumes.go:114] Orphaned pod "7f0c51be-55cb-11e7-a6a1-1251d71ba0fc" found, but volume paths are still present on disk.
Nov 8 07:28:12 worker10 atomic-openshift-node: E1108 07:28:12.917191 11689 glusterfs.go:137] glusterfs: failed to get endpoints pvc-d713b6e1-2f3f-11e7-b014-0eb2d31a2c34[an empty namespace may not be set when a resource name is provided]

Nov 8 07:28:14 worker10 atomic-openshift-node: E1108 07:28:14.677760 11689 kubelet_volumes.go:114] Orphaned pod "856b9fc6-a445-11e7-8cfe-1251d71ba0fc" found, but volume paths are still present on disk.

Nov 5 03:48:04 worker01 dockerd-current: time="2017-11-05T03:48:04.426350586-05:00" level=error msg="Handler for GET /v1.24/containers/fe45c68ee909/json returned error: No such container: fe45c68ee909"

Nov 5 03:22:03 worker01 dockerd-current: time="2017-11-05T03:22:03.408069585-05:00" level=info msg="{Action=start, LoginUID=4294967295, PID=9541}"
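
A list like the one above can be produced with a terms aggregation over the .operations.* indices. This is only a sketch: the endpoint is a placeholder, auth is omitted, and message.raw is a guess at a not-analyzed variant of the message field, so adjust it to your actual mapping:

import json
import urllib.request

ES_URL = "https://logging-es-ops.example.com:9200"  # placeholder endpoint

# Count the ten most frequent message values across the operations indices.
query = {
    "size": 0,
    "aggs": {
        "top_messages": {
            "terms": {"field": "message.raw", "size": 10}
        }
    },
}

req = urllib.request.Request(
    ES_URL + "/.operations.*/_search",
    data=json.dumps(query).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    result = json.load(resp)

for bucket in result["aggregations"]["top_messages"]["buckets"]:
    print(bucket["doc_count"], bucket["key"][:120])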

#######
Logging
#######

Kibana is being spammed with logs from Fluentd, around 1 billion logs a month; 80%+ of them are in the .ops space, not projects.

OpenShift is on loglevel 2 as well:

[ec2-user@master01 log]$ ps -elf | grep '/usr/bin/openshift'
4 S root 1255 1 27 80 0 - 300930 futex_ Nov07 ? 08:21:39 /usr/bin/openshift start master api --config=/etc/origin/master/master-config.yaml --loglevel=2 --listen=https://0.0.0.0:8443 --master=https://master01.appcanvas.net:8443
4 S root 3752 1 3 80 0 - 235691 futex_ Nov07 ? 01:05:03 /usr/bin/openshift start master controllers --config=/etc/origin/master/master-config.yaml --loglevel=2 --listen=https://0.0.0.0:8444
4 S root 4658 1 6 80 0 - 178706 futex_ Nov07 ? 01:52:01 /usr/bin/openshift start node --config=/etc/origin/node/node-config.yaml --loglevel=2
0 S ec2-user 110951 110207 0 80 0 - 28166 pipe_w 06:59 pts/1 00:00:00 grep --color=auto /usr/bin/openshift
[ec2-user@master01 log]$

I don’t understand.

What is the relationship between Kibana and Splunk?
Is your question related to elasticsearch?

Maybe someone else understands, though?

I do not see anything related to Kibana or Elasticsearch in these logs. It seems to me this could be an OpenShift issue.

Hello dadoonet, thank you for your reply.

I will check with OpenShift regarding bugs.

Hello Christian

Thank you for your reply. I will check with the OpenShift guys.
