Filebeat is not sending logs to Elasticsearch

Hello Team,

We have an issue where Filebeat is not sending some logs to Elasticsearch, so we cannot see them in Kibana.
The error we receive is shown below.

Entity Too Large\u003c/title\u003e\u003c/head\u003e\r\n","stream":"stderr","time":"2020-09-03T17:50:29.343311185Z"}
{"log":"\u003cbody\u003e\r\n","stream":"stderr","time":"2020-09-03T17:50:29.343317172Z"}
{"log":"\u003ccenter\u003e\u003ch1\u003e413 Request Entity Too Large\u003c/h1\u003e\u003c/center\u003e\r\n","stream":"stderr","time":"2020-09-03T17:50:29.343319241Z"}
{"log":"\u003chr\u003e\u003ccenter\u003enginx\u003c/center\u003e\r\n","stream":"stderr","time":"2020-09-03T17:50:29.343321505Z"}
{"log":"\u003c/body\u003e\r\n","stream":"stderr","time":"2020-09-03T17:50:29.343323615Z"}
{"log":"\u003c/html\u003e\r\n","stream":"stderr","time":"2020-09-03T17:50:29.343325355Z"}
{"log":"\n","stream":"stderr","time":"2020-09-03T17:50:29.343327201Z"}
{"log":"2020-09-03T17:50:29.340Z\u0009INFO\u0009pipeline/output.go:95\u0009Connecting to backoff(elasticsearch(https://es-eks.euw1.rpe-internal.com:443))\n","stream":"stderr","time":"2020-09-03T17:50:29.343328987Z"}
{"log":"2020-09-03T17:50:29.340Z\u0009INFO\u0009[publisher]\u0009pipeline/retry.go:196\u0009retryer: send unwait-signal to consumer\n","stream":"stderr","time":"2020-09-03T17:50:29.343331418Z"}
{"log":"2020-09-03T17:50:29.340Z\u0009INFO\u0009[publisher]\u0009pipeline/retry.go:198\u0009  done\n","stream":"stderr","time":"2020-09-03T17:50:29.343333578Z"}
{"log":"2020-09-03T17:50:29.340Z\u0009INFO\u0009[publisher]\u0009pipeline/retry.go:173\u0009retryer: send wait signal to consumer\n","stream":"stderr","time":"2020-09-03T17:50:29.343335698Z"}
{"log":"2020-09-03T17:50:29.340Z\u0009INFO\u0009[publisher]\u0009pipeline/retry.go:175\u0009  done\n","stream":"stderr","time":"2020-09-03T17:50:29.343340914Z"}
{"log":"2020-09-03T17:50:29.352Z\u0009INFO\u0009elasticsearch/client.go:757\u0009Attempting to connect to Elasticsearch version 7.6.1\n","stream":"stderr","time":"2020-09-03T17:50:29.35290055Z"}
{"log":"2020-09-03T17:50:29.376Z\u0009INFO\u0009template/load.go:89\u0009Template filebeat already exists and will not be overwritten.\n","stream":"stderr","time":"2020-09-03T17:50:29.377012369Z"}
{"log":"2020-09-03T17:50:29.376Z\u0009INFO\u0009[index-management]\u0009idxmgmt/std.go:295\u0009Loaded index template.\n","stream":"stderr","time":"2020-09-03T17:50:29.377041608Z"}
{"log":"2020-09-03T17:50:29.379Z\u0009INFO\u0009pipeline/output.go:105\u0009Connection to backoff(elasticsearch(https://es-eks.euw1.rpe-internal.com:443)) established\n","stream":"stderr","time":"2020-09-03T17:50:29.379411308Z"}
{"log":"2020-09-03T17:50:29.379Z\u0009INFO\u0009[publisher]\u0009pipeline/retry.go:196\u0009retryer: send unwait-signal to consumer\n","stream":"stderr","time":"2020-09-03T17:50:29.3794334Z"}
{"log":"2020-09-03T17:50:29.379Z\u0009INFO\u0009[publisher]\u0009pipeline/retry.go:198\u0009  done\n","stream":"stderr","time":"2020-09-03T17:50:29.379459664Z"}
{"log":"2020-09-03T17:50:29.388Z\u0009ERROR\u0009elasticsearch/client.go:350\u0009Failed to perform any bulk index operations: 413 Request Entity Too Large: \u003chtml\u003e\r\n","stream":"stderr","time":"2020-09-03T17:50:29.388429314Z"}
{"log":"\u003chead\u003e\u003ctitle\u003e413 Request Entity Too Large\u003c/title\u003e\u003c/head\u003e\r\n","stream":"stderr","time":"2020-09-03T17:50:29.388459044Z"}
{"log":"\u003cbody\u003e\r\n","stream":"stderr","time":"2020-09-03T17:50:29.388467691Z"}
{"log":"\u003ccenter\u003e\u003ch1\u003e413 Request Entity Too Large\u003c/h1\u003e\u003c/center\u003e\r\n","stream":"stderr","time":"2020-09-03T17:50:29.38847321Z"}
{"log":"\u003chr\u003e\u003ccenter\u003enginx\u003c/center\u003e\r\n","stream":"stderr","time":"2020-09-03T17:50:29.388479637Z"}
{"log":"\u003c/body\u003e\r\n","stream":"stderr","time":"2020-09-03T17:50:29.388485363Z"}
{"log":"\u003c/html\u003e\r\n","stream":"stderr","time":"2020-09-03T17:50:29.388490247Z"}
{"log":"\n","stream":"stderr","time":"2020-09-03T17:50:29.388495211Z"}
{"log":"2020-09-03T17:50:29.388Z\u0009INFO\u0009[publisher]\u0009pipeline/retry.go:173\u0009retryer: send wait signal to consumer\n","stream":"stderr","time":"2020-09-03T17:50:29.388513587Z"}
{"log":"2020-09-03T17:50:29.388Z\u0009INFO\u0009[publisher]\u0009pipeline/retry.go:175\u0009  done\n","stream":"stderr","time":"2020-09-03T17:50:29.388520807Z"}
{"log":"2020-09-03T17:50:31.285Z\u0009ERROR\u0009pipeline/output.go:121\u0009Failed to publish events: 413 Request Entity Too Large: \u003chtml\u003e\r\n","stream":"stderr","time":"2020-09-03T17:50:31.285786413Z"}
{"log":"\u003chead\u003e\u003ctitle\u003e413 Request Entity Too Large\u003c/title\u003e\u003c/head\u003e\r\n","stream":"stderr","time":"2020-09-03T17:50:31.285822628Z"}
{"log":"\u003cbody\u003e\r\n","stream":"stderr","time":"2020-09-03T17:50:31.285829076Z"}
{"log":"\u003ccenter\u003e\u003ch1\u003e413 Request Entity Too Large\u003c/h1\u003e\u003c/center\u003e\r\n","stream":"stderr","time":"2020-09-03T17:50:31.285832294Z"}
{"log":"\u003chr\u003e\u003ccenter\u003enginx\u003c/center\u003e\r\n","stream":"stderr","time":"2020-09-03T17:50:31.285837107Z"}
{"log":"\u003c/body\u003e\r\n","stream":"stderr","time":"2020-09-03T17:50:31.285840497Z"}
{"log":"\u003c/html\u003e\r\n","stream":"stderr","time":"2020-09-03T17:50:31.285842537Z"

I searched for the issue online and found some blog posts that advise reducing the bulk_max_size value in the filebeat.yaml file.
I tried to replicate the issue with a pod that sends over 200 MB of data to Elasticsearch, and I am receiving logs for that pod.

Please advise.

What sort of logs are they? Is it possible that you have an exceptionally large entry in amongst normal sized entries?

These are application logs from some Spark jobs. I can understand that there may be large entries, but what can we do to get these logs ingested?
Any solutions?

Can you show us the configuration file for your filebeat?

We have installed the Filebeat service on an application server where the log file grows by about 700-800 MB per hour, and our default configuration works well there. There are no changes to batch size or workers.

I even tried to replicate it with massive logs by running a container that parsed a log file of around 1 GB.
It worked well on my personal cluster, but I'm not sure why it gives us the "Entity too large" error on the corporate one.

There is nothing special in your configuration file.

Have you checked the system utilization in the problematic environment? Does the container have any resource limits?

Maybe you should try adjusting these Filebeat parameters:

output.elasticsearch:
  bulk_max_size: xxxx
  worker: x
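
To pick a starting value, it can help to estimate how large one bulk request gets. The following is a rough shell sketch, not a measurement: the average event size and the proxy limit are made-up numbers, and the estimate ignores the per-event action line that the bulk API adds. The only non-hypothetical figure is 50, the documented bulk_max_size default in Filebeat 7.x.

```shell
BULK_MAX_SIZE=50                    # events per bulk request (Filebeat 7.x default)
AVG_EVENT_BYTES=4096                # hypothetical average size of one encoded event
PROXY_LIMIT_BYTES=$((1024 * 1024))  # hypothetical request body limit in front of Elasticsearch (1 MiB)

# Rough size of one bulk request, and the largest batch that stays under the limit.
EST_REQUEST_BYTES=$((BULK_MAX_SIZE * AVG_EVENT_BYTES))
MAX_EVENTS_UNDER_LIMIT=$((PROXY_LIMIT_BYTES / AVG_EVENT_BYTES))

echo "estimated bulk request: ${EST_REQUEST_BYTES} bytes"
echo "largest bulk_max_size under the limit: ${MAX_EVENTS_UNDER_LIMIT}"
```

If a single event is larger than the limit on its own, lowering bulk_max_size cannot help; the limit itself has to be raised.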

OK, thanks for the help!

Coworker of Raman here, following up with our solution to the problem for the benefit of future generations.

The issue ended up being Kubernetes related. Our Elasticsearch stack runs in Kubernetes, and we were using some fairly bad default values in our nginx ingress controller configuration.

The indicator ended up being fairly obvious: the Filebeat log even prints the error page, complete with a mention of nginx.

Due to the default settings of the nginx ingress controller, nginx itself would reject any payloads larger than 10kb. Also, the ingress controllers would frequently die due to only having 64 MB of memory available.

Elasticsearch itself could handle the bulk index requests with no issue, but requests never made it there because nginx would reject them before it could get that far.
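
One way to confirm that a proxy, and not Elasticsearch, is rejecting the request is to send bulk bodies of known size through the same URL and watch the status code. This is only a sketch under assumptions: ES_URL is a placeholder hostname, size-test is a hypothetical throwaway index, and your cluster may also need credentials or TLS options on the curl call. Note that a successful probe really does index the test documents.

```shell
# Placeholder endpoint; replace with the URL Filebeat is configured to use.
ES_URL="https://es.example.internal:443"

# Print an NDJSON _bulk body with $1 documents whose message is $2 bytes long.
make_bulk_body() {
  count=$1
  size=$2
  filler=$(head -c "$size" /dev/zero | tr '\0' 'x')
  i=0
  while [ "$i" -lt "$count" ]; do
    printf '{"index":{"_index":"size-test"}}\n{"message":"%s"}\n' "$filler"
    i=$((i + 1))
  done
}

# POST the body and print only the HTTP status code.
# A 413 with an nginx error page means the proxy rejected it before Elasticsearch saw it.
probe() {
  make_bulk_body "$1" "$2" | curl -s -o /dev/null -w '%{http_code}\n' \
    -H 'Content-Type: application/x-ndjson' \
    --data-binary @- "$ES_URL/_bulk"
}
```

For example, `probe 1 512` sends a tiny request and `probe 50 4096` sends roughly 200 KB; the size at which the status flips to 413 brackets the effective limit.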

The fix was to reconfigure the nginx ingress to accept larger payloads than the default.

I'm not the best at k8s patching, but I'll paste the commands I ran to fix my k8s cluster, which then resolved the issue from the filebeat side and got our logs ingested.

kubectl -n ingress-nginx get configmap ingress-internal-nginx-ingress-controller -o yaml | sed '/^data:.*/a \ \ proxy-body-size: \"10m\"' > /tmp/tempConfigMap; kubectl apply -f /tmp/tempConfigMap; rm /tmp/tempConfigMap

This fetches the nginx ingress ConfigMap as YAML, pipes it through sed to add the proxy-body-size value (set to 10m) directly under data:, writes the result to a temp file, applies that temp file, then deletes it.
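
A possible alternative to the sed pipeline is a JSON merge patch, which sets the key in place and can be re-run safely (running the sed version twice would append the line a second time). This is a sketch, not what we actually ran; the namespace and ConfigMap name are copied from the command above and will differ on other clusters, and patch_body_size is just an illustrative function name.

```shell
NAMESPACE="ingress-nginx"
CONFIGMAP="ingress-internal-nginx-ingress-controller"
BODY_SIZE="10m"

# Merge patch that sets (or overwrites) one key under data: in the ConfigMap.
PATCH=$(printf '{"data":{"proxy-body-size":"%s"}}' "$BODY_SIZE")

# Apply the patch; requires kubectl to be pointed at the right cluster.
patch_body_size() {
  kubectl -n "$NAMESPACE" patch configmap "$CONFIGMAP" --type merge -p "$PATCH"
}
```

Call `patch_body_size` to apply it, then `kubectl -n ingress-nginx get configmap ingress-internal-nginx-ingress-controller -o yaml` to check the result.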

kubectl -n ingress-nginx patch deployment ingress-internal-nginx-ingress-controller --patch "{\"spec\":{\"template\":{\"spec\":{\"containers\":[{\"name\":\"nginx-ingress-controller\",\"resources\":{\"limits\":{\"cpu\":\"2\",\"memory\":\"512Mi\"},\"requests\":{\"cpu\":\"1\",\"memory\":\"256Mi\"}}}]}}}}"

This patches the nginx ingress deployment to include some sane default values for resource requests and limits. This is very off-the-cuff, and not at all based on specific usage patterns - your cluster may differ from mine, and your nginx pods might need more memory/cpu (or less!).

Added bonus: patching the nginx ingress deployment restarts all of the nginx ingress pods, which also picks up the new ConfigMap.
