Filebeat: failed to publish events: temporary bulk send failure

Hi,

I have set up a few hosts at home and am slowly teaching myself Elastic.

The problem at the moment is that Filebeat 7.10.2 is having trouble sending to my Elasticsearch instance. Below is a snippet from one of the failing hosts:

2021-02-07T14:28:28.426+1100 INFO [publisher] pipeline/retry.go:219 retryer: send unwait signal to consumer
2021-02-07T14:28:28.426+1100 INFO [publisher_pipeline_output] pipeline/output.go:143 Connecting to backoff(elasticsearch(https://192.168.10.6:9200))
2021-02-07T14:28:28.426+1100 INFO [publisher] pipeline/retry.go:223 done
2021-02-07T14:28:28.427+1100 INFO [esclientleg] eslegclient/connection.go:314 Attempting to connect to Elasticsearch version 7.10.2
2021-02-07T14:28:28.464+1100 INFO [license] licenser/es_callback.go:51 Elasticsearch license: Basic
2021-02-07T14:28:28.464+1100 INFO [index-management] idxmgmt/std.go:261 Auto ILM enable success.
2021-02-07T14:28:28.465+1100 INFO [index-management.ilm] ilm/std.go:139 do not generate ilm policy: exists=true, overwrite=false
2021-02-07T14:28:28.465+1100 INFO [index-management] idxmgmt/std.go:274 ILM policy successfully loaded.
2021-02-07T14:28:28.465+1100 INFO [index-management] idxmgmt/std.go:407 Set setup.template.name to '{filebeat-7.10.2 {now/d}-000001}' as ILM is enabled.
2021-02-07T14:28:28.465+1100 INFO [index-management] idxmgmt/std.go:412 Set setup.template.pattern to 'filebeat-7.10.2-*' as ILM is enabled.
2021-02-07T14:28:28.465+1100 INFO [index-management] idxmgmt/std.go:446 Set settings.index.lifecycle.rollover_alias in template to {filebeat-7.10.2 {now/d}-000001} as ILM is enabled.
2021-02-07T14:28:28.465+1100 INFO [index-management] idxmgmt/std.go:450 Set settings.index.lifecycle.name in template to {filebeat {"policy":{"phases":{"hot":{"actions":{"rollover":{"max_age":"30d","max_size":"50gb"}}}}}}} as ILM is enabled.
2021-02-07T14:28:28.465+1100 INFO template/load.go:97 Template filebeat-7.10.2 already exists and will not be overwritten.
2021-02-07T14:28:28.466+1100 INFO [index-management] idxmgmt/std.go:298 Loaded index template.
2021-02-07T14:28:28.466+1100 INFO [index-management] idxmgmt/std.go:309 Write alias successfully generated.
2021-02-07T14:28:28.473+1100 INFO [publisher_pipeline_output] pipeline/output.go:151 Connection to backoff(elasticsearch(https://192.168.10.6:9200)) established
2021-02-07T14:28:28.475+1100 INFO [publisher] pipeline/retry.go:219 retryer: send unwait signal to consumer
2021-02-07T14:28:28.475+1100 INFO [publisher] pipeline/retry.go:223 done

Below is what I'm seeing from a similar host that is working fine.

2021-02-07T14:14:39.522+1100 INFO [esclientleg] eslegclient/connection.go:314 Attempting to connect to Elasticsearch version 7.10.2
2021-02-07T14:14:39.562+1100 INFO [license] licenser/es_callback.go:51 Elasticsearch license: Basic
2021-02-07T14:14:39.564+1100 INFO [esclientleg] eslegclient/connection.go:314 Attempting to connect to Elasticsearch version 7.10.2
2021-02-07T14:14:39.616+1100 INFO [index-management] idxmgmt/std.go:261 Auto ILM enable success.
2021-02-07T14:14:39.617+1100 INFO [index-management.ilm] ilm/std.go:139 do not generate ilm policy: exists=true, overwrite=false
2021-02-07T14:14:39.617+1100 INFO [index-management] idxmgmt/std.go:274 ILM policy successfully loaded.
2021-02-07T14:14:39.618+1100 INFO [index-management] idxmgmt/std.go:407 Set setup.template.name to '{filebeat-7.10.2 {now/d}-000001}' as ILM is enabled.
2021-02-07T14:14:39.618+1100 INFO [index-management] idxmgmt/std.go:412 Set setup.template.pattern to 'filebeat-7.10.2-*' as ILM is enabled.
2021-02-07T14:14:39.618+1100 INFO [index-management] idxmgmt/std.go:446 Set settings.index.lifecycle.rollover_alias in template to {filebeat-7.10.2 {now/d}-000001} as ILM is enabled.
2021-02-07T14:14:39.618+1100 INFO [index-management] idxmgmt/std.go:450 Set settings.index.lifecycle.name in template to {filebeat {"policy":{"phases":{"hot":{"actions":{"rollover":{"max_age":"30d","max_size":"50gb"}}}}}}} as ILM is enabled.
2021-02-07T14:14:39.620+1100 INFO template/load.go:97 Template filebeat-7.10.2 already exists and will not be overwritten.
2021-02-07T14:14:39.620+1100 INFO [index-management] idxmgmt/std.go:298 Loaded index template.
2021-02-07T14:14:39.622+1100 INFO [index-management] idxmgmt/std.go:309 Write alias successfully generated.
2021-02-07T14:14:39.638+1100 INFO [publisher_pipeline_output] pipeline/output.go:151 Connection to backoff(elasticsearch(https://192.168.10.6:9200)) established

Both hosts are running Filebeat 7.10.2 and Ubuntu 20.04.2. I have Filebeat configured to send logs directly to Elasticsearch (the configuration matches between the hosts). Checking in Kibana shows the hosts with issues are sending some logs through, but not as many as I'd expect.

The only difference I've found so far is that the hosts with issues are on the same subnet as the Elasticsearch / Kibana instance; the hosts without issues are all on different subnets.

Any suggestions on what to look at next? I've tried turning on Filebeat's debug logging but didn't find much of use.

Hi!

I don't see any errors in those logs. It's most probably something networking related. When you run in debug mode, do you see any retries in the connections to ES? Also, can you compare the rate of events sent from the two nodes (the healthy one and the one with issues)?
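If it helps, you don't have to turn debug on globally: Filebeat's logging can be scoped with selectors so only the Elasticsearch client and publisher chatter is logged. Something like this in filebeat.yml (selector names from memory, so worth double-checking against the docs):

```yaml
# filebeat.yml — scope debug logging to the ES output and publisher
# so the log isn't flooded by every subsystem
logging.level: debug
logging.selectors: ["elasticsearch", "publisher"]
```

That should make any per-event bulk failures and retries much easier to spot.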

C.

Hi,

I have dug a bit further and feel I'm closer to the answer. After leaving debug mode enabled for a while, I'm now seeing messages like this in the logs:

2021-02-10T20:57:51.928+1100 DEBUG [elasticsearch] elasticsearch/client.go:414 Bulk item insert failed (i=0, status=500): {"type":"illegal_state_exception","reason":"pipeline with id [filebeat-7.10.2-zeek-x509-pipeline] could not be loaded, caused by [ElasticsearchParseException[Error updating pipeline with id [filebeat-7.10.2-zeek-x509-pipeline]]; nested: GeneralScriptException[Failed to compile inline script [{{zeek.x509.certificate.version}}] using lang [mustache]]; nested: CircuitBreakingException[[script] Too many dynamic script compilations within, max: [75/5m]; please use indexed, or scripts with parameters instead; this limit can be changed by the [script.context.template.max_compilations_rate] setting];; GeneralScriptException[Failed to compile inline script [{{zeek.x509.certificate.version}}] using lang [mustache]]; nested: CircuitBreakingException[[script] Too many dynamic script compilations within, max: [75/5m]; please use indexed, or scripts with parameters instead; this limit can be changed by the [script.context.template.max_compilations_rate] setting];; org.elasticsearch.common.breaker.CircuitBreakingException: [script] Too many dynamic script compilations within, max: [75/5m]; please use indexed, or scripts with parameters instead; this limit can be changed by the [script.context.template.max_compilations_rate] setting]"}

Disabling the modules associated with the following ingest pipelines appears to have cleared the problem:

filebeat-7.10.2-zeek-x509-pipeline
filebeat-7.10.2-suricata-eve-pipeline
filebeat-7.10.2-aws-s3access-pipeline
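For reference, I disabled them with the Filebeat modules CLI (assuming the pipelines above map to the standard zeek, suricata, and aws modules):

```shell
# disable the modules whose ingest pipelines were failing to compile
filebeat modules disable zeek suricata aws
# restart so the change takes effect
sudo systemctl restart filebeat
```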

Is adjusting the script.context.template.max_compilations_rate setting a safe thing to do? And what is the best way to monitor it?
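For what it's worth, I'm guessing the change would look something like this via the cluster settings API, with the nodes stats script metric used to watch the compilation counters afterwards (this is a sketch based on the error message above, so happy to be corrected):

```
PUT _cluster/settings
{
  "transient": {
    "script.context.template.max_compilations_rate": "150/5m"
  }
}

GET _nodes/stats/script
```

My understanding is the stats response includes how many script compilations have happened and whether the rate limit has been tripped, which would show if the new limit is enough.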

cheers

Michael