The Pipeline is Blocked, due to node_not_connected_exception with connection to Data Nodes

seth.yes · December 20, 2016, 5:55pm

After noticing an extreme slowdown in my pipeline (from 2500 events/sec to ~600 events/sec), I started Logstash with verbose logging and am seeing that the events I expect to be indexed are being dropped due to a node_not_connected_exception.

From a Logstash node:

:response=>{"create"=>{"_index"=>"filebeat-2016.12.19", "_type"=>"nginx-access", "_id"=>"AVkdVHPfvcAQ2OYuCYwA", "status"=>500, "error"=>{"type"=>"node_not_connected_exception", "reason"=>"[hyd-mon-storage02][10.191.4.144:9300] Node not connected"}}}, :level=>:warn}

However, the index has obviously already been created and Logstash is no longer even pointed at the data nodes -- only the two client nodes. Has anyone seen this issue before? None of these events are making it into Logstash anymore and there's nothing I can think of that's been changed recently to cause this.

OS: Ubuntu 14.04.3
LS: 2.2.4
ES: 2.4.1
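
For anyone wanting to see which nodes the failures point at, here is a minimal Python sketch that tallies the errors by type and node. It assumes the verbose log lines look like the one quoted above; the function names are made up for illustration and the regexes have only been written against that single sample line.

```python
import re
from collections import Counter

# Matches the "error" hash inside the bulk response that Logstash logs.
ERROR_RE = re.compile(
    r'"error"=>\{"type"=>"(?P<type>[^"]+)",\s*"reason"=>"(?P<reason>[^"]*)"'
)
# The reason starts with "[node-name][ip:port]" in the sample line.
NODE_RE = re.compile(r'^\[(?P<node>[^\]]+)\]\[(?P<addr>[^\]]+)\]')


def parse_failure(line):
    """Return error type, node name and address for one log line, or None."""
    match = ERROR_RE.search(line)
    if match is None:
        return None
    node = NODE_RE.match(match.group("reason"))
    return {
        "type": match.group("type"),
        "node": node.group("node") if node else None,
        "address": node.group("addr") if node else None,
    }


def count_failures(lines):
    """Count failures per (error type, node name) across many log lines."""
    counts = Counter()
    for line in lines:
        failure = parse_failure(line)
        if failure is not None:
            counts[(failure["type"], failure["node"])] += 1
    return counts
```

Passing an open log file to count_failures gives a quick picture of whether one data node or several are being reported as not connected.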

seth.yes · December 20, 2016, 8:19pm

It doesn't appear that many folks who have had the same issue are getting much assistance, so here's what I've done to increase throughput so far (back up to 1800 events/sec from 600; the goal is getting back to ~2500):

Redirected Logstash output away from the data nodes, so it points strictly at the client nodes. I wouldn't have thought to direct Logstash output at the data nodes, but that was recommended elsewhere in the forums.
Disabled scatter-gather on the network cards, as recommended by other users with the same issue.

Between these two changes, the indexing rate appears to be getting back up to the typical rate.
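
One way to put a number on the indexing rate from the Elasticsearch side is to sample the index stats twice and divide the difference by the interval. This is a rough sketch, assuming the two dicts are parsed responses from GET /_stats and that the counter lives at _all.primaries.indexing.index_total; the function names are hypothetical.

```python
def index_total(stats):
    """Pull the cumulative primary-shard indexing counter out of a /_stats response."""
    return stats["_all"]["primaries"]["indexing"]["index_total"]


def events_per_second(before, after, interval_seconds):
    """Average indexing rate between two /_stats samples taken interval_seconds apart."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return (index_total(after) - index_total(before)) / interval_seconds
```

The counter is cumulative, so a longer interval smooths out bursts; a node restart in between the samples would make the result meaningless.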

I hope this helps others, please let me know if I missed anything or if you have any additional suggestions.

warkolm · December 21, 2016, 11:30pm

Are there other things in Monitoring that might highlight the cause, such as GC on your nodes?
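
If Monitoring isn't handy, a rough way to check GC is to look at the old-generation collector counters in the node stats. The sketch below assumes the input is a parsed response from GET /_nodes/stats/jvm with the usual nodes.<id>.jvm.gc.collectors.old layout; the function name is made up for illustration.

```python
def old_gc_summary(nodes_stats):
    """Map node name to (old-gen collection count, total time in ms)."""
    summary = {}
    for node_id, node in nodes_stats["nodes"].items():
        old = node["jvm"]["gc"]["collectors"]["old"]
        # Fall back to the node id if the response carries no name.
        name = node.get("name", node_id)
        summary[name] = (old["collection_count"], old["collection_time_in_millis"])
    return summary
```

The values are cumulative since node start, so comparing two samples taken a few minutes apart says more than a single reading.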

To be fair, you only waited 3 hours for a response.

seth.yes · December 22, 2016, 3:26pm

I was referring to searching the forums and seeing mostly single-post threads regarding this issue. I know you guys are busy and not slacking. I wanted to point out that I was updating my findings so the next person to search this issue might be able to gain some immediate insight.

@warkolm as you probably figured, the changes I referred to previously did not solve my issue in the long run. There is GC occasionally occurring on the nodes, and I'm moving to new hardware in the next few weeks, but I'm looking for a way to keep monitoring running until then.

I decided to take this downtime to upgrade everything to 5.1. Will keep you posted once everything is sorted out, thanks for the response and please let me know if I can provide any additional info to help you analyze my issue.

