Upgraded from ES2.3.5 to 2.4.0, seeing: Transport response handler not found of id


(J) #1

I just upgraded to ES2.4.0 and I'm noticing these errors every time ES starts. What does this mean?

[2016-09-04 04:03:38,655][WARN ][transport                ] [caste] Transport response handler not found of id [238]
[2016-09-04 04:03:38,764][WARN ][transport                ] [caste] Transport response handler not found of id [241]
[2016-09-04 04:03:39,567][WARN ][transport                ] [caste] Transport response handler not found of id [246]
[2016-09-04 04:03:39,815][WARN ][transport                ] [caste] Transport response handler not found of id [248]
[2016-09-04 04:03:42,886][WARN ][transport                ] [caste] Transport response handler not found of id [313]
[2016-09-04 04:03:44,960][WARN ][transport                ] [caste] Transport response handler not found of id [346]
[2016-09-04 04:03:44,994][WARN ][transport                ] [caste] Transport response handler not found of id [347]
[2016-09-04 04:03:45,881][WARN ][transport                ] [caste] Transport response handler not found of id [359]
[2016-09-04 04:03:46,007][WARN ][transport                ] [caste] Transport response handler not found of id [360]
[2016-09-04 04:03:49,791][WARN ][transport                ] [caste] Transport response handler not found of id [388]

(Tin Le) #2

Make sure all nodes are same version, 2.4. Do you have beats feeding into your cluster? Have you upgraded logstash to 2.4?
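A quick way to verify this, assuming the cluster is reachable on localhost:9200, is the cat nodes API:

```shell
# List each node's name and Elasticsearch version;
# any mismatch in the version column is a likely culprit.
curl -s 'localhost:9200/_cat/nodes?v&h=name,version'
```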

Tin


(J) #3

All nodes are upgraded from 2.3.5 to 2.4, no Beats. LS 1.5 using the HTTP protocol. I had never seen this message before.


(Tin Le) #4

The only time I've seen that error message is when there is a mismatch in version. Maybe upgrade your LS?

Tin


(J) #5

Thanks, unfortunately I can't upgrade LS; the plugins I use don't work well with the latest LS. I restarted the ES cluster with LS off, and I was still seeing these messages.


(Tin Le) #6

Yes, it looks like it might be something else then... Maybe the Elastic folks can chime in.

Tin


(Ids Van Der Molen) #7

Hi, I also noticed the same errors when using ES 2.4.0 and logstash 2.4.0.


(Koen Vanoppen) #8

Yep, me too... It's just spitting out log lines one after the other, filling the whole disk.
And sometimes this comes up (elasticsearch03dev-es-cluster-dev is another node):
[elasticsearch03dev-es-cluster-dev][10.206.13.216:9300][indices:data/write/bulk[s]]]; nested: RemoteTransportException[[elasticsearch03dev-es-cluster-dev][10.206.13.216:9300][indices:data/write/bulk[s][p]]]; nested: EsRejectedExecutionException[rejected execution of org.elasticsearch.transport.TransportService$4@6807b591 on EsThreadPoolExecutor[bulk, queue capacity = 50, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@5e53305a[Running, pool size = 4, active threads = 4, queued tasks = 4138, completed tasks = 29660586]]];]];
[0]: index [.marvel-es-1-2016.09.14], type [node_stats], id [AVcnnsDhtV0Qddn6Zthd], message [RemoteTransportException[[elasticsearch03dev-es-cluster-dev][10.206.13.216:9300][indices:data/write/bulk[s]]]; nested: RemoteTransportException[[elasticsearch03dev-es-cluster-dev][10.206.13.216:9300][indices:data/write/bulk[s][p]]]; nested: EsRejectedExecutionException[rejected execution of org.elasticsearch.transport.TransportService$4@6807b591 on EsThreadPoolExecutor[bulk, queue capacity = 50, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@5e53305a[Running, pool size = 4, active threads = 4, queued tasks = 4138, completed tasks = 29660586]]];]]
[0]: index [.marvel-es-1-2016.09.14], type [node_stats], id [AVcnnum-tV0Qddn6ZudA], message [RemoteTransportException[[elasticsearch03dev-es-cluster-dev][10.206.13.216:9300][indices:data/write/bulk[s]]]; nested: RemoteTransportException[[elasticsearch03dev-es-cluster-dev][10.206.13.216:9300][indices:data/write/bulk[s][p]]]; nested: EsRejectedExecutionException[rejected execution of org.elasticsearch.transport.TransportService$4@6131d78d on EsThreadPoolExecutor[bulk, queue capacity = 50, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@5e53305a[Running, pool size = 4, active threads = 4, queued tasks = 4005, completed tasks = 29662634]]];]];
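The EsRejectedExecutionException above means the bulk thread pool queue (capacity 50) is overflowing, so writes are being rejected. A sketch for checking bulk rejections per node, assuming the cluster is reachable on localhost:9200:

```shell
# Show bulk thread pool activity and rejection counts per node;
# a growing "bulk.rejected" column confirms indexing back-pressure
# separate from the transport handler warnings.
curl -s 'localhost:9200/_cat/thread_pool?v&h=host,bulk.active,bulk.queue,bulk.rejected'
```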


#9

Same problem here. We are running 4 data nodes, all on ES 2.4.0. I just upgraded all Logstash loggers to 2.4.0 but am still getting those transport errors.
It might be helpful to know that we are running 2 servers with 2 nodes each.


(Kim Kruse Hansen) #10

Same problem for me. 2 separate clusters, each with 2 nodes, all at 2.4.0.
Happens frequently.


(Brad) #11

+1 (after upgrading to 2.4)


(Tin Le) #12

I have 2.4.0 running on a test cluster of 3 nodes and have not had time to look at it recently. I just checked this morning, and although the number of these WARNings has gone down, I do see them in 2 of the data nodes. I no longer see them in my dedicated master node.

From the look of it, these warnings happen when the nodes are memory-stressed and/or experiencing high CPU load. It looks like Elastic added these warnings in 2.4.

Check your ES logs and see if you also see GC, high load and/or NodeDisconnectedException around same time frame.
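One way to correlate these, assuming the default log location (the path below is a placeholder for your install), is to grep for the related messages around the same timestamps:

```shell
# Hypothetical log path; adjust for your install.
LOG=/var/log/elasticsearch/elasticsearch.log
# Look for GC pauses and node disconnects near the transport warnings.
grep -E 'gc|NodeDisconnectedException|Transport response handler' "$LOG" | tail -n 50
```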


#13

I also see this message not only during times of heavy load, but also when the system is rather quiet. I am running on the GCE cloud. I wonder if this relates to network interruptions. It would be nice to know what this message means.


#14

I'm seeing the same problem, so I'm hoping Elastic will respond. (edit: I found a solution, which is at the bottom of my post.)

All of my nodes are 2.4.0; I just upgraded them all from 2.1.1 to 2.4.0. In my case, I have a master and n data nodes that are all behaving fine; however, I have a couple of client-only (non-data, non-master) nodes that are showing this error after the upgrade.

I tried increasing memory allocations on those boxes, but that had no effect. There is no Logstash or Kibana running. The cluster is closed off for the moment, so there are no clients. I wondered if it was a networking issue, but I can reach the master from the client nodes (curl to the ES API works). They're on the same subnet, and I disabled iptables on the clients just in case. Nothing has helped. I'll try updating my log level next.

I'll add that this is preventing the client nodes from connecting to the master, so they're pretty useless.

edit: I noticed in logs later that I was seeing the below: failed to send join request to master [{Poltergeist}{eBP_rUJCTDimhgLUTiduxg}{192.168.1.4}{192.168.1.4:9300}{data=false, master=true}], reason [RemoteTransportException[[Poltergeist][192.168.1.4:9300][internal:discovery/zen/join]]; nested: ConnectTransportException[[Star-Lord][10.x.x.x:9300] connect_timeout[30s]]; nested: NotSerializableExceptionWrapper[connect_timeout_exception: connection timed out: /10.x.x.x:9300];

In this case, I saw an IP on a separate interface was being used to connect to the master. I temporarily turned that interface off and I was able to start my client without issues. Afterward, I reenabled that interface. Is this going to be ok or will the problem recur now that I've enabled the iface? For the time being (first few minutes after restart), it's ok. I suspect there's a config somewhere I should be able to set to get around this, as I can't keep shutting down our interfaces. How can I inform ES 2.4.0 to only use one interface for talking to master (but a separate interface for serving client requests)?

edit2 (solution for me): the config I needed to set was network.publish_host. What's strange is I didn't need to put this in my prior configs. I've always used the default network settings in prior ES versions.
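For reference, a minimal sketch of that change, assuming a standard package install and that 192.168.1.10 is the interface that should carry cluster traffic (both the path and the address are placeholders):

```shell
# Pin the address this node advertises to the rest of the cluster,
# so transport traffic stays on one interface.
cat >> /etc/elasticsearch/elasticsearch.yml <<'EOF'
network.publish_host: 192.168.1.10
EOF
```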


(Seth S) #15

+1 for this issue

I followed @milleka2's advice to set network.publish_host on each ES node and I'm still receiving the error.


(Tin Le) #16

Someone reported seeing this warning after they changed their network MTU.

So this seems to be related to packet loss.


(Seth S) #17

**EDIT: The fix mentioned below did not stop this error from occurring.**

I think lost packets could cause this; however, there have been no changes to MTU settings at my site.

My understanding (and someone please correct me if I'm wrong) is that these errors are regarding events being lost at the transport layer for whatever reason... which would typically be network issues.

I'm unsure of everyone else's setup, but what seems to have been causing this for me was a misconfiguration on the client node in my cluster. I have two master-eligible nodes, two data-only nodes, and one client node (non-data, non-master, meant for load balancing etc.).

The problem appears to have been that I had in my elasticsearch.yml:

node.max_local_storage_nodes: 1

which doesn't make sense for a node that isn't allowed to store data locally. When the client node entered the cluster, all nodes would be notified of the client node's properties, e.g.

[2016-09-26 13:06:25,386][INFO ][cluster.service ] [hyd-mon-storage01] added {{load-balance-node}{Pp9lrLb2S-W9VDY4d_zsKg}{10.191.4.126}{10.191.4.126:9300}{max_local_storage_nodes=1, data=false, master=false},}, reason: zen-disco-receive(from master [{phys-node}{PFjAWYe9T_W_VsmKm-hcFQ}{10.191.5.129}{10.191.5.129:9300}{max_local_storage_nodes=1, master=true}])

So they'd see it as having storage, the master would attempt to write to this node even though it doesn't accept data, and thus the event gets lost. Furthermore, I was seeing my primary index randomly get deleted. Changing it to:

node.max_local_storage_nodes: 0

appears to have resolved this issue for me.

If you're still seeing this error in your cluster, I'd attempt to recreate the path an event takes from Logstash into your cluster, all the way through to a data node. There's likely some misconfiguration causing the errors. I'll keep updating this thread with any further results and conclusions I reach.


(Andrew Stoker) #18

I have the same issue. A quick search suggests the upcoming 2.4.1 release may have the fix: https://github.com/elastic/elasticsearch/pull/20585


(Andrew Stoker) #19

I can confirm that with the 2.4.1 release this error has been resolved.


(Seth S) #20

Also confirmed that upgrading to 2.4.1 resolves this issue.