Timeout on /_cat/indices on a single node

I have a cluster of about 10 data nodes, 5 master nodes and a few client nodes.

I have a single client node where, no matter what I do, I can get the cluster state via curl but get no response from /_cat/indices. I re-installed Elasticsearch and cleared a number of directories. I tried the same install on a different server and it worked just fine.

How do we debug this kind of issue? The only thing I can see in the logs on each master is the following:

[2016-10-20 17:11:41,189][WARN ][transport ] [node0001-master] Transport response handler not found of id [956886]
[2016-10-20 17:11:41,281][WARN ][transport ] [node0001-master] Transport response handler not found of id [956919]
[2016-10-20 17:11:41,282][WARN ][transport ] [node0001-master] Transport response handler not found of id [956930]
[2016-10-20 17:11:41,283][WARN ][transport ] [node0001-master] Transport response handler not found of id [956933]
[2016-10-20 17:11:43,808][WARN ][transport ] [node0001-master] Transport response handler not found of id [957725]
[
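To gauge how often this warning fires, here is a minimal sketch that counts the warnings per minute. It assumes the log line format shown above (timestamp in the first bracket); the function name `count_handler_warnings` is hypothetical, and the log path in the usage line is only a guess at a typical install.

```shell
# Hypothetical helper: count "Transport response handler not found"
# warnings per minute, reading log lines from stdin.
# Assumes each line starts with [YYYY-MM-DD HH:MM:SS,mmm] as in the excerpt above.
count_handler_warnings() {
  grep -F 'Transport response handler not found' \
    | cut -c2-17 \
    | sort \
    | uniq -c
}

# Usage (path is an assumption, adjust to your install):
# count_handler_warnings < /var/log/elasticsearch/elasticsearch.log
```

Each output line is a count followed by the minute it belongs to, which makes it easier to see whether the warnings line up with the requests that hang.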

Timed-out query:

[me@client ~]$ curl http://localhost:9200/_cat/indices -v

* About to connect() to localhost port 9200 (#0)
*   Trying 127.0.0.1... connected
* Connected to localhost (127.0.0.1) port 9200 (#0)
> GET /_cat/indices HTTP/1.1
> User-Agent: curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.21 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2
> Host: localhost:9200
> Accept: */*

^C

Cluster health:

[me@client ~]$ curl http://localhost:9200/_cluster/health?pretty=true
{
"cluster_name" : "elasticsearch",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 18,
"number_of_data_nodes" : 10,
"active_primary_shards" : 156,
"active_shards" : 312,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 100.0
}

Thanks,

What version?

2.4.0

Do you have any plugins installed?
Also, try upgrading to 2.4.1.

I actually installed 2.4.1 first, and it had the issue. Then I decided to
downgrade to the same version as the cluster (2.4.0). Between the installs I
cleared all data directories and temp directories. I just don't know why
this specific query fails all the time. There are other queries, such as
/_cat/count/index, that work half the time.

Do you have plugins?

You got me thinking. But no, I only have Kibana installed locally as well,
and Kibana has the Sense plugin so that I can work with the API.

For example, with count: it works, then it doesn't.

[smalenfa@cdn1cdstats0001 installedPlugins]$ curl http://localhost:9200/_cat/count/custom_ats_2-2016.10.16
1477162602 18:56:42 1049713714
[smalenfa@cdn1cdstats0001 installedPlugins]$ curl http://localhost:9200/_cat/count/custom_ats_2-2016.10.18
^C
[smalenfa@cdn1cdstats0001 installedPlugins]$ curl http://localhost:9200/_cat/count/custom_ats_2-2016.10.18
^C
[smalenfa@cdn1cdstats0001 installedPlugins]$ curl http://localhost:9200/_cat/count/custom_ats_2-2016.10.18
^C
[smalenfa@cdn1cdstats0001 installedPlugins]$ curl http://localhost:9200/_cat/count/custom_ats_2-2016.10.18
1477162607 18:56:47 779883228
[smalenfa@cdn1cdstats0001 instal

How do we debug incoming queries or routing in ES (on the data nodes)? I thought my
cluster was too busy, but it works correctly through the 3 other client nodes.
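One way to compare the client nodes side by side is to run the same request against each of them with a hard time limit, so a hanging node shows up as a timeout instead of needing ^C. This is only a sketch: the function names `classify_curl_exit` and `probe` are hypothetical, the host names in the usage line are placeholders, and the 10-second default limit is an arbitrary choice. It relies on curl's `--max-time` option and on curl's documented exit codes 7 (failed to connect) and 28 (operation timed out).

```shell
# Hypothetical helper: map a curl exit code to a short label.
classify_curl_exit() {
  case "$1" in
    0)  echo "ok" ;;
    7)  echo "connect-failed" ;;
    28) echo "timeout" ;;
    *)  echo "error($1)" ;;
  esac
}

# Hypothetical helper: request one URL with a time limit (seconds)
# and print the outcome next to the URL.
probe() {
  local url="$1" limit="${2:-10}" rc=0
  curl -s -o /dev/null --max-time "$limit" "$url" || rc=$?
  printf '%s %s\n' "$(classify_curl_exit "$rc")" "$url"
}

# Usage (host names are placeholders for your client nodes):
# for host in client1 client2 client3 client4; do
#   probe "http://$host:9200/_cat/indices"
#   probe "http://$host:9200/_cat/count/custom_ats_2-2016.10.18"
# done
```

Running the loop a few times should show whether the timeouts are tied to one client node or to particular indices.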

Anything in your logs?

Oh, and please reinstall 2.4.1. It will remove the warning message you mentioned initially.

There is actually nothing in the logs at all except the updates from the
cluster.