Timeout on /_cat/indices on a single node


(Steve Malenfant) #1

I have a cluster of about 10 data nodes, 5 master nodes and a few client nodes.

I've got a single client node where, no matter what I do, I can get the cluster state via curl but never get a response back from /_cat/indices. I reinstalled Elasticsearch and cleared a number of directories. I tried the same install on a different server and it worked just fine.

How do we debug this kind of issue? The only thing I can see in the logs of each master is the following:

[2016-10-20 17:11:41,189][WARN ][transport ] [node0001-master] Transport response handler not found of id [956886]
[2016-10-20 17:11:41,281][WARN ][transport ] [node0001-master] Transport response handler not found of id [956919]
[2016-10-20 17:11:41,282][WARN ][transport ] [node0001-master] Transport response handler not found of id [956930]
[2016-10-20 17:11:41,283][WARN ][transport ] [node0001-master] Transport response handler not found of id [956933]
[2016-10-20 17:11:43,808][WARN ][transport ] [node0001-master] Transport response handler not found of id [957725]
[

Timed-out query:

[me@client ~]$ curl http://localhost:9200/_cat/indices -v
* About to connect() to localhost port 9200 (#0)
*   Trying 127.0.0.1... connected
* Connected to localhost (127.0.0.1) port 9200 (#0)
> GET /_cat/indices HTTP/1.1
> User-Agent: curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.21 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2
> Host: localhost:9200
> Accept: */*
>
^C

Cluster health:

[me@client ~]$ curl http://localhost:9200/_cluster/health?pretty=true
{
  "cluster_name" : "elasticsearch",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 18,
  "number_of_data_nodes" : 10,
  "active_primary_shards" : 156,
  "active_shards" : 312,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}
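
One way to make the contrast above reproducible (health answers, /_cat/indices hangs) is to bound each request with curl's `--max-time`, so a hang is reported as a timeout instead of needing Ctrl-C. The following is only a minimal sketch: the function name `probe` and the 10-second default are assumptions made for illustration, not anything from the thread.

```shell
# probe URL [TIMEOUT_SECONDS]
# Prints "ok <http_code>" when the node answers, "timeout" when curl gives up,
# or "error <exit_code>" for any other curl failure.
# NOTE: "probe" and the 10 s default are hypothetical, chosen for this sketch.
probe() {
  local url="$1" timeout="${2:-10}" code rc
  code=$(curl --silent --output /dev/null --max-time "$timeout" \
              --write-out '%{http_code}' "$url")
  rc=$?
  if [ "$rc" -eq 0 ]; then
    echo "ok $code"
  elif [ "$rc" -eq 28 ]; then   # 28 is curl's "operation timed out" exit code
    echo "timeout"
  else
    echo "error $rc"
  fi
}

# Example, assuming the node listens on localhost:9200 as in the post:
#   for p in _cluster/health _cat/indices; do
#     echo "$p: $(probe "http://localhost:9200/$p" 10)"
#   done
```

Running the same loop against one of the working client nodes gives a baseline to compare with.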

Thanks,


(Mark Walkom) #2

What version?


(Steve Malenfant) #3

2.4.0


(David Pilato) #4

Do you have any plugins installed?
Also, try upgrading to 2.4.1.


(Steve Malenfant) #5

I actually installed 2.4.1 first, and it had the issue. Then I decided to
downgrade to the same version as the cluster (2.4.0). Between the installs I
cleared all data directories and temp directories. I just don't know why
this specific query fails all the time. There are other queries, such as
/_cat/count/index, that work half the time.


(David Pilato) #6

Do you have any plugins?


(Steve Malenfant) #7

You got me thinking. But no, I only have Kibana installed locally as well,
and Kibana has the Sense plugin so that I can work with the API.

For example, with count: it works, then it doesn't...

[smalenfa@cdn1cdstats0001 installedPlugins]$ curl http://localhost:9200/_cat/count/custom_ats_2-2016.10.16
1477162602 18:56:42 1049713714
[smalenfa@cdn1cdstats0001 installedPlugins]$ curl http://localhost:9200/_cat/count/custom_ats_2-2016.10.18
^C
[smalenfa@cdn1cdstats0001 installedPlugins]$ curl http://localhost:9200/_cat/count/custom_ats_2-2016.10.18
^C
[smalenfa@cdn1cdstats0001 installedPlugins]$ curl http://localhost:9200/_cat/count/custom_ats_2-2016.10.18
^C
[smalenfa@cdn1cdstats0001 installedPlugins]$ curl http://localhost:9200/_cat/count/custom_ats_2-2016.10.18
1477162607 18:56:47 779883228
[smalenfa@cdn1cdstats0001 instal

How do we debug incoming queries or routing on the ES (data) side? I thought
my cluster was too busy, but it works correctly with 3 other client nodes.
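
The "works, then doesn't" pattern above can be put into numbers by repeating the same request and tallying the outcomes. This is just a sketch under assumptions: the function name `tally_requests` and the timeout value are made up for illustration, and "answered" only means curl received some HTTP response, whatever its status code.

```shell
# tally_requests URL ATTEMPTS [TIMEOUT_SECONDS]
# Sends the same request ATTEMPTS times and prints one summary line.
# NOTE: "tally_requests" is a hypothetical helper written for this sketch.
tally_requests() {
  local url="$1" attempts="$2" timeout="${3:-10}"
  local answered=0 timed_out=0 failed=0 i=0 rc
  while [ "$i" -lt "$attempts" ]; do
    curl --silent --output /dev/null --max-time "$timeout" "$url"
    rc=$?
    case "$rc" in
      0)  answered=$((answered + 1)) ;;    # an HTTP response came back
      28) timed_out=$((timed_out + 1)) ;;  # curl exit code 28: timed out
      *)  failed=$((failed + 1)) ;;        # connection refused, DNS, etc.
    esac
    i=$((i + 1))
  done
  echo "answered=$answered timeout=$timed_out error=$failed"
}

# Example, using the index name from the post:
#   tally_requests http://localhost:9200/_cat/count/custom_ats_2-2016.10.18 20 10
```

Running it once on the failing client node and once on a client node that behaves would show whether the timeouts are specific to that one node or only more frequent there.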


(David Pilato) #8

Anything in your logs?


(David Pilato) #9

Oh and please reinstall 2.4.1. It will remove the error message you mentioned initially.
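
To check whether that warning really stops after the reinstall, the master logs can be summarized per minute before and after. A rough sketch, assuming the log line layout shown in the first post (timestamp in the first bracket); the function name `count_handler_warnings` is hypothetical, and the log file path depends on how Elasticsearch was installed.

```shell
# count_handler_warnings LOGFILE
# Prints "<yyyy-mm-dd hh:mm> <count>" for every minute in which the
# "Transport response handler not found" warning was logged.
# NOTE: hypothetical helper; assumes lines start with "[yyyy-mm-dd hh:mm:ss,SSS]".
count_handler_warnings() {
  grep -F 'Transport response handler not found' "$1" \
    | cut -c2-17 \
    | sort \
    | uniq -c \
    | awk '{ print $2, $3, $1 }'
}

# Example (the path is an assumption; adjust to the actual log location):
#   count_handler_warnings /var/log/elasticsearch/elasticsearch.log
```

This only measures how often the warning appears; it does not by itself explain the timeouts.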


(Steve Malenfant) #10

There is actually nothing in the logs at all except the updates from the
cluster.

