Kibana flapping between red and green


(Sukrit Dasgupta) #1

Hi List,

Running an ES2.1 cluster with 4 nodes and Kibana 4.3. I am constantly seeing this when Kibana starts up:

  log   [09:39:20.395] [error][status][plugin:elasticsearch] Status changed from green to red - Request Timeout after 1500ms
  log   [09:39:24.367] [info][status][plugin:elasticsearch] Status changed from red to green - Kibana index ready
  log   [09:39:40.273] [error][status][plugin:elasticsearch] Status changed from green to red - Request Timeout after 1500ms
  log   [09:39:42.808] [info][status][plugin:elasticsearch] Status changed from red to green - Kibana index ready
  log   [09:40:15.028] [error][status][plugin:elasticsearch] Status changed from green to red - Request Timeout after 1500ms
  log   [09:40:17.557] [info][status][plugin:elasticsearch] Status changed from red to green - Kibana index ready
  log   [09:40:33.631] [error][status][plugin:elasticsearch] Status changed from green to red - Request Timeout after 1500ms
  log   [09:40:36.181] [info][status][plugin:elasticsearch] Status changed from red to green - Kibana index ready

Except for one index, which is red, every other index on my ES cluster is green, and the cluster is serving graphs to a Grafana endpoint just fine.

ES2 does not have Shield.

Any thoughts or pointers? For starters, I have increased some timeout values in kibana.yml.
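
For reference, this is roughly the kind of change I made. A sketch only: I am assuming the Kibana 4.2+ style config keys and a tarball install path, and the values are just the ones I tried, not recommendations:

# bump the Elasticsearch timeouts Kibana uses (path and values are assumptions)
cat >> /opt/kibana/config/kibana.yml <<'EOF'
# healthcheck ping against ES; 1500 ms is, I believe, the default that shows up in the error
elasticsearch.pingTimeout: 5000
# timeout for requests Kibana proxies to ES
elasticsearch.requestTimeout: 60000
EOF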

Thanks


(Mark Walkom) #2

Anything in the ES logs?


(Sukrit Dasgupta) #3

Hi @warkolm,

Wanted to get back to you after updating my infra.

My ES infra now has 8 data nodes, 2 client nodes and 3 master nodes. Kibana 4.3 is pointing to one of the client nodes.

I still see the same messages, and the Kibana connection keeps flapping even with the new infra changes.

I deleted the 'red' indices, and now I have:

{
  "cluster_name" : "live",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 13,
  "number_of_data_nodes" : 8,
  "active_primary_shards" : 1734,
  "active_shards" : 5188,
  "relocating_shards" : 2,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 3,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 1366,
  "active_shards_percent_as_number" : 100.0
}
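
For reference, I am pulling that from the cluster health API on one of the client nodes (the host name here is a placeholder):

curl -s 'http://client-node-1:9200/_cluster/health?pretty'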

I have made sure nothing is populating ES except Marvel (which is what I wanted to use Kibana for).

When I shut down my LS2 instance, I did see some shard allocation tracebacks, but I will keep watching for other log entries. I am able to reach Kibana's /app/marvel URL, but the data there does not look right. For example, all the servers in the cluster show Shards as 0. That may be something else entirely, which I will investigate and post about in the Marvel group, but I am holding off pending further investigation.

Thanks.


(Sukrit Dasgupta) #4

Interesting observation: I moved Kibana to one of my client nodes and don't see the issue anymore. Perhaps a network issue?


(Sukrit Dasgupta) #5

Update: I spoke too soon. The same issue persists, so I don't think it's a network issue between Kibana and the client nodes. It's something else.

State of my cluster:

{
  "cluster_name" : "sln-live",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 14,
  "number_of_data_nodes" : 9,
  "active_primary_shards" : 2075,
  "active_shards" : 6210,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

Marvel was showing me the cluster state, but now I see that Marvel is not sending Kibana anything; all the dashboards are empty.

Any help/thoughts? Place to investigate?
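
The first place I plan to look (just a guess on my part) is whether the Marvel agent is still writing its indices at all, with something like:

# list the Marvel indices, their doc counts and sizes (host name is a placeholder)
curl -s 'http://client-node-1:9200/_cat/indices/.marvel-*?v'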


(Mark Walkom) #6

Are all your nodes on the same network?


(Sukrit Dasgupta) #7

Yeah, they are on the same network, though in different subnets connected through more than one L2 switch.

Thanks.


(Mark Walkom) #8

Then chances are your network is flaky.
It may be worth increasing the zen discovery timeouts a little to see if it helps.
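
Something along these lines in elasticsearch.yml on each node; a sketch only, the values are examples and the defaults are from memory. As far as I know these are not dynamic settings, so the nodes need a restart to pick them up:

# relax zen discovery / fault-detection timeouts (path and values are examples)
cat >> /etc/elasticsearch/elasticsearch.yml <<'EOF'
discovery.zen.ping_timeout: 10s       # default 3s
discovery.zen.fd.ping_timeout: 60s    # default 30s
discovery.zen.fd.ping_retries: 5      # default 3
EOF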


(Lukaslehmann) #9

Hi mad_min

We had the same issue; setting the memory limit properly for the Kibana Node process seems to fix it.
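
Roughly, that means giving the Node process that runs Kibana a larger V8 heap. A minimal sketch, assuming a Kibana 4.x tarball install where bin/kibana execs the bundled node; the path and the 1024 MB figure are assumptions, not our exact values:

# raise the V8 old-space limit for the Node process that runs Kibana
# (whether bin/kibana passes NODE_OPTIONS through depends on the Kibana version;
#  if it does not, the same flag can be added to the exec line inside bin/kibana)
export NODE_OPTIONS="--max-old-space-size=1024"
/opt/kibana/bin/kibana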

Best regards
Lukas


(Sukrit Dasgupta) #10

Thanks Lukas,

This has been running flawlessly thanks to your pointer.

We had gotten a bit worried because we took apart parts of our network to check for spanning-tree or routing-loop issues and didn't find any.

Thanks

