Status red


(Marcin Dojwa) #1

Hi,

I have the following problem. I have 2 nodes and 1 index split into 30
shards with 1 replica. Currently I have 12 million docs (11 GB of data in ES).
When I tried to delete documents by query, it never finished and ES started
using 100% of the processor. I stopped both nodes and ran only one of them.
Since starting, the node has been using about 125% of the processor and
http://localhost:2902/_cluster/health returns:
{
  "cluster_name": "production",
  "status": "red",
  "timed_out": false,
  "number_of_nodes": 1,
  "number_of_data_nodes": 1,
  "active_primary_shards": 0,
  "active_shards": 0,
  "relocating_shards": 0,
  "initializing_shards": 4,
  "unassigned_shards": 56
}

It has looked like this for a whole day, and restarting ES does not help.
Could you help me with that?
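
As a quick sanity check on the numbers in the health response above, the shard counts can be compared with what the index settings imply. This is only a sketch: the values are copied from the post, and the helper names (`expected_shard_copies`, `accounted_shard_copies`) are made up for illustration.

```python
# Values copied from the health response in the post.
health = {
    "status": "red",
    "active_primary_shards": 0,
    "active_shards": 0,
    "relocating_shards": 0,
    "initializing_shards": 4,
    "unassigned_shards": 56,
}

PRIMARIES = 30  # "30 shards" from the post
REPLICAS = 1    # "1 replica" from the post


def expected_shard_copies(primaries, replicas):
    """Total shard copies the index needs: each primary plus its replicas."""
    return primaries * (1 + replicas)


def accounted_shard_copies(health):
    """Shard copies the health response reports in any state.

    relocating_shards is left out of the sum; it is 0 in this response.
    """
    return (
        health["active_shards"]
        + health["initializing_shards"]
        + health["unassigned_shards"]
    )


expected = expected_shard_copies(PRIMARIES, REPLICAS)
accounted = accounted_shard_copies(health)
print(expected, accounted)
```

Both numbers come out to 60, so no shard copy is missing from the report; the copies are simply not active yet. The status is red because no primary is active. Note also that a replica is never allocated on the same node as its primary, so with a single node running, the 30 replica copies stay unassigned and the best status that node can reach alone is yellow.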

I have enabled all logging. No GC logs are generated, and the main log
file looks like this:
[2012-08-18 20:19:49,111][INFO ][node ] [es1] {0.19.8}[12199]: initializing ...
[2012-08-18 20:19:49,119][INFO ][plugins ] [es1] loaded [], sites []
[2012-08-18 20:19:55,082][INFO ][node ] [es1] {0.19.8}[12199]: initialized
[2012-08-18 20:19:55,082][INFO ][node ] [es1] {0.19.8}[12199]: starting ...
[2012-08-18 20:19:55,168][INFO ][transport ] [es1] bound_address {inet[/0.0.0.0:9300]}, publish_address {inet[/10.29.212.95:9300]}
[2012-08-18 20:19:58,208][INFO ][cluster.service ] [es1] new_master [es1][aNZR5W2RR1G1Hjlir8pdlw][inet[/10.29.212.95:9300]]{rack_id=es1_rack}, reason: zen-disco-join (elected_as_master)
[2012-08-18 20:19:58,220][INFO ][discovery ] [es1] production/aNZR5W2RR1G1Hjlir8pdlw
[2012-08-18 20:20:01,665][INFO ][http ] [es1] bound_address {inet[/0.0.0.0:9200]}, publish_address {inet[/10.29.212.95:9200]}
[2012-08-18 20:20:01,665][INFO ][node ] [es1] {0.19.8}[12199]: started

Thank you for your help.

Best regards.
Marcin Dojwa.

--


(Marcin Dojwa) #2

OK, it suddenly finished assigning shards. When I started the second node, it
assigned all shards too. Now it is working fine. It looks like the problem
occurred because I started both nodes at the same time, and the shards somehow
could not be assigned. So there is no problem anymore. Sorry :)

Best regards.
Marcin Dojwa.

--


(David Pilato) #3

Yes. I think it's a best practice to start new nodes only when the cluster is stable (all shards ok).
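
A minimal sketch of such a check, using only the Python standard library. The `_cluster/health` endpoint is the one used earlier in the thread, and `wait_for_status` and `timeout` are parameters of that API. The helper names (`health_url`, `is_stable`, `fetch_health`) and the base URL in the usage comment are assumptions for illustration; the port there is taken from the `http` line of the posted log.

```python
import json
import urllib.parse
import urllib.request

# Ordering of the three cluster health states, worst to best.
STATUS_RANK = {"red": 0, "yellow": 1, "green": 2}


def health_url(base_url, wait_for_status=None, timeout=None):
    """Build the cluster health URL, optionally asking ES to block until
    the cluster reaches the given status or the timeout expires."""
    params = {}
    if wait_for_status:
        params["wait_for_status"] = wait_for_status
    if timeout:
        params["timeout"] = timeout
    url = base_url.rstrip("/") + "/_cluster/health"
    query = urllib.parse.urlencode(params)
    return url + "?" + query if query else url


def is_stable(health, required_status="green"):
    """True if the status is at least required_status and no shard is
    still initializing or relocating."""
    return (
        not health.get("timed_out", False)
        and STATUS_RANK[health["status"]] >= STATUS_RANK[required_status]
        and health["initializing_shards"] == 0
        and health["relocating_shards"] == 0
    )


def fetch_health(url):
    """Fetch and decode the health response (needs a reachable node)."""
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage against a live node (not executed here; URL is an assumption):
#   url = health_url("http://localhost:9200", "yellow", "30s")
#   if is_stable(fetch_health(url), required_status="yellow"):
#       print("safe to start the next node")
```

With a single node and 1 replica, the replicas cannot be assigned, so `yellow` is the status to wait for before starting the second node; once both nodes are up, waiting for `green` corresponds to "all shards ok".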

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

--


(system) #4