HELP! My production is dead


(Eugene Strokin) #1

Hello,
Suddenly, ES started becoming unresponsive from time to time.
I decided to restart it.
But it got stuck at initializing:
[2012-06-30 21:01:49,080][INFO ][bootstrap ] max_open_files [29977]
[2012-06-30 21:01:49,089][INFO ][node ] [Piledriver] {0.19.0}[6554]: initializing ...
[2012-06-30 21:01:49,093][INFO ][plugins ] [Piledriver] loaded [], sites []

I've been waiting 20 minutes for initialization to complete. This is a
relatively small index (around 1 GB), so it shouldn't take that much time.

I see that the Java process is using some CPU, but not 100% (20-40% on average).

What could it be?
Should I start uploading my backups?

Thank you


(pedro) #2

/etc/init.d/ntp stop; date; date `date +"%m%d%H%M%C%y.%S"`; date;
/etc/init.d/ntp start

Stupid leap second


(Eugene Strokin) #3

Thank you Pedro,
but what does it actually mean?
How does it help?


(pedro) #4

Who cares as long as it works? :stuck_out_tongue:


(Eugene Strokin) #5

Hmm... I don't know if this helped, but after about 30 minutes ES
started and it works now.
My system doesn't have an /etc/init.d/ntp file, so I doubt the command was the
reason it started.

I'm actually very surprised it took so much time. Could something be done
about this? I also noticed that there are about 1700 files for node 0 of
the index. Other nodes have around 200 files.
I guess this is wrong, but I'm not sure.
Could it be fixed somehow? Or is it even wrong at all?

Anyway, if anyone has had similar problems, please share. Any information is
helpful.

Thank you,
Eugene S.


(dostrow) #6

On Saturday, June 30, 2012 7:30:00 PM UTC-7, Pedro Alves wrote:

/etc/init.d/ntp stop; date; date `date +"%m%d%H%M%C%y.%S"`; date;
/etc/init.d/ntp start

Stupid leap second

THANK YOU!!! This just fixed my cluster that went down today. I would NEVER
have thought of this. The cluster was literally taking hours to restart the
nodes; after running this, everything restarts right away. WTF...


(Eugene Strokin) #7

Apparently this was the problem:
http://www.google.com/hostednews/afp/article/ALeqM5iW5Bq-w6vhpZZR2XItR4EBJbbxPw?docId=CNG.b993298106a8bbaafb124651a0577fd2.1d1
It's still not clear why it caused the long startup time, but as
Pedro said, as long as it works)))


(llowder@oreillyauto.com) #8

Leap seconds caused a LOT of problems over the weekend.

Long story short (if I understand things right), there is a bug in ntpd
that causes the CPU to lock up any time anything needs a timestamp while
there is an 'active' leap second.

The fix is either to stop ntpd before the leap second is applied, to run the
commands mentioned above, or to restart the server. Sometimes a combination
of these is needed.
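
For anyone who wants to script the workaround from this thread, here is a minimal sketch, not a tested procedure. It stops NTP if an init script is found, sets the clock to its own current value using the same `date` format string Pedro posted, then starts NTP again. The function names, the `--apply` flag, and the `/etc/init.d/ntpd` path are assumptions made for illustration; only `/etc/init.d/ntp` and the `date` format come from the posts above. On a system with no NTP init script (as Eugene reported), it only resets the clock.

```shell
#!/bin/sh
# Sketch of the leap second workaround discussed in this thread.
# Assumption: must be run as root, since it sets the system clock.

# Build the MMDDhhmmCCYY.ss string that date(1) accepts for setting the clock.
clock_stamp() {
    date +"%m%d%H%M%C%y.%S"
}

# Print the first NTP init script found.
# The candidate paths are assumptions; adjust them for your distribution.
find_ntp_init() {
    for script in /etc/init.d/ntp /etc/init.d/ntpd; do
        if [ -x "$script" ]; then
            echo "$script"
            return 0
        fi
    done
    return 1
}

# Stop NTP (if found), set the clock to the current time, start NTP again.
reset_clock() {
    ntp_init=$(find_ntp_init) || ntp_init=""
    [ -n "$ntp_init" ] && "$ntp_init" stop
    date "$(clock_stamp)"
    [ -n "$ntp_init" ] && "$ntp_init" start
    return 0
}

# Only touch the clock when explicitly asked to (hypothetical flag).
if [ "${1:-}" = "--apply" ]; then
    reset_clock
fi
```

Run it as root with `--apply`. Without the flag it only defines the functions and changes nothing, which makes it safe to read through or source first.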


(Pavel Dvorak) #9

My cluster also crashed a few days ago, so it was probably this. It was weird:
it stopped responding, so I restarted it. It restarted fine and it seems no
data were lost.
