HELP! My production is dead

Eugene_Strokin · July 1, 2012, 2:24am

Hello,
ES suddenly started becoming unresponsive from time to time,
so I decided to restart it.
But it got stuck at initializing:
[2012-06-30 21:01:49,080][INFO ][bootstrap ] max_open_files [29977]
[2012-06-30 21:01:49,089][INFO ][node ] [Piledriver] {0.19.0}[6554]: initializing ...
[2012-06-30 21:01:49,093][INFO ][plugins ] [Piledriver] loaded [], sites []

I've been waiting 20 minutes for initialization to complete. This is a
relatively small index (around 1 GB), so it shouldn't take that long.

I see that the Java process uses some CPU, but not 100% (20-40% on average).

What could it be?
Should I start uploading my backups?

Thank you

pedro · July 1, 2012, 2:30am

/etc/init.d/ntp stop; date; date `date +"%m%d%H%M%C%y.%S"`; date;
/etc/init.d/ntp start

Stupid leap second
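
For anyone wondering what the one-liner does: the inner `date +"%m%d%H%M%C%y.%S"` prints the current time as MMDDhhmmCCYY.ss, and the outer `date` receives that string as the time to set. In other words, with ntp stopped, the clock is set to its own current value. Below is a dry-run sketch that only builds and prints the timestamp; `build_stamp` is a hypothetical helper name, and nothing here changes the clock.

```shell
#!/bin/sh
# Dry-run sketch: shows the timestamp the workaround would feed back to `date`.
# build_stamp is a hypothetical helper name, not something from the thread.
build_stamp() {
    # MMDDhhmmCCYY.ss is the positional format `date` accepts when setting the clock
    date +"%m%d%H%M%C%y.%S"
}

stamp=$(build_stamp)
# The real workaround runs `date "$stamp"` as root; here it is only printed.
echo "would run: date $stamp"
```

The explanation commonly reported at the time was that explicitly setting the clock made the kernel resynchronize its timers; treat that as background rather than something established in this thread.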

Eugene_Strokin · July 1, 2012, 2:32am

Thank you Pedro,
but what does it really mean?
How does it help?

pedro · July 1, 2012, 2:41am

Who cares as long as it works?

Eugene_Strokin · July 1, 2012, 2:56am

Hmm... I don't know if this helped, but after about 30 minutes ES
started, and it works now.
My system doesn't have an /etc/init.d/ntp file, so I doubt the command was the
reason it started.

I'm actually very surprised it took so long. Could something be done
about this? I also noticed that there are about 1700 files for node 0 of
the index. Other nodes have around 200 files each.
I guess this is wrong, but I'm not sure.
Can it be fixed somehow? Or is it even a problem at all?
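
A minimal sketch for comparing file counts across directories, in case it helps others check the same thing. `count_files` is a hypothetical helper; pass it whatever directory holds the per-node subdirectories on your system (the actual path is not given in this thread).

```shell
#!/bin/sh
# count_files is a hypothetical helper: for each immediate subdirectory of $1,
# it prints "<number of regular files, counted recursively> <subdirectory>".
count_files() {
    for d in "$1"/*/; do
        # Skip the unexpanded glob when $1 has no subdirectories
        [ -d "$d" ] || continue
        n=$(find "$d" -type f | wc -l | tr -d ' ')
        echo "$n ${d%/}"
    done
}
```

Piping the output through `sort -n` makes an outlier such as the 1700-file directory easy to spot.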

Anyway, if anyone has had similar problems, please share. Any information is
helpful.

Thank you,
Eugene S.

dostrow · July 1, 2012, 3:34am

On Saturday, June 30, 2012 7:30:00 PM UTC-7, Pedro Alves wrote:

/etc/init.d/ntp stop; date; date `date +"%m%d%H%M%C%y.%S"`; date;
/etc/init.d/ntp start

Stupid leap second

THANK YOU!!! This just fixed my cluster that went down today. I would NEVER
have thought of this. The cluster was taking literally hours to restart the
nodes. Run this, and everything restarts right away. WTF...

Eugene_Strokin · July 2, 2012, 6:17pm

Apparently this was the problem:
http://www.google.com/hostednews/afp/article/ALeqM5iW5Bq-w6vhpZZR2XItR4EBJbbxPw?docId=CNG.b993298106a8bbaafb124651a0577fd2.1d1
It's still not clear why it caused the long startup time, but as
Pedro said, as long as it works)))

llowder · July 2, 2012, 7:02pm

Leap seconds caused a LOT of problems over the weekend.

Long story short (if I understand things right), there is a bug in ntpd
that causes the CPU to lock up any time anything needs a timestamp while
there is an 'active' leap second.

The fix is one of: stopping ntpd before the leap second is applied, running
the commands mentioned above, or restarting the server. Sometimes a
combination of these is needed.
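
A rough sketch for checking whether a machine actually had the leap second applied since boot. It assumes the Linux kernel logged a notice containing "inserting leap second" (to my knowledge kernels of that era printed "Clock: inserting leap second 23:59:60 UTC", but treat the exact wording as an assumption). `saw_leap_insert` is a hypothetical helper name.

```shell
#!/bin/sh
# saw_leap_insert is a hypothetical helper: it reads kernel log text on stdin
# and exits 0 if a leap-second insertion notice is present, non-zero otherwise.
saw_leap_insert() {
    grep -qi 'inserting leap second'
}

# Usage against the live kernel ring buffer (may need elevated privileges):
#   dmesg | saw_leap_insert && echo "leap second was inserted since boot"
```

If the notice is present and the box shows the symptoms described in this thread, the workarounds listed above are the ones to try.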

Pavel_Dvorak · July 4, 2012, 8:08pm

My cluster also crashed a few days ago, so it was probably this. It was weird:
it stopped responding, so I restarted it. It restarted fine and it seems no
data was lost.
