Restarting ES and slow recovery


(David Loehr) #1

I'm running two elasticsearch nodes (0.18.6), with unicast discovery,
5 shards and 1 replica per index, 747 indices, and 2 types per index.
I restarted elasticsearch on each server (waited about a minute
between them), and checked the cluster state. It was red for 30-40
minutes, then yellow for about an hour, with the "unassigned_shards"
count slowly decreasing. At first, it was recovering 120 shards per
minute, and then it slowed to 12 shards per minute. With approximately
7400 shards, it would take hours to fully recover.
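
As a rough check on those figures, here is a minimal sketch of the arithmetic. It assumes the recovery rate stays constant, which the post shows it does not, so treat the results as bounds rather than predictions. The function name is made up for illustration.

```python
def minutes_to_recover(unassigned_shards: int, shards_per_minute: float) -> float:
    """Estimate the minutes left, assuming a constant recovery rate."""
    if shards_per_minute <= 0:
        raise ValueError("shards_per_minute must be positive")
    return unassigned_shards / shards_per_minute


# Figures from the post: about 7400 shards, 120/min at first, 12/min later.
fast = minutes_to_recover(7400, 120)  # roughly one hour
slow = minutes_to_recover(7400, 12)   # roughly ten hours
print(f"at 120/min: {fast:.0f} min; at 12/min: {slow / 60:.1f} h")
```

The real recovery time would fall somewhere between the two, depending on how long the faster rate lasted.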

I tried running my application server (Play Framework) during the
recovery process, and found that search results would usually come
back ok, but whenever I tried to index new data, elasticsearch
wouldn't respond. I had the same results with curl -- searching
worked, but elasticsearch didn't respond when trying to add data (or
create an index).

Is it normal for elasticsearch to take so long to recover? How can I
make it faster? Is it normal for ES to not respond to PUT requests
while it's recovering?
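
One plausible reading of the hanging writes is that the target primary shards were not yet allocated while the cluster was red. A minimal sketch, assuming that is the cause: ask the cluster health API to wait for at least yellow before indexing. The helper names are hypothetical, and port 9200 in the usage note is the default HTTP port, not something stated in the post.

```python
import json
import urllib.parse
import urllib.request


def health_url(base_url: str, wait_for_status: str = "yellow", timeout: str = "30s") -> str:
    """Build a _cluster/health URL that blocks until the status is reached."""
    query = urllib.parse.urlencode({"wait_for_status": wait_for_status, "timeout": timeout})
    return f"{base_url.rstrip('/')}/_cluster/health?{query}"


def primaries_allocated(health: dict) -> bool:
    """Yellow or green means every primary shard is assigned."""
    return health.get("status") in ("yellow", "green")


def wait_until_writable(base_url: str) -> bool:
    """Return True once the cluster reports yellow or green (needs a live node)."""
    with urllib.request.urlopen(health_url(base_url), timeout=60) as resp:
        return primaries_allocated(json.load(resp))
```

An application could call `wait_until_writable("http://server1.example.com:9200")` and queue writes while it returns False.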

My elasticsearch configuration file:

cluster:
  name: elasticsearch

network:
  host: eth0:ipv4

discovery:
  zen:
    ping:
      multicast:
        enabled: false
      unicast:
        hosts: "server1.example.com:9300,server2.example.com:9300"

--


(David Pilato) #2

You are running 7470 shards with 2 nodes only (3735 per node).
So you have 3735 Lucene instances running on a single box.
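
The numbers above follow from the settings in the original post; a small sketch of the calculation (the function name is only illustrative):

```python
def total_shards(indices: int, primaries_per_index: int, replicas: int) -> int:
    """Each index holds its primaries plus one full copy per replica."""
    return indices * primaries_per_index * (1 + replicas)


# 747 indices, 5 shards and 1 replica per index, spread over 2 nodes.
total = total_shards(747, 5, 1)
per_node = total // 2
print(total, per_node)  # 7470 3735
```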

You will probably see a lot of I/O wait on those machines.
How many documents do you have?

My 2 cents.

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

--


(David Loehr) #3

Thanks for your quick reply David! I currently have 50,000 documents.
What's an appropriate number of shards per node? I currently have an
index for each of my users, and never need to search more than one
user's data at a time. Would it be better to have just one index and
use a filter to limit results to one user (each document has a user_id
field)? Would you recommend fewer (or more, if I reduce the number of
indices) shards per index? I see at
http://www.elasticsearch.org/guide/appendix/glossary.html#primary_shard
that the number of shards per index affects how many documents I can
store -- do you know approximately how many documents each shard can
handle?
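
For reference, the single-index alternative asked about here could look roughly like the sketch below. It assumes the `filtered` query form used by Elasticsearch releases of that era (later versions replaced it with a `bool` query), and the `query_string` part is just one choice of text query. The `user_id` field comes from the post; the function name is hypothetical.

```python
import json


def user_search_body(user_id: int, text: str) -> dict:
    """Search one shared index, restricted to a single user's documents."""
    return {
        "query": {
            "filtered": {
                # Full-text part of the search.
                "query": {"query_string": {"query": text}},
                # Term filter that limits hits to one user.
                "filter": {"term": {"user_id": user_id}},
            }
        }
    }


print(json.dumps(user_search_body(42, "invoice"), indent=2))
```

With this layout, 747 per-user indices collapse into one index, so the shard count no longer grows with the number of users.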

--


(Robin Verlangen) #4

Maybe closing and opening indices (_close and _open) could help you out.
Close all indices, and open each one just before you are going to search it.
Close it again after a certain period of idleness.
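
A minimal sketch of that idea, assuming the open/close index endpoints (`POST /{index}/_open` and `POST /{index}/_close`) are available in the version in use. The helper name and the index name in the usage note are hypothetical.

```python
import urllib.request


def index_state_request(base_url: str, index: str, action: str) -> urllib.request.Request:
    """Build the POST request that opens or closes one index."""
    if action not in ("_open", "_close"):
        raise ValueError("action must be '_open' or '_close'")
    url = f"{base_url.rstrip('/')}/{index}/{action}"
    # An empty body is enough; the endpoint takes no payload.
    return urllib.request.Request(url, data=b"", method="POST")
```

Sending it is then `urllib.request.urlopen(index_state_request("http://localhost:9200", "user_42", "_open"))`. A closed index cannot be searched or written to, so the application has to open it and wait for its shards to start before use.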

Best regards,

Robin Verlangen
Software engineer
W http://www.robinverlangen.nl
E robin@us2.nl


--


(Clinton Gormley) #5

Have a look at Shay's talk about scaling strategies. It'll be very
useful to you:

http://www.elasticsearch.org/videos/2012/06/05/big-data-search-and-analytics.html

clintt

--


(Otis Gospodnetić) #6

Hi David,

I think [link missing] may have the info about not allowing shard movement, which could help you.

Otis

Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html

--

