Sharding unbalance problem


(Jae) #1

Hi

Today, my elasticsearch 0.19.9 cluster went down because of 'no free
space'. I am seeing that shards are not being allocated in a balanced way.
My question is how to monitor the free space of each instance, and what the
best method is to prevent 'no free space'. Is it just deleting old indices
frequently and checking the free space on each instance?

When I checked the free space percentage using the 'df -H' command, many
instances were showing less than 70% disk usage, but one instance had no
free space left. Please give me some guidelines on free space monitoring.

Thank you
Best, Jae



(Ivan Brusic) #2

Are your nodes unequal in terms of disk capacity? Do you have different
indices with different shard counts?

ElasticSearch assumes every node in the cluster is homogeneous. Shard
allocation is based purely on placing shards in an even distribution across
the nodes. The size of the disk, the number of shards per index, and the
number of indices do not matter. If one index's shards are bigger than
another index's shards, the allocator could end up placing several of the
heavier shards together on one machine.

ElasticSearch does not provide any monitoring tools that send
notifications (perhaps SPM does?). You should set up monitoring software on
your machines for OS-level stats (CPU/mem/disk).
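A low-tech starting point is a cron job on each node that parses 'df' and
warns above a threshold. This is just a sketch, not anything built into
ElasticSearch; the 85% threshold and the warning format are assumptions you
would adjust for your own environment:

```shell
#!/bin/sh
# Warn when any filesystem exceeds a disk-usage threshold.
# Intended to run from cron on every node; the threshold is an assumption.
THRESHOLD=85

check_disk() {
    # Reads `df -P` output on stdin; prints one warning per full filesystem.
    awk -v limit="$THRESHOLD" 'NR > 1 {
        usage = $5
        sub(/%/, "", usage)   # "90%" -> "90"
        if (usage + 0 >= limit)
            printf "WARNING: %s is at %s%% (mounted on %s)\n", $1, usage, $6
    }'
}

df -P | check_disk
```

Piping its output into 'mail' (or whatever notification channel you use)
turns it into a crude alert; a proper monitoring system is still the better
long-term answer.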

Cheers,

Ivan

On Sat, Sep 29, 2012 at 6:57 PM, Jae metacret@gmail.com wrote:



(Jae) #3

On Monday, October 1, 2012 10:11:03 AM UTC-7, Ivan Brusic wrote:

Are your nodes unequal in terms of disk capacity? Do you have different
indices with different shard counts?

No, every node has equal disk space, and every index has the same shard
count and replica count.

I deleted several indices, but one instance is still consuming 68%. The
following is the result of 'df -H' on each instance.

FYI, I am using TransportClient, and the sampler and ping interval is 60s.

i-1c797166
/dev/md0 1.9T 939G 865G 53% /mnt
i-12797168
/dev/md0 1.9T 938G 867G 52% /mnt
i-1079716a
/dev/md0 1.9T 812G 992G 46% /mnt
i-1679716c
/dev/md0 1.9T 938G 867G 52% /mnt
i-7c7c7406
/dev/md0 1.9T 813G 992G 46% /mnt
i-727c7408
/dev/md0 1.9T 839G 966G 47% /mnt
i-767c740c
/dev/md0 1.9T 1.3T 591G 68% /mnt
i-747c740e
/dev/md0 1.9T 1.1T 740G 60% /mnt
i-327f7748
/dev/md0 1.9T 812G 992G 46% /mnt
i-307f774a
/dev/md0 1.9T 1.1T 740G 60% /mnt
i-347f774e
/dev/md0 1.9T 1.1T 740G 59% /mnt
i-26747c5c
/dev/md0 1.9T 938G 866G 53% /mnt
i-08747c72
/dev/md0 1.9T 1.1T 740G 59% /mnt
i-0e747c74
/dev/md0 1.9T 813G 992G 46% /mnt
i-04747c7e
/dev/md0 1.9T 813G 992G 46% /mnt
i-ea767e90
/dev/md0 1.9T 938G 866G 52% /mnt
i-e8767e92
/dev/md0 1.9T 839G 966G 47% /mnt
i-ec767e96
/dev/md0 1.9T 812G 992G 46% /mnt
i-e2767e98
/dev/md0 1.9T 938G 866G 53% /mnt
i-2a080050
/dev/md0 1.9T 1.1T 740G 59% /mnt
i-2e080054
/dev/md0 1.9T 1.1T 740G 59% /mnt
i-2c080056
/dev/md0 1.9T 939G 866G 53% /mnt
i-2608005c
/dev/md0 1.9T 1.2T 641G 65% /mnt
i-f40a028e
/dev/md0 1.9T 1.1T 740G 59% /mnt
i-ea0a0290
/dev/md0 1.9T 938G 866G 53% /mnt
i-ee0a0294
/dev/md0 1.9T 1.1T 740G 59% /mnt
i-360d054c
/dev/md0 1.9T 938G 866G 53% /mnt
i-340d054e
/dev/md0 1.9T 938G 866G 53% /mnt
i-280d0552
/dev/md0 1.9T 813G 992G 46% /mnt
i-2e0d0554
/dev/md0 1.9T 938G 866G 53% /mnt
i-720f0708
/dev/md0 1.9T 812G 992G 46% /mnt
i-700f070a
/dev/md0 1.9T 939G 866G 53% /mnt
i-6c0f0716
/dev/md0 1.9T 938G 866G 52% /mnt
i-620f0718
/dev/md0 1.9T 764G 1.1T 43% /mnt
i-44f8ec3e
/dev/md0 1.9T 939G 866G 53% /mnt
i-838a79fe
/dev/md0 1.9T 813G 992G 46% /mnt
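The outlier stands out faster if the listing is sorted by usage. Assuming
the same alternating format as above (instance id on one line, its df line
on the next), a short pipeline does it; the instance ids and numbers below
are just a few rows copied from the listing:

```shell
# Pair each instance id with its df line, then sort by the usage column
# (6th field of the paired line), fullest node first.
sorted=$(printf '%s\n' \
  'i-767c740c' '/dev/md0 1.9T 1.3T 591G 68% /mnt' \
  'i-1079716a' '/dev/md0 1.9T 812G 992G 46% /mnt' \
  'i-747c740e' '/dev/md0 1.9T 1.1T 740G 60% /mnt' |
  paste - - | sort -rn -k6)
echo "$sorted"
```

Run over the full listing, the first line would immediately point at the
node closest to running out of space.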

Thank you
Best, Jae



(Otis Gospodnetić) #4

Hello,

Correct. If you use SPM for ES (or any other flavour of SPM, for that
matter), you can monitor disk space and set an alert for when its usage
reaches N%, say 90%, so you are notified automatically instead of having to
proactively watch disk space. For more info about SPM, check the URL in the
sig or email me off-list.

On the ES side, the new ES has a mechanism to manually move shards around
the cluster, which may be handy for you here.

Otis

Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html

On Monday, October 1, 2012 1:11:03 PM UTC-4, Ivan Brusic wrote:



(phill) #5

On 10/1/2012 5:55 PM, Otis Gospodnetic wrote:

On the ES side, the new ES has a mechanism to manually move shards
around the cluster, which may be handy for you here.

New? What version? This sounds interesting. Is it something in the
index "module" realm?

-Paul



(Paul Smith) #6

0.19.10, released recently, adds the Cluster Reroute API:
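A move command against that API looks roughly like the following. The index
name, shard number, and node names here are placeholders to substitute with
your own; the request goes to any node in the cluster:

```shell
# Cluster Reroute request body: move shard 0 of a hypothetical index
# "logs-2012.09" from a full node to one with free disk. All four values
# inside "move" are placeholders.
REROUTE_BODY='{
  "commands": [
    { "move": { "index": "logs-2012.09", "shard": 0,
                "from_node": "node_with_full_disk",
                "to_node": "node_with_free_disk" } }
  ]
}'

# Send it to the cluster (adjust host/port for your setup):
# curl -XPOST 'http://localhost:9200/_cluster/reroute' -d "$REROUTE_BODY"
echo "$REROUTE_BODY"
```

Note that the source node must keep the shard until relocation finishes, so
this helps most before the disk is completely full, not after.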

On 4 October 2012 07:07, P. Hill parehill1@gmail.com wrote:


