Yesterday, the hard disks on one of our nodes went bad and we had to bring down
the physical machine, which was also running another 2 Elasticsearch nodes.
We have hourly indices with replication 2 and 50 shards per index. Each
shard is currently 5-6 GB in size. It has been more than 24 hours and the cluster is
still trying to assign the unassigned shards. During this RED status our search
is broken. Any recommendation on how to handle such situations?
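As a first diagnostic step, it can help to ask the cluster itself how many shards are unassigned and why it is not placing them. A minimal sketch, assuming the HTTP port is reachable on localhost:9200 (adjust the host for your setup); the allocation explain API only exists in ES 5.0 and later:

# Overall health: status plus relocating/initializing/unassigned shard counts
curl -s 'localhost:9200/_cluster/health?pretty'

# Ask the allocator why a shard is unassigned (ES 5.0+; with no body it picks the first unassigned shard)
curl -s 'localhost:9200/_cluster/allocation/explain?pretty'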
What version of ES and Java are you on?
Is your cluster still red? Check the _cat/allocation, _cat/indices and
_cat/recovery endpoints for info on the status of things.
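For reference, a quick way to pull those from any node (assuming the default HTTP port; the v parameter just adds column headers):

# Shard counts and disk use per node
curl -s 'localhost:9200/_cat/allocation?v'

# Health, shard count and doc count per index
curl -s 'localhost:9200/_cat/indices?v'

# Status of ongoing and recent shard recoveries
curl -s 'localhost:9200/_cat/recovery?v'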
Thank you for your reply. Here is our cluster info:
40 physical machines with 200 GB RAM each.
Each machine runs 3 ES data nodes with 30 GB RAM each, so 120 data nodes in total.
5 dedicated master nodes.
We are using 32 RAID and 22 RAID on each physical machine.
I didn't find much in the logs other than entries related to initializing shards.
We do have cluster.routing.allocation.same_shard.host: true but nothing
related to rack awareness. That is something we will look into.
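If you do add rack awareness, the usual pattern (a sketch, assuming ES 5.x where custom node attributes live under node.attr.*; older versions use a plain node.<attribute> key) is to tag every data node with the physical machine or rack it runs on and then tell the allocator to spread shard copies across that attribute, so one machine going down can no longer take out every copy of a shard:

# elasticsearch.yml on each data node (the value differs per machine/rack)
node.attr.rack_id: rack_one

# Dynamic cluster setting: spread primaries and replicas across rack_id values
curl -s -XPUT 'localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d '
{
  "persistent": {
    "cluster.routing.allocation.awareness.attributes": "rack_id"
  }
}'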