Restarting node takes time


We have a test cluster that has been running for 8 months.
Setup: one index per day, 16 GB RAM, 1 node, 5 shards and 1 replica per index.
Each index is approximately 2.5 GB, so about 500 MB per shard.
Since we have only 1 node, replicas are not assigned and the cluster status is yellow, which is expected. When we restart the node, however, assigning shards takes forever. We started the node 4 days ago and there are still over 1000 shards waiting to be assigned, and the cluster status is red. The data is not corrupted, so within about a week all shards will be assigned, but this is not acceptable in a production environment.
How can we avoid this issue and what are we doing wrong?
I am thinking of running 2 nodes in the production environment, with 4 shards per index and 1 replica, and asking for at least 16 GB RAM per server. But I am not sure whether this configuration would solve the issue.

If I calculate correctly, you should have around 1200 shards, which is a lot for that amount of RAM. As your shards are currently below 1 GB in size, I would recommend setting each index to have a single shard. You may also want to reindex old data into indices with just 1 shard configured.
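
For reference, here is a rough sketch of the arithmetic behind that estimate, and of how much a single shard per index would reduce it. It assumes 8 months is about 240 days (30 days per month) and one index per day; the function name `shard_counts` is only illustrative.

```python
# Rough shard-count estimate for a one-index-per-day setup.
# Assumption: 8 months is approximated as 8 * 30 = 240 days.

def shard_counts(days, primaries_per_index, replicas):
    """Return (primary_shards, replica_shards) across all daily indices."""
    primaries = days * primaries_per_index
    return primaries, primaries * replicas

days = 8 * 30

# Current layout: 5 primary shards and 1 replica per index.
current = shard_counts(days, primaries_per_index=5, replicas=1)

# Suggested layout: 1 primary shard and 1 replica per index.
proposed = shard_counts(days, primaries_per_index=1, replicas=1)

print("current  (primaries, replicas):", current)
print("proposed (primaries, replicas):", proposed)
```

On a single node only the primaries can be assigned, so the current layout leaves roughly 1200 primaries to recover on restart, while the suggested layout would leave about 240.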

Depending on which version you are on, upgrading might also help, as you might be able to benefit from the synced flush functionality.

Thank you for your quick response. We will reindex the old data as soon as the cluster is up and running again. Since we will then have 1 shard per index, do you think the second node is necessary to keep the replicas? Also, the Elasticsearch version is 2.2.0.

Replicas will never be assigned to the same node that holds the primary, so if you want to have two copies of your data (1 replica) you need at least 2 nodes.
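
To make that rule concrete, here is a minimal sketch of a simplified model, not Elasticsearch's actual allocator: each copy of a shard (the primary plus its replicas) needs its own node, so the number of copies that can be assigned is capped by the node count. The function names are hypothetical, and the model assumes all nodes are up and every primary has been recovered.

```python
# Simplified model of the "no two copies of a shard on one node" rule.
# This is an illustration only, not Elasticsearch's real allocation logic.

def assignable_copies(nodes, replicas):
    """Return (assigned, unassigned) copies of a single shard."""
    total = 1 + replicas          # the primary plus its replicas
    assigned = min(total, nodes)  # at most one copy per node
    return assigned, total - assigned

def expected_health(nodes, replicas):
    """Steady-state status once all primaries are assigned."""
    if nodes < 1:
        return "red"              # no node can hold the primary
    _, unassigned = assignable_copies(nodes, replicas)
    return "green" if unassigned == 0 else "yellow"

for n in (1, 2):
    print(n, "node(s), 1 replica:", assignable_copies(n, 1), expected_health(n, 1))
```

Under this model, 1 node with 1 replica stays yellow (the replica has nowhere to go), while 2 nodes with 1 replica can reach green.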