Cold storage nodes config

Hi,

I want to have some "cold storage" nodes in my ES cluster, where I will move indices that are old and rarely searched (and never indexed into anymore).

I have already set up the cluster with 4 hot servers and 3 cold servers.
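For readers following along, a hot/cold split like this is usually driven by shard allocation filtering. Here is a minimal sketch of the settings call that pins an index to the cold tier. The attribute name box_type, the index name, and the host are assumptions for illustration (the attribute would have to be set on each node, e.g. node.box_type: cold in elasticsearch.yml); the thread does not say how the tiers were tagged.

```python
import json

def allocation_settings(tier, attribute="box_type"):
    """Build the index settings body that requires nodes tagged with `tier`.

    `attribute` is a hypothetical custom node attribute, not an ES built-in.
    """
    return {"index.routing.allocation.require.%s" % attribute: tier}

def move_request(index, tier, host="http://localhost:9200"):
    """Return (method, url, body) for the update-settings call. Nothing is sent."""
    url = "%s/%s/_settings" % (host, index)
    return "PUT", url, json.dumps(allocation_settings(tier))

# Hypothetical index name, used only to show the shape of the request.
method, url, body = move_request("logs-2015.01.01", "cold")
print(method, url, body)
```

Once the setting is applied, Elasticsearch relocates the shards of that index to matching nodes on its own.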

The cold servers have tons of storage space, but obviously less RAM.

Specifically, they have 16GB of RAM, with 8GB used for heap.
Together they store 342 indices with a total of 1145 shards.
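To put those numbers in perspective, here is a rough back-of-the-envelope calculation. It assumes the 1145 shards are spread evenly over the 3 cold nodes and that the whole 8GB heap is available to them, neither of which is exactly true in practice.

```python
def heap_budget_per_shard(total_shards, nodes, heap_gb):
    """Return (shards per node, MB of heap per shard) for an even spread."""
    shards_per_node = total_shards / float(nodes)
    heap_mb_per_shard = heap_gb * 1024.0 / shards_per_node
    return shards_per_node, heap_mb_per_shard

shards_per_node, mb_per_shard = heap_budget_per_shard(1145, 3, 8)
print("%.0f shards per node, about %.1f MB of heap per shard"
      % (shards_per_node, mb_per_shard))
```

That works out to roughly 380 shards per node and a little over 20 MB of heap each, before anything is left for recoveries or searches.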

My issue is that I sometimes see "OutOfMemoryError: Java heap space" errors in the logs. These cause the cold nodes to go down, which turns my entire cluster red.

When a shard is relocated or recovered, heap usage rises to 99% and stays there until there is eventually an OOM error.

How can I handle those situations? I want to keep the cold servers low-end, but still keep the indices available.

Which version of ES are you running? Do you use doc values?

I'm using ES 1.5.2, doc_values is ON.

I think the issue is the segments memory_size. I see that it's using almost all of the heap.
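One way to check this per node is to compare segment memory against the maximum heap in the node stats output. The sketch below assumes a response shaped like GET /_nodes/stats/indices,jvm, with the fields indices.segments.memory_in_bytes and jvm.mem.heap_max_in_bytes; please verify the field names against your version. The sample values and node name are made up.

```python
def segment_memory_share(node_stats):
    """Return {node name: fraction of max heap taken by segment memory}."""
    shares = {}
    for node in node_stats["nodes"].values():
        segments = node["indices"]["segments"]["memory_in_bytes"]
        heap_max = node["jvm"]["mem"]["heap_max_in_bytes"]
        shares[node["name"]] = segments / float(heap_max)
    return shares

# Made-up sample in the assumed node stats shape (6 GiB of an 8 GiB heap).
sample = {
    "nodes": {
        "abc123": {
            "name": "cold-1",
            "indices": {"segments": {"memory_in_bytes": 6442450944}},
            "jvm": {"mem": {"heap_max_in_bytes": 8589934592}},
        }
    }
}

for name, share in segment_memory_share(sample).items():
    print("%s: %.0f%% of heap in segments" % (name, share * 100))
```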

Is there any way to reduce that without closing indices? I don't mind if search becomes much slower.

Have you already tried the optimize API with the number of segments set to 1? POST /your-index-name/_optimize?max_num_segments=1

Yes, I optimize each index daily once its time has passed, and set the number of segments to 1.
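A daily pass like this could be scripted roughly as follows. This is only a sketch: the host and the index names are placeholders, and send_optimize is defined but deliberately not called here.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen

def optimize_url(index, max_num_segments=1, host="http://localhost:9200"):
    """Build the _optimize URL for one index (ES 1.x endpoint name)."""
    return "%s/%s/_optimize?max_num_segments=%d" % (
        host, quote(index, safe=""), max_num_segments)

def send_optimize(index, **kwargs):
    """POST the optimize request. Blocks until the merge finishes."""
    request = Request(optimize_url(index, **kwargs), data=b"", method="POST")
    with urlopen(request) as response:
        return response.read()

# Placeholder index names; in practice these would be yesterday's indices.
for index in ["logs-2015.01.01", "logs-2015.01.02"]:
    print("POST", optimize_url(index))
```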

Reduce the number of shards per index on the cold nodes: reindex on the hot nodes into an index with fewer shards (e.g. 1 per index), and then move the result to the cold nodes.

Each shard is a Lucene index, so every shard carries its own memory overhead.

But won't this take a lot of time? Is there a specific curl command I can use to do that within ES itself?

Yes, it takes a lot of time (using scroll is not so bad), but it really helps to keep the cluster stable. I had hot and cold nodes too, but I added a middle tier of nodes to the cluster to handle this merge/shard-reduction task. These middle nodes are used only for this kind of operation.
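Since 1.5 has no built-in command for this, the reindex has to be done client-side: create the target index with fewer shards, scroll through the source, and send each page of hits to the bulk API. Below is a minimal sketch of the two request bodies involved. The index and type names are hypothetical, the scroll loop and HTTP calls are left out, and the mappings would also need to be created on the target index.

```python
import json

def target_index_body(number_of_shards=1):
    """Settings body for creating the smaller target index."""
    return {"settings": {"number_of_shards": number_of_shards}}

def bulk_body(hits, target_index):
    """Turn one page of scroll hits into a newline-delimited _bulk body.

    Each hit becomes an action line followed by its source document,
    keeping the original _type and _id.
    """
    lines = []
    for hit in hits:
        action = {"index": {"_index": target_index,
                            "_type": hit["_type"],
                            "_id": hit["_id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(hit["_source"]))
    # The bulk API expects a trailing newline after the last line.
    return "\n".join(lines) + "\n" if lines else ""

# Hypothetical page of hits, shaped like the hits.hits array of a scroll response.
page = [{"_index": "logs-2015.01.01", "_type": "log", "_id": "1",
         "_source": {"message": "hello"}}]
print(json.dumps(target_index_body()))
print(bulk_body(page, "logs-2015.01-merged"), end="")
```

Each page returned by the scroll would go through bulk_body and be POSTed to /_bulk until the scroll comes back empty.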

In Elasticsearch 2.3 or 2.4 there will be a reindex API, which will make this reindex operation easier.

You should upgrade, 1.5 is pretty old now and there are performance improvements in recent releases that will help.

However, as @Camilo_Sierra mentioned, you should reindex the old indices into fewer shards.