What happens if I change a master: true, data: true node to a master-only node?

I have 1 master/data node and 2 data-only nodes

I would like to change the master/data node to a master-only node due to an OutOfMemoryError.

Q. What happens to the data already stored on the master/data node?
Would it be transferred to the other 2 data-only nodes?
Would the replica shards already on the 2 data-only nodes be used to restore it?

I really need an urgent answer. :sob:

You ideally want to have 3 master-eligible nodes in a cluster for increased stability, so I would recommend making all nodes master/data nodes if possible. Running with a single master node creates a single point of failure.
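If you do end up with 3 master-eligible nodes, also set discovery.zen.minimum_master_nodes to 2 (a majority of 3) so a network partition cannot produce two masters. A minimal sketch, assuming the cluster answers on localhost:9200:

    # dynamic setting, takes effect without a restart
    curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
      "persistent": { "discovery.zen.minimum_master_nodes": 2 }
    }'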

I had a memory pressure issue with the master/data node.
The 10g heap filled up and stayed full even while the garbage collector was running.
The data-only nodes seem to be working okay.

So I was trying to convert the master/data node to a dedicated master node in order to decrease the pressure.
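Concretely, the change I mean is in elasticsearch.yml on node-1 (2.x settings; the node needs a restart to pick them up):

    node.master: true   # eligible to be elected master
    node.data: false    # holds no shards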

If I could round-robin the master duties across 3 master/data nodes I would prefer that, but I don't think it is possible.

I also cannot have more than 3 nodes. :sob:

Which version of Elasticsearch are you using? What is the specification of your Elasticsearch nodes/cluster? How much data/indices/shards do you have in the cluster? What type of data do you have in the cluster?

It's Elasticsearch v2.2.0

AWS EC2 m4.xlarge (4 vCPUs, 16 GiB memory) x 3

node-1: master/data
node-2: data-only
node-3: data-only

ES_HEAP_SIZE: 10g

Data: 100,000,000 docs
Indices: 13,000
Shards: 3 per index
Replicas: 1

P.S. Changing the master/data node to a master-only node didn't help at all. I am still getting the following.

[2016-02-21 22:03:47,104][WARN ][monitor.jvm              ] [Hulk] [gc][old][18835][175] duration [3m], collections [5]/[3m], total [3m]/[1h], memory [9.9gb]->[9.9gb]/[9.9gb], all_pools {[young] [266.2mb]->[266.2mb]/[266.2mb]}{[survivor] [33.2mb]->[33.2mb]/[33.2mb]}{[old] [9.6gb]->[9.6gb]/[9.6gb]}
[2016-02-21 22:31:27,605][WARN ][transport.netty          ] [Hulk] exception caught on transport layer [[id: 0x180573ba, /172.31.7.130:40806 => /172.31.9.135:9300]], closing connection
java.lang.OutOfMemoryError: Java heap space
        at org.apache.lucene.util.CharsRefBuilder.<init>(CharsRefBuilder.java:35)
        at org.elasticsearch.common.io.stream.StreamInput.<init>(StreamInput.java:246)

Only the master node blows up while the data nodes are okay.
What's happening in the master's memory? :sob:

Shards are not free and carry a certain amount of overhead with respect to memory and file handles. With that many indices, the cluster state is also likely to be quite large and use up a fair amount of memory.

Having 78000 shards (if I count correctly) is way, way too many for a cluster of that size and specification, and they will use up a lot of memory. I recommend you rethink your indexing/sharding strategy in order to dramatically reduce the number of shards in the cluster.
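If you want to verify the numbers yourself, the _cat APIs show them directly. A quick sketch, assuming the cluster answers on localhost:9200:

    # one line per shard (primaries and replicas); count them
    curl -s 'http://localhost:9200/_cat/shards' | wc -l

    # per-index doc counts and on-disk sizes
    curl -s 'http://localhost:9200/_cat/indices?v'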

Thank you very much for your answer.

I have been creating indices daily-log style, e.g. my-index-001-20160221, for handy management. The 001 part ranges from 001 to 100.

I guess this is a bad indexing strategy for memory usage.

Q. What if I disable norms for the fields? I am only filtering and aggregating on many of the fields. Would this help solve the memory issue?
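What I have in mind is roughly the following mapping (the type name logs and the field status are made-up examples; 2.x syntax):

    curl -XPUT 'http://localhost:9200/my-index-001-20160301' -d '{
      "mappings": {
        "logs": {
          "properties": {
            "status": {
              "type": "string",
              "index": "not_analyzed",
              "norms": { "enabled": false }
            }
          }
        }
      }
    }'

(As far as I understand, not_analyzed string fields already default to norms being disabled in 2.x.)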

Q. What could be the side effect if I change my index pattern to my-index-001-YYYYMM? This would reduce the number of shards by a factor of roughly 30, but I would need to update/change daily docs when a day ends.

Why are you creating 100 indices per day? Your shards at the moment appear to be extremely small, containing an average of less than 3000 documents if I calculated correctly, which is very inefficient. Without knowing the answer to my initial question, I would recommend creating a single monthly index with 3 shards. Why would this force you to update/change daily docs when a day ends?
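Creating such a monthly index is a one-liner; a sketch, with a made-up index name:

    curl -XPUT 'http://localhost:9200/my-index-201602' -d '{
      "settings": {
        "number_of_shards": 3,
        "number_of_replicas": 1
      }
    }'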

I have separated indices by user, so more users create more indices per day. I chose this strategy so that I can easily drop a single user's data when I no longer need it.
Also, if I put all users' data in a single index, there are going to be more search/aggregation requests hitting that one index, and I was worried this would give me slow responses.

In addition, every day I remove one field that contains a large value and back it up to S3 (in order to save some storage space).

This will generally scale badly. You can use a single index for all users and use routing by customer ID for both indexing and querying, to ensure just a single shard per index needs to be accessed. This can help with query throughput and latency.
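A rough sketch of what routing looks like (user123 is a made-up customer ID):

    # index with a routing value so all of one user's docs land on the same shard
    curl -XPUT 'http://localhost:9200/my-index-201602/logs/1?routing=user123' -d '{
      "user_id": "user123",
      "message": "example event"
    }'

    # search with the same routing value so only that one shard is queried;
    # still filter on the user, since the shard holds other users' data too
    curl -XGET 'http://localhost:9200/my-index-201602/_search?routing=user123' -d '{
      "query": { "term": { "user_id": "user123" } }
    }'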

Elasticsearch can handle a lot of data in an index/shard and still return quick responses, so I would recommend benchmarking it. Having larger shards will also allow Elasticsearch to compress your data on disk more efficiently.

Thank you very much.

If I had to choose one of:

  1. Monthly index for every user, e.g. my-index-001-YYYYMM
  2. Daily index for all users, e.g. my-index-YYYYMMDD
  3. Monthly index for all users, e.g. my-index-YYYYMM

You would recommend 2 rather than 1, and 3 as the best? Or would 2 be fine to solve my issue?

Reducing the number of indices by a factor of 100 would be a big win. Whether to go for monthly or daily indices depends on how long your retention period is. If it is shorter than a couple of months, I would consider daily indices. For longer retention periods, monthly indices will allow your cluster to handle larger amounts of data.
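Retention then becomes a matter of dropping whole indices as they age out, which is far cheaper than deleting individual documents. For example (made-up index name):

    curl -XDELETE 'http://localhost:9200/my-index-201510'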

That is WAY WAY WAY WAY too many shards.
You really need to reduce them, as Christian mentioned.

I had to stop the cluster since it was unresponsive.
I've deleted half of my index directories, then restarted the cluster.
One odd thing I noticed after restarting the cluster: the number of pending tasks increases very fast.
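For what it's worth, this is how I was watching them (assuming the default localhost:9200):

    # one line per pending cluster-level task
    curl -s 'http://localhost:9200/_cat/pending_tasks?v'

    # cluster health also reports the pending task count
    curl -s 'http://localhost:9200/_cluster/health?pretty'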

And the master throws the following after an hour, even before the cluster health goes green.

[2016-02-22 02:43:18,508][WARN ][monitor.jvm ] [Hulk] [gc][old][4055][48] duration [30.4s], collections [1]/[31.4s], total [30.4s]/[37.4s], memory [9.8gb]->[9.6gb]/[9.9gb], all_pools {[young] [207mb]->[7.6mb]/[266.2mb]}{[survivor] [33.2mb]->[0b]/[33.2mb]}{[old] [9.6gb]->[9.6gb]/[9.6gb]}

Q. Could a lot of pending tasks cause GC to fail and make the cluster unresponsive?
(This time I don't see an OOM error.)

See our previous comments.

Force-deleting half of the index directories didn't help bring the cluster back to life.
I tried doubling the master node's memory, but it still dies.

My initial plan was to bring the cluster back to life and re-index everything
(with the my-index-YYYYMM pattern).

Since the cluster doesn't come back up easily, I plan to create a new cluster and find a way to restore the data from the dead cluster. (Not sure this is possible at the moment.)
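If I can keep the old cluster alive even briefly, I am thinking of trying snapshot/restore to move the data (the repository name rescue and the location are placeholders; the location must be listed under path.repo in elasticsearch.yml):

    # register a shared filesystem repository on the old cluster
    curl -XPUT 'http://localhost:9200/_snapshot/rescue' -d '{
      "type": "fs",
      "settings": { "location": "/mnt/es_backup" }
    }'

    # snapshot all indices
    curl -XPUT 'http://localhost:9200/_snapshot/rescue/snap1?wait_for_completion=true'

Then I would register the same repository on the new cluster and restore from _snapshot/rescue/snap1.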