What happens if I change a master: true, data: true node to a master-only node?

I have 1 master/data node and 2 data-only nodes.

I would like to change the master/data node to a master-only node due to OutOfMemoryError.

Q. What happens to the data already stored on the master/data node?
Would it be transferred to the other 2 data-only nodes?
Would the replica shards already on the 2 data-only nodes be used to restore it?

I really need an urgent answer. :sob:

You ideally want to have 3 master eligible nodes in a cluster for increased stability, so I would recommend making all nodes master/data nodes if possible. Running with a single master node creates a single point of failure.
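
For reference, that recommended setup is just elasticsearch.yml looking roughly like this on each of the three nodes (2.x settings):

    # every node is master-eligible and also holds data
    node.master: true
    node.data: true

    # with 3 master-eligible nodes, require a majority (2) for election
    # so a network split cannot end up with two masters
    discovery.zen.minimum_master_nodes: 2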

I had a memory pressure issue with the master/data node.
With a 10g heap it fails when the garbage collector runs.
The data-only nodes seem to be working okay.

So I was trying to convert the master/data node to a dedicated master node in order to reduce the pressure.
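
Concretely, the change would be roughly this in node-1's elasticsearch.yml (the two data-only nodes keep node.master: false and node.data: true):

    # node-1: dedicated master, no longer holds data
    node.master: true
    node.data: false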

If I could round-robin the master node tasks across 3 different master/data nodes, I would prefer to change them. But I don't think that is possible.

And I cannot have more than 3 nodes. :sob:

Which version of Elasticsearch are you using? What is the specification of your Elasticsearch nodes/cluster? How much data/indices/shards do you have in the cluster? What type of data do you have in the cluster?

It's Elasticsearch v2.2.0.

3 × AWS EC2 m4.xlarge (4 vCPUs, 16 GiB memory)

node-1: master/data
node-2: data-only
node-3: data-only

ES_HEAP_SIZE: 10g

Data: 100,000,000 docs
Indices: 13,000
Shards: 3 per index
Replicas: 1

P.S. Changing the master/data node to a master-only node didn't help at all. I am still getting the following:

[2016-02-21 22:03:47,104][WARN ][monitor.jvm              ] [Hulk] [gc][old][18835][175] duration [3m], collections [5]/[3m], total [3m]/[1h], memory [9.9gb]->[9.9gb]/[9.9gb], all_pools {[young] [266.2mb]->[266.2mb]/[266.2mb]}{[survivor] [33.2mb]->[33.2mb]/[33.2mb]}{[old] [9.6gb]->[9.6gb]/[9.6gb]}
[2016-02-21 22:31:27,605][WARN ][transport.netty          ] [Hulk] exception caught on transport layer [[id: 0x180573ba, /172.31.7.130:40806 => /172.31.9.135:9300]], closing connection
java.lang.OutOfMemoryError: Java heap space
        at org.apache.lucene.util.CharsRefBuilder.<init>(CharsRefBuilder.java:35)
        at org.elasticsearch.common.io.stream.StreamInput.<init>(StreamInput.java:246)

Only the master node blows up while the data nodes are okay.
What's happening in the master's memory? :sob:

Shards are not free and carry a certain amount of overhead with respect to memory and file handles. With that many indices, the cluster state is also likely to be quite large and use up a fair amount of memory.

Having 78000 (if I count correctly) shards is way, way too many for a cluster of that size and specification, and will use up a lot of memory. I recommend you rethink your indexing/sharding strategy in order to dramatically reduce the number of shards in the cluster.
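
(For reference: 13,000 indices × 3 primary shards × 2 copies each (primary + 1 replica) = 78,000 shards, if the figures above are right.)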

1 Like

Thank you very much for your answer.

I have been creating indices in a daily-log style, e.g. my-index-001-20160221, for easy management. The 001 part ranges from 001 to 100.

I guess this is a bad indexing strategy for memory usage.

Q. What if I disable norms for the fields? I am only filtering and aggregating on many of the fields. Would this help solve the memory issue?

Q. What would be the side effects if I changed my index pattern to my-index-001-YYYYMM? This would reduce the number of shards by roughly a factor of 30, but I would need to update/change daily docs when a day ends.
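
For the norms question, what I have in mind is a mapping roughly like this when creating a new index (field, type and host names are just examples, and I'm assuming the field is only used for filters/aggregations so it doesn't need to be analyzed):

    from elasticsearch import Elasticsearch

    es = Elasticsearch(["http://node-1:9200"])   # host is an example

    # example only: a string field used purely for filters/aggregations
    es.indices.create(
        index="my-index-001-20160221",
        body={
            "mappings": {
                "logs": {                          # doc type name is hypothetical
                    "properties": {
                        "some_field": {
                            "type": "string",
                            "index": "not_analyzed",       # no full-text analysis needed
                            "norms": {"enabled": False}    # norms only matter for scoring
                        }
                    }
                }
            }
        },
    )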

Why are you creating 100 indices per day? Your shards at the moment appear to be extremely small, containing an average of less than 3,000 documents if I calculated correctly, which is very inefficient. Without knowing the answer to my initial question, I would recommend creating a single monthly index with 3 shards. Why would this force you to update/change daily docs when a day ends?
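
For what it's worth, a monthly index with 3 shards can be set up once via an index template, something like this (template name, pattern and host are just examples):

    from elasticsearch import Elasticsearch

    es = Elasticsearch(["http://node-1:9200"])   # host is an example

    # applies automatically to any new index matching the pattern, e.g. my-index-201602
    es.indices.put_template(
        name="my-index-monthly",
        body={
            "template": "my-index-*",
            "settings": {
                "number_of_shards": 3,
                "number_of_replicas": 1
            }
        },
    )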

I have separated the indices by user, so more users create more indices per day. I chose this strategy so that I can easily handle the case where I no longer need a single user's data.
Also, if I put all users' data in a single index, there are going to be more search/aggregation requests hitting a single index, and I was worried that this would give me slow responses.

Also, every day I remove one field that contains a large value and back it up to S3 (in order to save some storage space).

This will generally scale badly. You can use a single index for all users and use routing by the customer ID for both indexing and querying, to ensure just a single shard per index needs to be accessed. This can help with query throughput and latency.
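
A rough sketch of routing by a customer/user ID, with hypothetical index, type, field and host names:

    from elasticsearch import Elasticsearch

    es = Elasticsearch(["http://node-1:9200"])   # host is an example

    # index a document for user "001"; the routing value picks a single shard
    es.index(
        index="my-index-201602",
        doc_type="logs",
        body={"user": "001", "message": "example event"},
        routing="001",
    )

    # search with the same routing value so only that one shard is queried
    res = es.search(
        index="my-index-201602",
        body={"query": {"term": {"user": "001"}}},
        routing="001",
    )
    print(res["hits"]["total"])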

Elasticsearch can handle a lot of data in an index/shard and still return quick responses, so I would recommend benchmarking it. Having larger shards will also allow Elasticsearch to compress your data on disk more efficiently.

1 Like

Thank you very much.

If I had to choose one of the following:

  1. Monthly index for every user, e.g. my-index-001-YYYYMM
  2. Daily index for all users, e.g. my-index-YYYYMMDD
  3. Monthly index for all users, e.g. my-index-YYYYMM

Would you recommend 2 rather than 1, and 3 as the best option? Or would 2 be fine to solve my issue?

Reducing the number of indices by a factor of 100 would be a big win. Whether to go for monthly or daily indices depends on how long your retention period is. If it is shorter than a couple of months, I would consider daily indices. For longer retention periods, monthly indices will allow your cluster to handle larger amounts of data.

1 Like

That is WAY WAY WAY WAY too many shards.
You really need to reduce them as Christian mentioned.

2 Likes

I had to stop the cluster since it was unresponsive.
I then deleted half of my index directories and restarted the cluster.
An odd thing I noticed after restarting the cluster: the number of pending tasks increases very fast.

And it throws the following after an hour, even before the cluster health goes green.

[2016-02-22 02:43:18,508][WARN ][monitor.jvm ] [Hulk] [gc][old][4055][48] duration [30.4s], collections [1]/[31.4s], total [30.4s]/[37.4s], memory [9.8gb]->[9.6gb]/[9.9gb], all_pools {[young] [207mb]->[7.6mb]/[266.2mb]}{[survivor] [33.2mb]->[0b]/[33.2mb]}{[old] [9.6gb]->[9.6gb]/[9.6gb]}

Q. Could a lot of pending tasks cause GC to fail and make the cluster unresponsive?
(This time I don't see an OOM error.)
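
I am watching it roughly like this from outside the cluster (host name is just an example):

    from elasticsearch import Elasticsearch

    es = Elasticsearch(["http://node-1:9200"])   # host is an example

    # number of queued cluster-state updates on the master, plus overall status
    health = es.cluster.health()
    print(health["number_of_pending_tasks"], health["status"])

    # the first few queued tasks themselves (e.g. shard-started, create-index)
    for task in es.cluster.pending_tasks()["tasks"][:10]:
        print(task["priority"], task["source"])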

See our previous comments.

1 Like

Force-deleting half of the index directories didn't help bring the cluster back up.
I tried doubling the master node's memory, but it still dies.

My initial plan was to bring the cluster back up and re-index everything
(with the my-index-YYYYMM pattern).

Since the cluster doesn't come back up easily, I plan to create a new cluster and find a way to restore the data from the dead one. (Not sure whether this is possible at the moment.)
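
The re-indexing I have in mind would be a scan-and-bulk copy along these lines, since 2.2 has no _reindex API, assuming I can get the old cluster readable long enough to scroll over it (a rough sketch; host, index and type names are examples):

    from elasticsearch import Elasticsearch, helpers

    old = Elasticsearch(["http://old-cluster:9200"])   # hosts are examples
    new = Elasticsearch(["http://new-cluster:9200"])

    def monthly_actions(source_index):
        """Stream docs from a daily index, retargeted at the matching monthly index."""
        for doc in helpers.scan(old, index=source_index, scroll="5m"):
            yield {
                "_op_type": "index",
                "_index": "my-index-" + source_index[-8:-2],  # 20160221 -> 201602
                "_type": doc["_type"],
                "_id": doc["_id"],
                "_source": doc["_source"],
                # "_routing" could be set to the user ID here, per the routing advice above
            }

    # copy one daily index into the corresponding monthly index
    helpers.bulk(new, monthly_actions("my-index-001-20160221"))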