Elasticsearch won't rebalance (v. 2.0.1); Need to upgrade!

I inherited a 1-node cluster running 2.0.1 that is crashing due to too many open file descriptors (1258 indices, 6280 shards, 5 per index with no replicas).

To mitigate the load, I added a new node to the cluster but the shards are not rebalancing between nodes.

Note: it appears that there are 4 indices stuck initializing. I tried "fixing" them via "lucene-core-2.9.3.jar org.apache.lucene.index.CheckIndex ... -fix", but I keep getting "could not read any segments file in directory". Some translog data may also be corrupted; I'm getting errors like "java.nio.file.NoSuchFileException: /vol/elasticsearch/elasticsearch/nodes/0/indices/logstash-2015.07.23/3/translog/translog-1437609601154.ckp"
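For reference, the invocation I was attempting looked roughly like this (the jar path and shard directory are from memory and may not be exactly right; as far as I understand, CheckIndex wants the shard's Lucene index directory, i.e. the .../<shard>/index folder, not the translog folder):

java -cp /path/to/lucene-core-2.9.3.jar org.apache.lucene.index.CheckIndex \
  /vol/elasticsearch/elasticsearch/nodes/0/indices/logstash-2015.07.23/3/index -fix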

My aim is to eventually upgrade to 6.x but this cluster is in production, and I would like to stem data loss before embarking on a live upgrade.

Advice/help appreciated!

OK, so I manually went through and moved the indices back as "primary" (I guess effectively deleting them?), and my cluster state is now yellow.

However, for every primary shard I have that is green, there is a corresponding replica shard that is UNASSIGNED. /_cluster/reroute?explain shows each replica with status: CLUSTER_RECOVERED.

Now I have a two-node cluster. I switched the original node from master to data node; it currently contains ALL the indices. The newer "master" node has no shards, but there are index folders on disk.

I am just now learning that replica shards must reside on a different host than their primary. And since I have been running a single-node setup with 1 replica enabled this entire time, there have never actually been any replicas.

I suppose I'd like to move all the unassigned replicas to the newer, empty data node. Do I just allocate them via /_cluster/reroute?
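Something like this is what I had in mind, if that's even the right approach (placeholders for the index, shard number, and the new node's name):

curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
  "commands": [
    { "allocate": { "index": "<index>", "shard": 0, "node": "<new-node-name>" } }
  ]
}'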

I switched the original node from master to data node; it currently contains ALL the indices. The newer "master" node has no shards, but there are index folders on disk.

When you say "master", is that a logical name you are giving your infrastructure or is this the Elasticsearch master-eligible / dedicated master node concept? Also speaking of master nodes, 2 is not a great number since you need a quorum. And be sure to set minimum master nodes correctly.

In general, when you have a correctly configured cluster without any explicit allocation settings, the cluster should balance itself automatically (and allocate the replicas on the other node). So while you can force a move or an allocation with /_cluster/reroute, it shouldn't be necessary, and the cluster will balance its allocation automatically afterwards. From the docs: "Obviously, only once all commands has been applied, the cluster will aim to be re-balance its state."

I'm not exactly sure why your data is not balancing itself, but maybe the output of /_cat/health, /_cat/nodes, and /_cat/shards/<some-example-index> can shed some light; see the docs for an explanation of the reasons for unassigned shards. Feel free to post the results of the queries if you get stuck.
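These are just simple GETs, for example (assuming Elasticsearch is listening on localhost:9200; the ?v parameter only adds column headers):

curl 'localhost:9200/_cat/health?v'
curl 'localhost:9200/_cat/nodes?v'
curl 'localhost:9200/_cat/shards/<some-example-index>?v'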

Thanks for the reply. I've been hollering in the IRC channel all day!

When you say "master", is that a logical name you are giving your infrastructure or is this the Elasticsearch master-eligible / dedicated master node concept?

Originally, my second node did not join my cluster until I specified a dedicated role for each via the configuration, like so. One as dedicated master:

node.data: false
node.master: true

and the new one as data:

node.data: true
node.master: false

I don't know if I have them configured correctly, though.

Then, when it finally did join, no replication occurred, but it did set up the same directory structure, with no index data inside:

$ ls /data/elasticsearch/nodes/0/indices/logstash-2018.07.20/
drwxr-xr-x    2 elasticsearch elasticsearch  24 Jul 24 01:25 _state

Also speaking of master nodes, 2 is not a great number since you need a quorum.

My goal was just to add a node to take the load off my crashing node, but every time I restart the node (via service), more index translogs seem to get corrupted (java.nio.file.NoSuchFileException), and I wind up deleting indices.

And be sure to set minimum master nodes correctly.

Both configurations are set with:

discovery.zen.minimum_master_nodes: 1
cluster.routing.allocation.enable: all
cluster.routing.rebalance.enable: all                                                                                         
cluster.routing.allocation.allow_rebalance: always                                                                            
discovery.zen.ping.unicast.hosts: [xxx]
# discovery.zen.ping.unicast.hosts: [xxx] IP of second node

Once I see the second node rebalancing, I will add a new third node to the cluster.

Below are the current 2 node stats:

/_cat/health

1532398709 02:18:29 elasticsearch yellow 2 1 6271 6271 0 0 6271 0 - 50.0% 

/_cat/nodes

172.29.100.223 172.29.100.223 67 98 0.24 d - metis.localdomain   
172.29.100.124 172.29.100.124 41 87 0.04 - * titania.localdomain

/_cat/shards/logstash-2018.07.20

logstash-2018.07.20 1 p STARTED    5314 2.2mb 172.29.100.223 metis.localdomain 
logstash-2018.07.20 1 r UNASSIGNED                                             
logstash-2018.07.20 2 p STARTED    5213 2.1mb 172.29.100.223 metis.localdomain 
logstash-2018.07.20 2 r UNASSIGNED                                             
logstash-2018.07.20 3 p STARTED    5268 2.2mb 172.29.100.223 metis.localdomain 
logstash-2018.07.20 3 r UNASSIGNED                                             
logstash-2018.07.20 4 p STARTED    5229 2.2mb 172.29.100.223 metis.localdomain 
logstash-2018.07.20 4 r UNASSIGNED                                             
logstash-2018.07.20 0 p STARTED    5174 2.2mb 172.29.100.223 metis.localdomain 
logstash-2018.07.20 0 r UNASSIGNED                              

And all the replicas are CLUSTER_RECOVERED if you /explain them.

I'm happy to remove the master/data config and let the cluster configure itself, but since it takes forever to recover the indices, I'd like to know if anyone notices something wrong first.

If one of your nodes is configured as a master-only node it will not hold any indices / shards (only the cluster state). Then the state that you have makes sense — the only node that is configured with node.data: true has all the data and nothing will be replicated to the other one since it's set to node.data: false.

For your setup I would use 3 nodes, all of them master and data, and discovery.zen.minimum_master_nodes: 2 (majority of 3 is 2).

I don't have a good explanation for why the nodes would only form a cluster with this specific data/master node setup; that shouldn't be necessary, and maybe it was just a coincidence. You should change the settings to the following on both nodes and then try to get them to form a cluster:

node.data: true
node.master: true
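To make it concrete, once the third node is there, the relevant part of each node's elasticsearch.yml could look roughly like this (the IPs are placeholders; adjust to your environment):

node.master: true
node.data: true
discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.unicast.hosts: ["<node-1-ip>", "<node-2-ip>", "<node-3-ip>"]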

PS: IMO IRC is good for more chat-like interactions, but specific questions are easily lost or split up; for that, Discuss should work better.

Thanks xeraa for your help.

Correctly assigning both nodes as data nodes worked. They are now rebalanced.

My issue now is a heap issue: specifically, an OutOfMemoryError.

My plan is to upgrade my cloud instances to the recommended 32 GB RAM instances, but I have a few questions:

  1. Currently, Elasticsearch, Logstash, and Kibana are all on the same single node. How many instances of Logstash and/or Kibana do I need if I break out to 3 Elasticsearch nodes?
  2. Does Logstash load balance between hosts? (I think yes.)
  3. From the docs:

It is important to exclude dedicated master nodes from the hosts list to prevent LS from sending bulk requests to the master nodes. So this parameter should only reference either data or client nodes in Elasticsearch.
Neither node is explicitly set as a master node now (in testing), but since one of them is technically a master node, could that be causing this error when Logstash starts: Errno::EBADF: Bad file descriptor - Bad file descriptor?

It's a little more complicated: for Elasticsearch we recommend up to ~32GB of heap (because of compressed oops), and you should leave at least as much memory again for filesystem-level caching (some use cases do well with even more). So an Elasticsearch node might have up to 64GB of memory, or maybe even 96GB. Should you always use that much memory? Only if you need it; otherwise your garbage collections might be unnecessarily long. Determining how much heap you actually need will depend on your data and queries: start small, see when it blows up, and increase as needed.

All of that assumes your Elasticsearch process is the only thing running on the node. With Logstash and Kibana in the mix it will be different. Ideally you'd split the processes out onto their own machines, but for your use case that might be overkill.

If you want to share the load across 3 instances, I'd put Elasticsearch on all 3, Logstash on 2 (for high availability and sharing the workload), and Kibana on the node without Logstash.
Let's say your instances have 32GB of RAM; then you could try 12GB of heap for Elasticsearch, 6GB of heap for Logstash, and 14GB for the operating system / caching / ... This is a wild guess and you might need to adjust it as needed.
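If it helps, the mechanism for setting the heap depends on the version: on Elasticsearch 2.x it's usually the ES_HEAP_SIZE environment variable, while 5.x and later use config/jvm.options (Logstash 5.x+ also has its own jvm.options). For the 12GB / 6GB example above, roughly:

# Elasticsearch 2.x, e.g. in /etc/default/elasticsearch or /etc/sysconfig/elasticsearch
ES_HEAP_SIZE=12g

# Elasticsearch 5.x+, in config/jvm.options
-Xms12g
-Xmx12g

# Logstash 5.x+, in config/jvm.options
-Xms6g
-Xmx6g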

Logstash will load balance between your nodes automatically (round robin if I remember correctly).
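For the elasticsearch output that just means listing all the (data) nodes in hosts, something along these lines (host names are placeholders):

output {
  elasticsearch {
    hosts => ["<node-1>:9200", "<node-2>:9200", "<node-3>:9200"]
  }
}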

Since you don't have dedicated master nodes, all your nodes are data nodes and Logstash can / should connect to all of them. Not sure about the error you are posting, but that should probably go to a new topic.

