Speeding up a re-deploy


(Johnson Johnson) #1

I did a lot of research and didn't find anything that was 100% clear to me about re-deploying a cluster. I found some good tutorials and blog posts that touch on the subject with insightful information, such as disabling shard allocation, stopping a node, and then restarting. I have a static ES cluster, i.e. nothing is writing to it at the moment. However, when I re-deploy from scratch it reloads all shards - I'm not sure if these are coming from the S3 repo they were restored from, or if the master is rebalancing.

`_cluster/allocation/explain` - shows that shards are unassigned.
`_cluster/health` - shows the primary shard and active shard counts increasing as they are distributed across the data nodes.
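(For anyone following along, these are the two diagnostic calls, written console-style against any node's HTTP port:)

```
GET _cluster/allocation/explain
GET _cluster/health
```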

I have a large number of small indices (not my design) that have 5 shards each and 1 replica, and two data nodes using persistent EBS storage. Here is my current status after a re-deploy:

```json
{
  "cluster_name": "elasticsearch",
  "status": "red",
  "timed_out": false,
  "number_of_nodes": 5,
  "number_of_data_nodes": 2,
  "active_primary_shards": 5081,
  "active_shards": 5081,
  "relocating_shards": 0,
  "initializing_shards": 8,
  "unassigned_shards": 5723,
  "delayed_unassigned_shards": 0,
  "number_of_pending_tasks": 8,
  "number_of_in_flight_fetch": 0,
  "task_max_waiting_in_queue_millis": 845,
  "active_shards_percent_as_number": 46.99408065112838
}
```

Side note: this is data migrated from an AWS ES service cluster into my own ad hoc / self-managed cluster.

Herein lies my confusion: I thought that since I am using persistent data storage, a re-deploy would be almost instantaneous, as the shards are already balanced across the data nodes; i.e. the nodes come up and the shards are already there. This does not seem to be the case. I have tried tweaking my gateway parameters as well as outright disabling shard allocation/rebalancing on re-deploy; with the former it rebalances, and with the latter it does nothing (as expected) but comes up with no shards whatsoever.

Does anyone know if there is a way to re-deploy an ES cluster, re-attach data volumes, and have the data be auto-discovered by the masters to save load time? Thanks!

Oh yeah, here is my "boot-up" elasticsearch.yml:

```yaml
cluster.name: elasticsearch

# shard things:
cluster.routing.allocation.allow_rebalance: "indices_primaries_active"
cluster.routing.allocation.enable: "none"
cluster.routing.rebalance.enable: "none"

node.data: ${NODE_DATA:true}
node.master: ${NODE_MASTER:true}
node.ingest: ${NODE_INGEST:true}
node.name: ${HOSTNAME}

network.host: 0.0.0.0

cloud:
  kubernetes:
    service: ${SERVICE}
    namespace: ${KUBERNETES_NAMESPACE}
# see https://github.com/kubernetes/kubernetes/issues/3595
bootstrap.memory_lock: ${BOOTSTRAP_MEMORY_LOCK:false}

discovery:
  zen:
    hosts_provider: kubernetes
    minimum_master_nodes: ${MINIMUM_MASTER_NODES:2}
readonlyrest:
  enable: false
# see https://www.elastic.co/guide/en/x-pack/current/xpack-settings.html
xpack.ml.enabled: false
xpack.monitoring.enabled: ${XPACK_MONITORING_ENABLED:false}
xpack.security.enabled: ${XPACK_SECURITY_ENABLED:false}
xpack.watcher.enabled: ${XPACK_WATCHER_ENABLED:false}

# see https://github.com/elastic/elasticsearch-definitive-guide/pull/679
processors: ${PROCESSORS:}

# avoid split-brain w/ a minimum consensus of two masters plus a data node
gateway.expected_master_nodes: ${EXPECTED_MASTER_NODES:3}
gateway.expected_data_nodes: ${EXPECTED_DATA_NODES:2}
gateway.recover_after_time: ${RECOVER_AFTER_TIME:3m}
gateway.recover_after_master_nodes: ${RECOVER_AFTER_MASTER_NODES:1}
gateway.recover_after_data_nodes: ${RECOVER_AFTER_DATA_NODES:1}
```

I've tried with and without:

```yaml
cluster.routing.allocation.allow_rebalance: "indices_primaries_active"
cluster.routing.allocation.enable: "none"
cluster.routing.rebalance.enable: "none"
```

Yet in either case it seems to rebalance, or to require a rebalance.


(Jakob Reiter) #2

Hi @Johnson_Johnson,

What do you mean by re-deploying? Are you spinning up a whole new cluster and then restoring your data from a snapshot? Or deploying new machines/nodes and attaching the EBS volumes from the previous cluster?

That's correct in terms of the data already being on the node, but to make the data available in the cluster, the shards need to be initialized first.
Looking at the output you posted, that's what is happening (`"initializing_shards": 8`). If this were a rebalance, they would show up under `"relocating_shards"` instead.

If the data in the indices hasn't changed in the last 5 minutes (the default), the indices should have a `sync_id` marker, which allows for extra-fast recovery/initialization. See https://www.elastic.co/guide/en/elasticsearch/reference/6.1/indices-synced-flush.html for more details and how to manually "seal" them.
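The manual "seal" mentioned above can be triggered like this (a sketch; it only succeeds on shards that are not receiving writes, so run it while indexing is paused):

```
POST /_flush/synced
```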

From what I'm seeing, I think the main problem is that you have way too many shards; even if Elasticsearch spends only a very short time initializing each shard, the sheer number of shards makes this a long-running procedure.

To speed this up, it might be worth increasing `indices.recovery.max_bytes_per_sec` a bit. Maybe try 80mb, but keep an eye on I/O wait so you don't over-utilize your disks.
It's also worth looking at the shard recovery settings, especially `cluster.routing.allocation.node_initial_primaries_recoveries`, which could also help speed things up.
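Both settings are dynamic, so they can be applied at runtime via the cluster settings API. A sketch with illustrative values (`8` here is an assumption; the default for initial primary recoveries per node is 4):

```
PUT /_cluster/settings
{
  "transient": {
    "indices.recovery.max_bytes_per_sec": "80mb",
    "cluster.routing.allocation.node_initial_primaries_recoveries": 8
  }
}
```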

But don't expect any magic; the main problem is that you have way too many shards, and you need to take action to reduce that, e.g. by switching to one shard per daily index, or even weekly/monthly indices with just one shard.
You didn't say what your use case is, but from what I'm seeing I guess it's logging data, and you can easily put 40GB in a single shard. If you have less than 40GB per day, think about weekly indices, etc. Just aim for shards of that size.
I also recommend reading https://www.elastic.co/blog/how-many-shards-should-i-have-in-my-elasticsearch-cluster
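For existing 5-shard indices, one option in 6.x is the shrink API. A sketch, where the index name `logs-2018.01.01` and node name `data-node-1` are made-up placeholders:

```
# Block writes and collect a copy of every shard on one node,
# which the shrink operation requires
PUT /logs-2018.01.01/_settings
{
  "settings": {
    "index.routing.allocation.require._name": "data-node-1",
    "index.blocks.write": true
  }
}

# Shrink the 5-shard index into a 1-shard target index
POST /logs-2018.01.01/_shrink/logs-2018.01.01-shrunk
{
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 1
  }
}
```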

Hope that helps.
/Jakob


(Johnson Johnson) #3

Thanks, Jakob - I think I see what you mean. When I say re-deploy, I mean that I am deleting my data/master/client nodes and recreating new ones, but re-attaching the same data directories/disks to the data nodes. That is why I suspected it should take very little time, since it would come up in the same state the cluster was last in.

From what I am gathering, the large number of indices causes a very long initialization time, correct? And that is separate from whether the data is present and the shards are balanced properly?


(Jakob Reiter) #4

Sorry for the late reply.

Yes, that's most probably the main issue here. You can speed things up by following my advice above (synced flush and tuning the recovery settings), but the real solution is to reduce the shard count.
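Putting that advice together, a full-cluster-restart sequence would look roughly like this (a sketch, not a definitive runbook):

```
# 1. Disable allocation so the master doesn't reassign shards while nodes are down
PUT /_cluster/settings
{ "transient": { "cluster.routing.allocation.enable": "none" } }

# 2. Seal the indices so recovery can reuse the on-disk data
POST /_flush/synced

# ... stop the nodes, re-deploy, re-attach the EBS volumes ...

# 3. Re-enable allocation once the nodes have rejoined
PUT /_cluster/settings
{ "transient": { "cluster.routing.allocation.enable": "all" } }
```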


(system) #5

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.