Rebalancing and Spark Ingest Slow Down


(Nathan) #1

Hi, I have a decently sized cluster and have noticed that, while it is rebalancing, my Spark ingest job crushes only the nodes with lower shard counts. It appears that new shards are allocated only to the nodes that are out of balance in terms of disk use compared to the rest of the cluster. This leads to a couple of questions:

1. Is this type of behavior expected? It definitely makes it more challenging to add nodes to an existing cluster without impacting ongoing ingest.

2. Are there settings that can be tweaked to make better use of the additional nodes during ingest, even when some nodes do not yet hold approximately the same amount of data as the others?

NOTE: The ingest jobs ran great before the addition of nodes as well as after shards were re-balanced to the new nodes.

versions:
ES 2.2.0
Spark 1.5.2
ES Hadoop 2.2.0 (using Map/Reduce layer with PySpark)

-Nathan


(Nathan) #2

UPDATE: Limiting the total number of shards per node, as described at the following link, solved the issue:
https://www.elastic.co/guide/en/elasticsearch/reference/current/allocation-total-shards.html#allocation-total-shards
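
For anyone landing here later, this is a rough sketch of how the setting on that page, `index.routing.allocation.total_shards_per_node`, could be applied to an index. The cluster URL, index name, shard counts and the two helper functions are made up for illustration, and the way the cap is calculated is only one reasonable choice, not necessarily what was used on this cluster.

```python
import json
import math
import urllib.request


def shards_per_node_cap(primaries, replicas, data_nodes, headroom=0):
    """Hypothetical helper: smallest per-node cap that still lets every
    shard copy (primaries plus replicas) be assigned somewhere."""
    total_copies = primaries * (1 + replicas)
    return math.ceil(total_copies / data_nodes) + headroom


def build_settings_request(es_url, index, cap):
    """Hypothetical helper: build (but do not send) the index settings
    update that caps the shards of `index` on any single node."""
    body = json.dumps(
        {"index.routing.allocation.total_shards_per_node": cap}
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{es_url}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Assumed example values: 10 primaries, 1 replica, 8 data nodes.
cap = shards_per_node_cap(primaries=10, replicas=1, data_nodes=8)
req = build_settings_request("http://localhost:9200", "my-ingest-index", cap)
# To apply it against a real cluster: urllib.request.urlopen(req)
```

With a cap like this, the shards of a new index have to spread across the nodes rather than all landing on the emptiest ones. A cap that is too tight can leave shards unassigned (for example when a node drops out), so the `headroom` argument leaves room for a small margin.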

