Shards constantly initializing after a bulk index (aka the Shard Dance)

I bulk loaded 28m documents into a new index with 0 replicas and with
refresh disabled, using ES 0.19.2. After the initial import, I set the
number of replicas to 1 and reset the refresh interval to 1 second. Two
shards have been moving between 3 (of the 4) nodes for almost 12
hours.
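
For reference, the settings changes described above can be sketched as request bodies for the update-settings endpoint. This is only an illustration, not the exact calls that were run: the index name "docs" and the HTTP port 9200 are assumptions, and the helper only builds the request without sending it.

```python
import json


def bulk_import_settings():
    """Settings used during the bulk load: no replicas, refresh disabled."""
    return {"index": {"number_of_replicas": 0, "refresh_interval": "-1"}}


def post_import_settings(replicas=1, refresh_interval="1s"):
    """Settings restored after the import: 1 replica, 1 second refresh."""
    return {
        "index": {
            "number_of_replicas": replicas,
            "refresh_interval": refresh_interval,
        }
    }


def settings_request(host, index, settings):
    """Build (method, url, body) for an update-settings call.

    Nothing is sent here; pass the result to any HTTP client.
    """
    url = "http://%s/%s/_settings" % (host, index)
    return "PUT", url, json.dumps(settings)


if __name__ == "__main__":
    # Hypothetical host:port and index name, for illustration only.
    for settings in (bulk_import_settings(), post_import_settings()):
        print(settings_request("192.168.50.101:9200", "docs", settings))
```

Raising the replica count after the load is what triggers the replica shards to initialize, so any problem on the target nodes shows up at that point rather than during the import itself.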

Some stats:

{
cluster_name: ESCluster
status: yellow
timed_out: false
number_of_nodes: 4
number_of_data_nodes: 4
active_primary_shards: 5
active_shards: 8
relocating_shards: 0
initializing_shards: 2
unassigned_shards: 0
}
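
A quick sanity check on those numbers, as a rough sketch: with 5 primaries and 1 replica each, 10 shard copies are expected in total, and the health output accounts for all of them (8 active plus 2 initializing). The function name is made up for this example, and it assumes every primary is already active and that relocating shards are already counted as active.

```python
def shard_accounting(health, replicas):
    """Compare expected shard copies against what cluster health reports.

    Assumes all primaries are active, so active_primary_shards equals
    the index's shard count.
    """
    expected = health["active_primary_shards"] * (1 + replicas)
    accounted = (
        health["active_shards"]
        + health["initializing_shards"]
        + health["unassigned_shards"]
    )
    return {
        "expected": expected,
        "accounted": accounted,
        "not_yet_active": expected - health["active_shards"],
    }


# Values copied from the health output in this post.
health = {
    "active_primary_shards": 5,
    "active_shards": 8,
    "relocating_shards": 0,
    "initializing_shards": 2,
    "unassigned_shards": 0,
}

report = shard_accounting(health, replicas=1)
print(report)
```

So the yellow status comes entirely from the two replica copies that never finish initializing.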

Sample visualization of two cluster states, taken seconds apart:
http://i.imgur.com/bRgpZ.png

None of the nodes were restarted after the import. Other possibly
relevant settings:

gateway.expected_nodes: 1
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["192.168.50.101", "192.168.50.102"]

Note that the unicast host list contains only 2 of the 4 hosts, which
seems to have worked well so far. The cluster is in development and
none of the settings (number of shards/replicas/nodes) have been tuned.
I am working primarily on the import/search flow.

Cheers,

Ivan

Sorry, it was user error in the end. Ran out of disk space. I had moved
the logs to a new location and was still tailing the old ones.
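
For anyone hitting the same thing, a free-space check on each node's data directory would have caught this sooner. A minimal sketch using only the Python standard library follows; the function names and the 10% threshold are arbitrary choices for illustration, and the path to check depends on where your data directory lives.

```python
import shutil


def disk_report(path):
    """Return total bytes, free bytes and the free fraction for the
    filesystem that holds `path`."""
    usage = shutil.disk_usage(path)
    # Guard against filesystems that report a total size of zero.
    free_fraction = usage.free / usage.total if usage.total else 0.0
    return {
        "total": usage.total,
        "free": usage.free,
        "free_fraction": free_fraction,
    }


def low_on_disk(path, min_free_fraction=0.10):
    """True when the free fraction is below the chosen threshold."""
    return disk_report(path)["free_fraction"] < min_free_fraction


if __name__ == "__main__":
    # "." is a placeholder; point this at the node's data directory.
    print(disk_report("."), low_on_disk("."))
```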

--
Ivan

On Wed, May 9, 2012 at 9:30 AM, Ivan Brusic ivan@brusic.com wrote:

Sample visualization of two cluster states, taken seconds apart:
http://i.imgur.com/bRgpZ.png
