I have a 5-node EC2 setup. The index, logs, and work directories are on
the ephemeral storage at /data, while the Elasticsearch binaries are
started from a smaller EBS boot volume. During recovery I get an error
that the files cannot be transferred because the disk is out of space,
but where is it writing these chunks, given that the /data directory has
plenty of space?
Config:
index:
  store:
    type: niofs
  number_of_shards: 15
  number_of_replicas: 2
path:
  data: /data/elasticsearch/data
  work: /data/elasticsearch/work
  logs: /data/elasticsearch/logs
gateway:
  type: s3
  s3:
    bucket: nameremoved-elasticsearch
  recovery_after_nodes: 3
  recovery_after_time: 5m
  expected_nodes: 5
plugins: [mobz/elasticsearch-head, lukas-vlcek/bigdesk]
discovery:
  type: ec2
cloud:
  aws:
    access_key:
    secret_key:
cluster:
  name: elasticsearch
Log output:
[2011-11-07 21:08:34,274][WARN ][cluster.action.shard     ] [Banshee]
received shard failed for [media][8], node[_Swg_4NsRru28Ppp3_1mFg],
[R], s[INITIALIZING], reason [Failed to start shard, message
[RecoveryFailedException[Index Shard [media][8]: Recovery failed from
[Banshee][7U-Kn7aASQy1TGPEo0aWew][inet[/10.214.137.156:9300]] into
[Andromeda][_Swg_4NsRru28Ppp3_1mFg][inet[/10.201.214.240:9300]]];
nested: RemoteTransportException[[Banshee][inet[/10.214.137.156:9300]]
[index/shard/recovery/startRecovery]]; nested:
RecoveryEngineException[[media][8] Phase[1] Execution failed]; nested:
RecoverFilesRecoveryException[[media][8] Failed to transfer [200]
files with total size of [532.5mb]]; nested:
RemoteTransportException[[Andromeda][inet[/10.201.214.240:9300]][index/
shard/recovery/fileChunk]]; nested: IOException[No space left on
device]; ]]
[2011-11-07 21:08:34,804][WARN ][cluster.action.shard     ] [Banshee]
received shard failed for [media][11], node[_Swg_4NsRru28Ppp3_1mFg],
[R], s[INITIALIZING], reason [Failed to start shard, message
[RecoveryFailedException[Index Shard [media][11]: Recovery failed from
[Gorr][BVIFqg9sTjSwcpQg9-AqlQ][inet[/10.72.57.189:9300]] into
[Andromeda][_Swg_4NsRru28Ppp3_1mFg][inet[/10.201.214.240:9300]]];
nested: RemoteTransportException[[Gorr][inet[/10.72.57.189:9300]]
[index/shard/recovery/startRecovery]]; nested:
RecoveryEngineException[[media][11] Phase[1] Execution failed];
nested: RecoverFilesRecoveryException[[media][11] Failed to transfer
[190] files with total size of [537mb]]; nested:
RemoteTransportException[[Andromeda][inet[/10.201.214.240:9300]][index/
shard/recovery/fileChunk]]; nested: IOException[No space left on
device]; ]]
^C
jminard@domU-12-31-39-0B-86-52:/data/elasticsearch/logs$ df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1             7.9G  7.4G  172M  98% /
none                  3.7G  112K  3.7G   1% /dev
none                  3.8G     0  3.8G   0% /dev/shm
none                  3.8G   52K  3.8G   1% /var/run
none                  3.8G     0  3.8G   0% /var/lock
none                  3.8G     0  3.8G   0% /lib/init/rw
/dev/sdb              414G  199M  393G   1% /mnt
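
One thing that may help narrow this down: in the df output above, no filesystem is mounted at /data. The 414G ephemeral disk /dev/sdb is mounted on /mnt, and / is 98% full. So unless /data is a symlink or bind mount into /mnt, anything written under /data would land on the root volume. The following is a minimal Python sketch for checking which mount point actually backs each configured path and how much space is free there. The helper names `mount_point` and `report` are made up for illustration, and the list of paths is an assumption taken from the config above plus /tmp.

```python
import os
import shutil


def mount_point(path):
    """Return the mount point of the filesystem that holds `path`."""
    # Resolve symlinks first, e.g. a /data that points into /mnt.
    path = os.path.realpath(path)
    # Walk up the tree until we reach a directory that is a mount point.
    while not os.path.ismount(path):
        path = os.path.dirname(path)
    return path


def report(path):
    """Describe where `path` really lives and how much space is free there."""
    # Fall back to the nearest existing ancestor so a missing directory
    # does not raise when we ask for disk usage.
    probe = os.path.realpath(path)
    while not os.path.exists(probe):
        probe = os.path.dirname(probe)
    usage = shutil.disk_usage(probe)
    free_gib = usage.free / 2**30
    return "%s -> mounted on %s, %.1f GiB free" % (path, mount_point(probe), free_gib)


if __name__ == "__main__":
    # Paths from the config above; /tmp is included only as a common
    # scratch location (an assumption, not something the log confirms).
    for p in ("/data/elasticsearch/data", "/data/elasticsearch/work", "/tmp"):
        print(report(p))
```

If this reports `/` as the mount point for the /data paths on the failing node (Andromeda in the log), the recovery chunks are going to the nearly full boot volume rather than the ephemeral disk.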