I have a 5-node EC2 setup. Index, logs, and work are on the ephemeral
storage at /data, while the elasticsearch binaries are started from a
smaller EBS boot volume. During recovery I get an error that the files
cannot be transferred because the disk ran out of space. Where is it
writing these chunks, given that the /data directory has plenty of space?
Config:
index:
  store:
    type: niofs
  number_of_shards: 15
  number_of_replicas: 2
path:
  data: /data/elasticsearch/data
  work: /data/elasticsearch/work
  logs: /data/elasticsearch/logs
gateway:
  type: s3
  s3:
    bucket: nameremoved-elasticsearch
  recovery_after_nodes: 3
  recovery_after_time: 5m
  expected_nodes: 5
plugins: [mobz/elasticsearch-head, lukas-vlcek/bigdesk]
discovery:
  type: ec2
cloud:
  aws:
    access_key:
    secret_key:
cluster:
  name: elasticsearch
Log output:
[2011-11-07 21:08:34,274][WARN ][cluster.action.shard ] [Banshee]
received shard failed for [media][8], node[_Swg_4NsRru28Ppp3_1mFg],
[R], s[INITIALIZING], reason [Failed to start shard, message
[RecoveryFailedException[Index Shard [media][8]: Recovery failed from
[Banshee][7U-Kn7aASQy1TGPEo0aWew][inet[/10.214.137.156:9300]] into
[Andromeda][_Swg_4NsRru28Ppp3_1mFg][inet[/10.201.214.240:9300]]];
nested: RemoteTransportException[[Banshee][inet[/10.214.137.156:9300]]
[index/shard/recovery/startRecovery]]; nested:
RecoveryEngineException[[media][8] Phase[1] Execution failed]; nested:
RecoverFilesRecoveryException[[media][8] Failed to transfer [200]
files with total size of [532.5mb]]; nested:
RemoteTransportException[[Andromeda][inet[/10.201.214.240:9300]][index/
shard/recovery/fileChunk]]; nested: IOException[No space left on
device]; ]]
[2011-11-07 21:08:34,804][WARN ][cluster.action.shard ] [Banshee]
received shard failed for [media][11], node[_Swg_4NsRru28Ppp3_1mFg],
[R], s[INITIALIZING], reason [Failed to start shard, message
[RecoveryFailedException[Index Shard [media][11]: Recovery failed from
[Gorr][BVIFqg9sTjSwcpQg9-AqlQ][inet[/10.72.57.189:9300]] into
[Andromeda][_Swg_4NsRru28Ppp3_1mFg][inet[/10.201.214.240:9300]]];
nested: RemoteTransportException[[Gorr][inet[/10.72.57.189:9300]]
[index/shard/recovery/startRecovery]]; nested:
RecoveryEngineException[[media][11] Phase[1] Execution failed];
nested: RecoverFilesRecoveryException[[media][11] Failed to transfer
[190] files with total size of [537mb]]; nested:
RemoteTransportException[[Andromeda][inet[/10.201.214.240:9300]][index/
shard/recovery/fileChunk]]; nested: IOException[No space left on
device]; ]]
^C
jminard@domU-12-31-39-0B-86-52:/data/elasticsearch/logs$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1       7.9G  7.4G  172M  98% /
none            3.7G  112K  3.7G   1% /dev
none            3.8G     0  3.8G   0% /dev/shm
none            3.8G   52K  3.8G   1% /var/run
none            3.8G     0  3.8G   0% /var/lock
none            3.8G     0  3.8G   0% /lib/init/rw
/dev/sdb        414G  199M  393G   1% /mnt
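
The df output lists /dev/sdb at /mnt and has no separate entry for /data, so it may be worth checking which mounted filesystem actually backs each configured path on the receiving node. Below is a minimal sketch of that check, assuming Python 3 is available on the node. The helper names `mount_point_of` and `report` are made up for illustration, and the paths are the ones from the config above.

```python
import os
import shutil


def mount_point_of(path):
    """Walk up from path until a mount point is reached (at worst "/")."""
    path = os.path.realpath(path)  # resolve symlinks, e.g. /data -> /mnt/...
    while not os.path.ismount(path):
        path = os.path.dirname(path)
    return path


def report(path):
    """Return (mount point, free bytes) for the filesystem backing path."""
    probe = os.path.realpath(path)
    # If the directory does not exist yet, fall back to its nearest
    # existing ancestor so disk_usage() has something to stat.
    while not os.path.exists(probe):
        probe = os.path.dirname(probe)
    usage = shutil.disk_usage(probe)
    return mount_point_of(probe), usage.free


if __name__ == "__main__":
    # Paths taken from the config in this post.
    for p in ("/data/elasticsearch/data",
              "/data/elasticsearch/work",
              "/data/elasticsearch/logs"):
        mount, free = report(p)
        print("%s -> mounted on %s, %.1f GB free" % (p, mount, free / 1e9))
```

If any of these paths reports `/` as its mount point, the data is going to the boot volume rather than the ephemeral disk. Running `df -h /data/elasticsearch/data` gives the same answer from the shell.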