Recovery using a different storage location than expected?


(jminard) #1

I have a 5-node EC2 setup. Index, logs, and work are on the ephemeral storage
at /data, while the Elasticsearch binaries are started from a smaller EBS
boot volume. During recovery I get an error that the files cannot be
transferred because the disk is out of space, but where is it writing
these chunks, since the /data directory has plenty of space?

Config:

index:
  store:
    type: niofs
  number_of_shards: 15
  number_of_replicas: 2
path:
  data: /data/elasticsearch/data
  work: /data/elasticsearch/work
  logs: /data/elasticsearch/logs
gateway:
  type: s3
  s3:
    bucket: nameremoved-elasticsearch
  recovery_after_nodes: 3
  recovery_after_time: 5m
  expected_nodes: 5
plugins: [mobz/elasticsearch-head, lukas-vlcek/bigdesk]
discovery:
  type: ec2
cloud:
  aws:
    access_key:
    secret_key:
cluster:
  name: elasticsearch

Log output:

[2011-11-07 21:08:34,274][WARN ][cluster.action.shard ] [Banshee]
received shard failed for [media][8], node[_Swg_4NsRru28Ppp3_1mFg],
[R], s[INITIALIZING], reason [Failed to start shard, message
[RecoveryFailedException[Index Shard [media][8]: Recovery failed from
[Banshee][7U-Kn7aASQy1TGPEo0aWew][inet[/10.214.137.156:9300]] into
[Andromeda][_Swg_4NsRru28Ppp3_1mFg][inet[/10.201.214.240:9300]]];
nested: RemoteTransportException[[Banshee][inet[/10.214.137.156:9300]]
[index/shard/recovery/startRecovery]]; nested:
RecoveryEngineException[[media][8] Phase[1] Execution failed]; nested:
RecoverFilesRecoveryException[[media][8] Failed to transfer [200]
files with total size of [532.5mb]]; nested:
RemoteTransportException[[Andromeda][inet[/10.201.214.240:9300]][index/
shard/recovery/fileChunk]]; nested: IOException[No space left on
device]; ]]
[2011-11-07 21:08:34,804][WARN ][cluster.action.shard ] [Banshee]
received shard failed for [media][11], node[_Swg_4NsRru28Ppp3_1mFg],
[R], s[INITIALIZING], reason [Failed to start shard, message
[RecoveryFailedException[Index Shard [media][11]: Recovery failed from
[Gorr][BVIFqg9sTjSwcpQg9-AqlQ][inet[/10.72.57.189:9300]] into
[Andromeda][_Swg_4NsRru28Ppp3_1mFg][inet[/10.201.214.240:9300]]];
nested: RemoteTransportException[[Gorr][inet[/10.72.57.189:9300]]
[index/shard/recovery/startRecovery]]; nested:
RecoveryEngineException[[media][11] Phase[1] Execution failed];
nested: RecoverFilesRecoveryException[[media][11] Failed to transfer
[190] files with total size of [537mb]]; nested:
RemoteTransportException[[Andromeda][inet[/10.201.214.240:9300]][index/
shard/recovery/fileChunk]]; nested: IOException[No space left on
device]; ]]
^C
jminard@domU-12-31-39-0B-86-52:/data/elasticsearch/logs$ df -h
Filesystem  Size  Used  Avail  Use%  Mounted on
/dev/sda1   7.9G  7.4G  172M   98%   /
none        3.7G  112K  3.7G   1%    /dev
none        3.8G  0     3.8G   0%    /dev/shm
none        3.8G  52K   3.8G   1%    /var/run
none        3.8G  0     3.8G   0%    /var/lock
none        3.8G  0     3.8G   0%    /lib/init/rw
/dev/sdb    414G  199M  393G   1%    /mnt


(Tomasz Kloc) #2

Hi

On 07.11.2011 22:15, jminard wrote:

jminard@domU-12-31-39-0B-86-52:/data/elasticsearch/logs$ df -h
Filesystem  Size  Used  Avail  Use%  Mounted on
/dev/sda1   7.9G  7.4G  172M   98%   /
none        3.7G  112K  3.7G   1%    /dev
none        3.8G  0     3.8G   0%    /dev/shm
none        3.8G  52K   3.8G   1%    /var/run
none        3.8G  0     3.8G   0%    /var/lock
none        3.8G  0     3.8G   0%    /lib/init/rw
/dev/sdb    414G  199M  393G   1%    /mnt

It seems that /data sits on /dev/sda1 (98% disk usage): df lists no separate
mount for /data, so it is part of the root filesystem.

To be sure, check df -h /data
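
The same check can be scripted. Below is a minimal sketch, assuming Python 3 is available on the node; the function names storage_report and same_filesystem are hypothetical helpers made up for this illustration, not part of Elasticsearch. It resolves a path (following symlinks) and reports the device and free space of the filesystem that actually backs it.

```python
import os
import shutil


def storage_report(path):
    """Return where `path` really lives and how much space is free there."""
    real = os.path.realpath(path)      # follow symlinks, if any
    usage = shutil.disk_usage(real)    # total/used/free, in bytes
    used_percent = round(100 * usage.used / usage.total, 1) if usage.total else 0.0
    return {
        "path": path,
        "real_path": real,
        "device": os.stat(real).st_dev,  # device ID of the backing filesystem
        "free_bytes": usage.free,
        "used_percent": used_percent,
    }


def same_filesystem(a, b):
    """True if both paths are backed by the same device."""
    return os.stat(a).st_dev == os.stat(b).st_dev
```

On a node like the one in this thread, same_filesystem("/data", "/") returning True would confirm that the data path shares the nearly full boot volume rather than the large ephemeral disk at /mnt.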


(jminard) #3

Arg, /data was to be created as a symlink to the /mnt directory, but
the script ran the command with the wrong permissions. Thanks for
pointing out the obvious.
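
A provisioning script could guard against this kind of silent failure by verifying the symlink before the node starts. This is only a sketch, assuming Python 3 and the layout described in this thread (/data expected to point at /mnt); check_symlink is a hypothetical helper name.

```python
import os


def check_symlink(link, expected_target):
    """Return (ok, message) saying whether `link` points at `expected_target`."""
    if not os.path.islink(link):
        if os.path.isdir(link):
            kind = "a regular directory, not a symlink"
        else:
            kind = "missing or not a symlink"
        return False, f"{link} is {kind}"
    actual = os.path.realpath(link)             # where the link really leads
    wanted = os.path.realpath(expected_target)  # normalize the expected target
    if actual != wanted:
        return False, f"{link} points to {actual}, expected {wanted}"
    return True, f"{link} -> {actual}"
```

For the setup here, the call would be check_symlink("/data", "/mnt"); a False result means writes to /data land on whatever filesystem holds the plain directory.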


