Hi,
Version used: 0.90.3
I was running a performance benchmark, so I decided to load Elasticsearch with 50 million documents of about 2 KB each. After some time it started throwing errors and the cluster went RED.
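For context, the load is essentially a bulk-indexing loop along these lines (a rough Python sketch, not my exact loader; the type name, batch size, and default HTTP port 9200 are assumptions on my side, while the index name "dw" is the one that shows up in the logs below):

import json
import urllib.request

ES_BULK_URL = "http://10.3.176.22:9200/_bulk"   # any data node, assuming the default HTTP port
DOC = {"payload": "x" * 2048}                   # roughly 2 KB per document

def bulk_batch(start_id, batch_size=1000):
    # Build one _bulk request body: an action line followed by a source line per document.
    lines = []
    for doc_id in range(start_id, start_id + batch_size):
        lines.append(json.dumps({"index": {"_index": "dw", "_type": "doc", "_id": doc_id}}))
        lines.append(json.dumps(DOC))
    body = ("\n".join(lines) + "\n").encode("utf-8")
    req = urllib.request.Request(ES_BULK_URL, data=body)   # POST to the bulk endpoint
    return urllib.request.urlopen(req).read()

# 50 million documents, 1000 per bulk request
for start in range(0, 50000000, 1000):
    bulk_batch(start)

The first error I hit looked like this: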
Caused by: org.elasticsearch.transport.RemoteTransportException: [52][inet[/10.3.176.22:9300]][index/shard/recovery/fileChunk]
Caused by: java.io.FileNotFoundException: /var/elasticsearch/elasticsearch-0.90.3/data/elasticsearch5/nodes/0/indices/dw/2/index/_rdf_es090_0.doc (No space left on device)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
    ....
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
[2013-09-25 14:37:36,716][WARN ][cluster.action.shard ] [52] sending failed shard for [dw][2], node[tpzZTSz0R8yI_EU-faH1nA], [R], s[INITIALIZING], reason [Failed to start shard, message [RecoveryFailedException[[dw][2]: Recovery failed from [53][Nox6jON3TIy1_H0Oe_YTxQ][inet[/10.3.176.133:9300]]{master=true} into [52][tpzZTSz0R8yI_EU-faH1nA][inet[/10.3.176.22:9300]]{master=true}];
  nested: RemoteTransportException[[53][inet[/10.3.176.133:9300]][index/shard/recovery/startRecovery]];
  nested: RecoveryEngineException[[dw][2] Phase[1] Execution failed];
  nested: RecoverFilesRecoveryException[[dw][2] Failed to transfer [199] files with total size of [844.3mb]];
  nested: RemoteTransportException[[52][inet[/10.3.176.22:9300]][index/shard/recovery/fileChunk]];
  nested: FileNotFoundException[/var/elasticsearch/elasticsearch-0.90.3/data/elasticsearch5/nodes/0/indices/dw/2/index/_rdf_es090_0.doc (No space left on device)]; ]]
This was caused by the disk running out of space, which is fair enough; the error happened sometime last night and I stopped the load as soon as I saw it. The problem is that Elasticsearch is now continuously rebalancing the cluster: I can see it constantly moving shards around, and it looks to me like it has entered some kind of infinite loop.
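Here is roughly how I am watching the shard movement, by polling the cluster health API every few seconds (a sketch; the node address and default HTTP port 9200 are assumptions on my side):

import json
import time
import urllib.request

HEALTH_URL = "http://10.3.176.22:9200/_cluster/health"   # assuming the default HTTP port

while True:
    # Fetch cluster health and print the shard counters.
    raw = urllib.request.urlopen(HEALTH_URL).read().decode("utf-8")
    health = json.loads(raw)
    print(health["status"],
          "relocating:", health["relocating_shards"],
          "initializing:", health["initializing_shards"],
          "unassigned:", health["unassigned_shards"])
    time.sleep(10)

The relocating and initializing counts never settle.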
It constantly logs this on some nodes:
[2013-09-25 14:42:53,478][WARN ][index.engine.robin ] [51] [dw][3] failed to read latest segment infos on flush
java.io.FileNotFoundException: /var/elasticsearch/elasticsearch-0.90.3/data/elasticsearch5/nodes/0/indices/dw/3/index/_dzg.si (Too many open files)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
    at org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:410)
    ..
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
[2013-09-25 14:42:54,176][WARN ][indices.cluster ] [51] [dw][2] failed to start shard
org.elasticsearch.indices.recovery.RecoveryFailedException: [dw][2]: Recovery failed from [53][Nox6jON3TIy1_H0Oe_YTxQ][inet[/10.3.176.133:9300]]{master=true} into [51][hK3pwE8IQxu8RPlyBLtZ1Q][inet[/10.3.176.140:9300]]{master=true}
    at org.elasticsearch.indices.recovery.RecoveryTarget.doRecovery(RecoveryTarget.java:293)
    ..
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
Caused by: org.elasticsearch.transport.RemoteTransportException: [53][inet[/10.3.176.133:9300]][index/shard/recovery/startRecovery]
Caused by: org.elasticsearch.index.engine.RecoveryEngineException: [dw][2] Phase[1] Execution failed
    at org.elasticsearch.index.engine.robin.RobinEngine.recover(RobinEngine.java:1125)
    ..
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
Caused by: org.elasticsearch.indices.recovery.RecoverFilesRecoveryException: [dw][2] Failed to transfer [199] files with total size of [844.3mb]
    at org.elasticsearch.indices.recovery.RecoverySource$1.phase1(RecoverySource.java:226)
    at org.elasticsearch.index.engine.robin.RobinEngine.recover(RobinEngine.java:1118)
    ... 9 more
Caused by: org.elasticsearch.transport.RemoteTransportException: [51][inet[/10.3.176.140:9300]][index/shard/recovery/fileChunk]
Caused by: java.io.FileNotFoundException: /var/elasticsearch/elasticsearch-0.90.3/data/elasticsearch5/nodes/0/indices/dw/2/index/_rdf_es090_0.doc (No space left on device)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
    ....
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
[2013-09-25 14:42:54,198][WARN ][cluster.action.shard ] [51] sending failed shard for [dw][2],
Q1) Normally, in other databases I have seen that when there is a disk space issue, the database simply rejects the transaction, and once the load is stopped it stays in a stable condition (no continuous stream of errors). I am not sure what Elasticsearch is doing here, and why.
Q2) How do I bring the cluster back to a stable state now?
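For Q2, is temporarily disabling shard allocation, freeing up disk space (and raising the open-files limit on the nodes), and then re-enabling allocation the right approach? Something like this sketch (assuming the dynamic setting cluster.routing.allocation.disable_allocation is honoured on 0.90.x through the cluster update-settings API, and the default HTTP port 9200):

import json
import urllib.request

SETTINGS_URL = "http://10.3.176.22:9200/_cluster/settings"   # assuming the default HTTP port

def set_allocation_disabled(disabled):
    # Toggle the transient cluster setting that disables shard allocation.
    body = json.dumps({
        "transient": {"cluster.routing.allocation.disable_allocation": disabled}
    }).encode("utf-8")
    req = urllib.request.Request(SETTINGS_URL, data=body, method="PUT")
    return urllib.request.urlopen(req).read()

set_allocation_disabled(True)    # stop the shard shuffling while I clean up
# ... free disk space / raise the open-files limit on the nodes ...
set_allocation_disabled(False)   # let recovery resume

Or is a full restart of the nodes the only way out?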
Thanks