I just reproduced the problem again. Here's more information.
Basically, while indexing, I get several of these:
[17:15:49,010][WARN ][org.apache.hadoop.hdfs.DFSClient] DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /my/path/es/elasticsearch/indices/myentity/2/translog/translog-0 could only be replicated to 0 nodes, instead of 1
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
    at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
    at org.apache.hadoop.ipc.Client.call(Client.java:740)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
    at $Proxy14.addBlock(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
    at $Proxy14.addBlock(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2937)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2819)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2102)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2288)
[17:15:49,019][WARN ][org.apache.hadoop.hdfs.DFSClient] Error Recovery for block null bad datanode[0] nodes == null
[17:15:49,020][WARN ][org.apache.hadoop.hdfs.DFSClient] Could not get block locations. Source file "/my/path/es/elasticsearch/indices/myentity/2/translog/translog-0" - Aborting...
[17:15:49,020][WARN ][index.gateway ] [Powerhouse][myentity][2] Failed to snapshot (scheduled)
org.elasticsearch.index.gateway.IndexShardGatewaySnapshotFailedException: [myentity][2] Failed to snapshot translog into [null]
    at org.elasticsearch.index.gateway.hdfs.HdfsIndexShardGateway.snapshot(HdfsIndexShardGateway.java:239)
    at org.elasticsearch.index.gateway.IndexShardGatewayService$1.snapshot(IndexShardGatewayService.java:179)
    at org.elasticsearch.index.gateway.IndexShardGatewayService$1.snapshot(IndexShardGatewayService.java:175)
    at org.elasticsearch.index.engine.robin.RobinEngine.snapshot(RobinEngine.java:364)
    at org.elasticsearch.index.shard.service.InternalIndexShard.snapshot(InternalIndexShard.java:377)
    at org.elasticsearch.index.gateway.IndexShardGatewayService.snapshot(IndexShardGatewayService.java:175)
    at org.elasticsearch.index.gateway.IndexShardGatewayService$SnapshotRunnable.run(IndexShardGatewayService.java:257)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /my/path/es/elasticsearch/indices/myentity/2/translog/translog-0 could only be replicated to 0 nodes, instead of 1
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
    at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
    at org.apache.hadoop.ipc.Client.call(Client.java:740)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
    at $Proxy14.addBlock(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
    at $Proxy14.addBlock(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2937)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2819)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2102)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2288)
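From what I understand, that "could only be replicated to 0 nodes, instead of 1" message is raised by the NameNode when it cannot find a single live DataNode to place the block on, so it may be the HDFS side rather than Elasticsearch itself that is unhealthy at that moment. Here is a rough sketch I'm going to run against the same cluster while the snapshot is failing, to see what the NameNode thinks of its DataNodes (standard Hadoop FileSystem API; the namenode URL below is just a placeholder for my real fs.default.name):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

public class CheckDataNodes {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // placeholder namenode URL - same fs.default.name the HDFS gateway uses
        conf.set("fs.default.name", "hdfs://localhost:9000");

        FileSystem fs = FileSystem.get(conf);
        if (!(fs instanceof DistributedFileSystem)) {
            System.out.println("Not talking to HDFS: " + fs.getClass().getName());
            return;
        }

        // ask the NameNode which DataNodes it currently knows about
        DatanodeInfo[] nodes = ((DistributedFileSystem) fs).getDataNodeStats();
        System.out.println("DataNodes reported by the NameNode: " + nodes.length);
        for (DatanodeInfo node : nodes) {
            System.out.println(node.getHostName()
                    + " - remaining space: " + node.getRemaining() + " bytes");
        }
        fs.close();
    }
}

If this reports zero DataNodes (or no remaining space) at the time of the failure, that would explain the RemoteException above.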
After which, when I try to access my index's _terms (http://localhost:9200/myentity/_terms), I get:
{
  "_shards": {
    "total": 5,
    "successful": 2,
    "failed": 3,
    "failures": [
      {
        "index": "myentity",
        "shard": 4,
        "reason": "BroadcastShardOperationFailedException[[myentity][4] No active shard(s)]"
      },
      {
        "index": "myentity",
        "shard": 3,
        "reason": "BroadcastShardOperationFailedException[[myentity][3] No active shard(s)]"
      },
      {
        "index": "myentity",
        "shard": 0,
        "reason": "BroadcastShardOperationFailedException[[myentity][0] No active shard(s)]"
      }
    ]
  }
}
After restarting Elasticsearch and Hadoop (not sure if the restart is what triggered it), my other indices suffer from that BroadcastShardOperationFailedException as well. The shard that gets the BroadcastShardOperationFailedException is random, and it is not limited to the index that failed during indexing.
Any ideas what's happening here?
Thanks,
--
Franz Allan Valencia See | Java Software Engineer
franz.see@gmail.com
LinkedIn: http://www.linkedin.com/in/franzsee
Twitter: http://www.twitter.com/franz_see
On Sat, Jul 17, 2010 at 7:01 PM, Franz Allan Valencia See <
franz.see@gmail.com> wrote:
Things like:
org.elasticsearch.transport.RemoteTransportException: [Icemaster][inet[/10.0.6.1:9300]][indices/index/shard/index]
Caused by: org.elasticsearch.action.PrimaryNotStartedActionException: [application][1] Timeout waiting for [1m]
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$4.onTimeout(TransportShardReplicationOperationAction.java:311)
    at org.elasticsearch.cluster.service.InternalClusterService$1$1.run(InternalClusterService.java:87)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
Caused by: org.elasticsearch.action.PrimaryNotStartedActionException: [application][1] Timeout waiting for [1m]
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$4.onTimeout(TransportShardReplicationOperationAction.java:311)
    at org.elasticsearch.cluster.service.InternalClusterService$1$1.run(InternalClusterService.java:87)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
Or some shard broadcast exception, or something like not being able to update an index because something is already updating it.
I'll post more stacktraces once I get them.
Thanks,
--
Franz Allan Valencia See | Java Software Engineer
franz.see@gmail.com
LinkedIn: http://www.linkedin.com/in/franzsee
Twitter: http://www.twitter.com/franz_see
On Sat, Jul 17, 2010 at 4:21 PM, Shay Banon <shay.banon@elasticsearch.com> wrote:
In general you should not get exceptions while indexing (unless you index something wrong). What type of exceptions do you get? Remind me again which version you are on?
-shay.banon
On Sat, Jul 17, 2010 at 4:03 AM, Franz Allan Valencia See <
franz.see@gmail.com> wrote:
I'm no longer getting that 'Too many open files' error, but the problem is still there. Basically, if something goes wrong while I'm indexing on a particular index, all my indices get corrupted.
Sometimes, to fix it, I even have to reformat my HDFS node.
On Fri, Jul 16, 2010 at 7:20 PM, Shay Banon <
shay.banon@elasticsearch.com> wrote:
If it relates to the max open files exception, then your system is not
in a good state. Up the max open files and try again.
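A quick way to double-check that the raised limit is actually what the Elasticsearch JVM sees is something along these lines (a rough sketch; it relies on the Sun JVM's com.sun.management.UnixOperatingSystemMXBean extension, so it only works on a Sun JVM on a Unix-like OS, and it should be run under the same user/ulimit settings that start the node):

import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

import com.sun.management.UnixOperatingSystemMXBean;

public class FileDescriptorCheck {
    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
            // these reflect the limits of this JVM process, so launch it from
            // the same shell/user that launches elasticsearch
            System.out.println("open file descriptors: " + unix.getOpenFileDescriptorCount());
            System.out.println("max file descriptors:  " + unix.getMaxFileDescriptorCount());
        } else {
            System.out.println("file descriptor counts not available on this JVM/OS");
        }
    }
}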
On Fri, Jul 16, 2010 at 1:03 PM, Franz Allan Valencia See <
franz.see@gmail.com> wrote:
I've been playing with Elasticsearch for about 3 weeks now. So far, everything has been great. But lately, I started trying to index all the data in the tables I'm targeting (instead of just a subset, which is what I had been doing to evaluate Elasticsearch).
Currently, I store these replicated tables in different indices. That's because, after those individual replications, I go through all of them and do my joining in my application to index the resulting object trees.
I am able to index most of my tables; one of them has 500,000+ records. But one particular table contains about 6.5M rows. What I do is query this table, get a scrollable ResultSet, iterate over it, and index the rows one by one via the Java API. The whole process of replicating this particular table takes about 1.5 hours.
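For reference, the replication loop looks roughly like this (a simplified sketch: the SQL, the column names, and the "myentity" index/type are placeholders for my real schema, and the JSON building is stripped down to keep it short):

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

import org.elasticsearch.client.Client;
import org.elasticsearch.client.Requests;

public class TableReplicator {

    // Streams one table into one index, row by row, so the 6.5M rows are
    // never held in memory at once.
    public void replicate(Client client, Connection conn) throws Exception {
        Statement stmt = conn.createStatement(
                ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
        stmt.setFetchSize(1000); // hint to the JDBC driver to stream rows

        ResultSet rs = stmt.executeQuery("SELECT id, name, description FROM myentity");
        try {
            while (rs.next()) {
                // real code escapes/builds the JSON properly; kept short here
                String json = "{\"name\":\"" + rs.getString("name")
                        + "\",\"description\":\"" + rs.getString("description") + "\"}";

                client.index(
                        Requests.indexRequest("myentity")
                                .type("myentity")
                                .id(rs.getString("id"))
                                .source(json))
                        .actionGet();
            }
        } finally {
            rs.close();
            stmt.close();
        }
    }
}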
However, most of the time I get an error during indexing (e.g. an IOException on the Elasticsearch side, a NoShardException, an org.apache.hadoop.ipc.RemoteException because I am using the HDFS gateway, etc.). That is OK with me; as long as I can resume from where I left off, I'll just recover manually (for now). However, when something goes wrong during this indexing, all my indices suddenly get corrupted: either I get something like a NoShardException on most of them, or the indices disappear entirely.
I even tried doing a http://localhost:9200/_flush after a successful replication, but it still doesn't solve the problem.
Any ideas where I went wrong?
Thanks,
Franz Allan Valencia See | Java Software Engineer
franz.see@gmail.com
LinkedIn: http://www.linkedin.com/in/franzsee
Twitter: http://www.twitter.com/franz_see
--
Franz Allan Valencia See | Java Software Engineer
franz.see@gmail.com
LinkedIn: http://www.linkedin.com/in/franzsee
Twitter: http://www.twitter.com/franz_see