I posted this via Nabble, as well, and it hasn't gone through, so if
this shows up twice, my apologies...
http://elasticsearch-users.115913.n3.nabble.com/file/n1062891/elasticsearch.log
http://elasticsearch-users.115913.n3.nabble.com/file/n1062891/elasticsearch.yml
I've attached a log capturing the issue (the log starts from a fresh
cluster creation, and the error occurred on my first restart of the cluster).
Also, to reiterate, I am probably using far more shards (5*49 = 245)
than makes sense for my current single-node config.
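For context, my understanding is that the per-index shard count comes from
settings like the following in elasticsearch.yml (the keys below are how I
read the docs, so treat this as a sketch rather than my actual attached
config); dropping the count would take me from 245 shards down to 49 on this
single node:

  index:
    number_of_shards: 1    # default is 5; one shard per index seems plenty for a single-node evaluation
    number_of_replicas: 0  # replicas can't be allocated anywhere else on a one-node cluster anyway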
I was only able to capture the error after reducing logging from DEBUG to
INFO. I'm not sure if that is pertinent or a red herring, but it could
indicate a timing issue.
As a side note, it seems that my cluster's startup time depends on the
number of operations in the translog. When the translog is large, startup
is delayed while those operations are replayed; it can take 15+ minutes to
come back up.
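If it helps, my plan is to keep the translog small by flushing more
aggressively, since (as I understand it) the translog only accumulates
operations between flushes and gets replayed on recovery. Something along
these lines in elasticsearch.yml is what I intend to try; the setting names
below are an assumption on my part from the docs, so please correct me if
they don't exist in this version:

  index:
    translog:
      flush_threshold_ops: 5000   # assumed setting: flush (and truncate the translog) every N operations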
I was able to copy off one of the corrupted translogs, but it is quite
large. If there is anything else I can provide, please let me know.
Thanks!!!
On Aug 8, 4:06 pm, Paul ppea...@gmail.com wrote:
Oddly, now that I have set the log level to DEBUG, I haven't been able
to reproduce it over the past couple of days. I'll revert to the
default and see if that gets it to kick back in.
Just running a single node, all fs-based indexes. No exceptions
writing the translog.
Let me get a good capture of this happening and I will post further
details, including my config.
Thanks,
Paul
On Aug 8, 3:25 pm, Shay Banon shay.ba...@elasticsearch.com wrote:
You probably got it right, but can you post your config? How many nodes do
you start on the same machine? Do you use memory-based indices? Recovery
should be quick if you use fs-based index storage, since the existing index
files should be reused.
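(By memory-based vs fs-based I mean the index store type setting, roughly
like the sketch below; the exact values are from memory, so double-check
them against the docs:)

  index:
    store:
      type: fs       # file-system backed storage; index files survive a restart and can be reused on recovery
      # type: memory # memory backed storage; everything must be recovered from the gateway after a restart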
Have you seen any exceptions trying to write the translog?
-shay.banon
On Sat, Aug 7, 2010 at 4:35 AM, Paul ppea...@gmail.com wrote:
Hello,
I've been semi-frequently getting the error below after stopping and
starting my single-machine cluster. I started paying attention to doc
counts and saw that I lost ~250 out of ~300,000 docs after the restart,
which, I guess, were the ones in the corrupted translog.
At the time of the shutdown, I was not actively indexing or searching.
My usage pattern is pretty much a constant stream of content entering the
system at ~20 docs per minute, plus various boolean queries I'm testing
from a single thread.
On startup, the cluster stayed red for ~15 minutes, probably while
trying to restore the translog. My gateway is the local fs.
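For reference, the gateway section of my elasticsearch.yml looks roughly
like the sketch below (the path is a placeholder and I'm going from memory
on the exact keys, so the attached file is authoritative):

  gateway:
    type: fs
    fs:
      location: /path/to/gateway   # placeholder; points at a local directory in my setup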
I'm running a config with 49 indexes with 5 shards each on a single
machine, which may be pushing something over the edge. We're evaluating on
a single machine before moving to a distributed setup, and I was hoping
not to have to rebuild indexes later just to add extra shards.
Any ideas what could be the cause? If there is any further information
I can provide, I'd be happy to. I'll enable debug logging and see if I can
capture anything there. Memory and CPU utilization at the time seemed fine.
Thanks
[23:35:49,980][WARN ][index.gateway.fs ] [Loss] [index01][4] failed to retrieve translog after [1608] operations, ignoring the rest, considered corrupted
java.io.EOFException
        at org.elasticsearch.common.io.stream.BytesStreamInput.readByte(BytesStreamInput.java:78)
        at org.elasticsearch.common.io.stream.StreamInput.readVInt(StreamInput.java:73)
        at org.elasticsearch.index.gateway.blobstore.BlobStoreIndexShardGateway$3.onPartial(BlobStoreIndexShardGateway.java:416)
        at org.elasticsearch.common.blobstore.fs.AbstractFsBlobContainer$1.run(AbstractFsBlobContainer.java:82)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:636)