Hello,
I've been getting the error below fairly frequently after stopping and
restarting my single-machine cluster. I started paying attention to doc
counts and saw that I lost ~250 out of ~300,000 after the restart;
I assume those were the operations in the corrupted translog.
At the time of the shutdown, I was not actively indexing or searching.
My current usage pattern is a fairly constant stream of content at ~20 docs
per minute entering the system, plus various boolean queries that I run
for testing from a single thread.
On startup, the cluster stayed red for ~15 minutes, probably while
trying to restore the translog. My gateway is the local fs.
I'm running a config with 49 indexes of 5 shards each on a single
machine, which may be pushing something over the edge. We're
evaluating on a single machine before moving to a distributed setup,
and we were hoping not to have to rebuild the indexes later just to
add extra shards.
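For context, the relevant bits of my config look roughly like the
sketch below (the gateway path is a placeholder and the per-index
settings are from memory, not copied verbatim from my elasticsearch.yml):

    gateway.type: fs
    gateway.fs.location: /path/to/gateway    # placeholder, not my real path
    index.number_of_shards: 5
    index.number_of_replicas: 0              # assuming no replicas, since it's a single node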
Any ideas what the cause could be? I'd be happy to provide any
further information that would help.
I'll enable debug logging and see if I can capture anything there. Memory
and CPU utilization at the time seemed fine.
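For the debug step, my plan is to turn up the gateway loggers in
config/logging.yml, roughly along these lines (this assumes the stock
logging.yml layout, and the logger names are my guess at the relevant
ones based on the [index.gateway.fs] prefix in the warning below):

    logger:
      # more detail on gateway / translog recovery
      gateway: DEBUG
      index.gateway: DEBUG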
Thanks
[23:35:49,980][WARN ][index.gateway.fs ] [Loss] [index01][4] failed to retrieve translog after [1608] operations, ignoring the rest, considered corrupted
java.io.EOFException
    at org.elasticsearch.common.io.stream.BytesStreamInput.readByte(BytesStreamInput.java:78)
    at org.elasticsearch.common.io.stream.StreamInput.readVInt(StreamInput.java:73)
    at org.elasticsearch.index.gateway.blobstore.BlobStoreIndexShardGateway$3.onPartial(BlobStoreIndexShardGateway.java:416)
    at org.elasticsearch.common.blobstore.fs.AbstractFsBlobContainer$1.run(AbstractFsBlobContainer.java:82)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:636)