Failed to retrieve transaction log - take 2

Hi,
Wanted to start a new thread to continue this discussion:
http://groups.google.com/a/elasticsearch.com/group/users/browse_thread/thread/381b33ab3f4fff74/2f95bacbfc864415?lnk=gst&q=eofexception#2f95bacbfc864415

After using the cluster shutdown API, the subsequent start up reported
errors retrieving the transaction log.

The single machine cluster was only a few hours old and I had indexed
~200,000 documents. I used this command to shut down:
curl -XPOST 'http://localhost:9200/_cluster/nodes/_shutdown'

Here is the log file: gist:531877 (GitHub)

I am running master from last night.

There is nothing in the log that indicates anything bad has occurred
on the cluster shutdown.

Any ideas on how to prevent this or at least work around it? The
"corrupted" files get blown away before I have a chance to crack them
open.

Thanks,
Paul

So, I have managed to reproduce this. Before shutting down, I copied
the gateway data so that I have the "corrupted" translogs to look at.
I also have a full DEBUG log capturing this shutdown and start up.
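
For anyone who wants to keep a copy of the gateway data in the same way, here is a minimal Python sketch. The function name and both directory arguments are hypothetical; the thread does not say where the gateway is stored, so pass whatever location your node is configured to use.

```python
import shutil
import time
from pathlib import Path


def snapshot_gateway(gateway_dir, backup_root):
    """Copy the whole gateway directory into a new timestamped folder.

    Both paths are supplied by the caller; nothing here knows the real
    gateway layout.
    """
    src = Path(gateway_dir)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(backup_root) / f"{src.name}-{stamp}"
    # copytree refuses to write into an existing destination, so an
    # earlier snapshot is never overwritten.
    shutil.copytree(src, dest)
    return dest
```

Running it just before the shutdown call leaves an untouched copy of the translogs to inspect even if the originals are removed on the next start up.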

Unfortunately, I can't publicly post this data, but can make it
available via any private mechanism you like. Only around 2MB.

When I look at these translogs, I do find some EOF characters (0x1A)
in them at a couple of spots. The files seem to be a semi-binary
format, so maybe that is OK.

The file does seem to be complete and doesn't have a chopped-off
record at the end. If I knew for sure which characters separate
translog operations, I'd be able to home in on the specific
operation that is failing.

Thanks,
Paul

Comparing the "corrupted" translog to a good one, I don't see any
substantial difference or obvious truncation.

An easy workaround for me is to flush the translogs to the indexes
before cluster shutdown, but I need to ensure that I'm not actively
indexing when I do.
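
A rough Python sketch of that flush-then-shutdown order follows. The shutdown path is the one used earlier in this thread; the `/_flush` path for flushing all indices is an assumption and should be checked against the version in use. The function names are hypothetical, and the sketch does not stop indexing for you: that has to happen before calling it.

```python
import urllib.request


def flush_then_shutdown_plan(base_url="http://localhost:9200"):
    """Ordered list of (method, url) calls: flush first, then shut down."""
    base = base_url.rstrip("/")
    return [
        ("POST", base + "/_flush"),  # assumed flush-all endpoint
        ("POST", base + "/_cluster/nodes/_shutdown"),  # from this thread
    ]


def http_send(method, url):
    """Send an empty-bodied request and return the HTTP status code."""
    req = urllib.request.Request(url, data=b"", method=method)
    with urllib.request.urlopen(req) as resp:
        return resp.status


def flush_then_shutdown(base_url="http://localhost:9200", send=http_send):
    """Run the plan in order, stopping at the first non-200 response."""
    results = []
    for method, url in flush_then_shutdown_plan(base_url):
        status = send(method, url)
        results.append((url, status))
        if status != 200:
            break  # do not shut down if the flush did not succeed
    return results
```

The `send` parameter exists only so the ordering can be exercised without a running node; calling `flush_then_shutdown()` with the defaults issues the real requests.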

Thanks

I think I managed to find the problem, and pushed a fix for it:
Gateway: Failure to read full translog from the gateway · Issue #328 · elastic/elasticsearch · GitHub.
Note, this means that the upcoming 0.9.1 will require a flush followed by a
shutdown to clean the translog when upgrading.

-shay.banon

Awesome! I will hammer away on this and see if the issue persists.

Thank you!
