Failed to retrieve transaction log - take 2

ppearcy · August 17, 2010, 8:39pm

Hi,
Wanted to start a new thread to continue this discussion:
http://groups.google.com/a/elasticsearch.com/group/users/browse_thread/thread/381b33ab3f4fff74/2f95bacbfc864415?lnk=gst&q=eofexception#2f95bacbfc864415

After using the cluster shutdown API, the subsequent start up reported
errors retrieving the transaction log.

The single machine cluster was only a few hours old and I had indexed
~200,000 documents. I used this command to shut down:
curl -XPOST 'http://localhost:9200/_cluster/nodes/_shutdown'

Here is the log file:

https://gist.github.com/ppearcy/531877

[10:09:33,657][INFO ][node                     ] [Ghost Rider] {elasticsearch/0.9.1-SNAPSHOT/2010-08-17T06:58:53}[5613]: initializing ...
[10:09:33,657][INFO ][plugins                  ] [Ghost Rider] loaded []
[10:09:35,113][INFO ][node                     ] [Ghost Rider] {elasticsearch/0.9.1-SNAPSHOT/2010-08-17T06:58:53}[5613]: initialized
[10:09:35,113][INFO ][node                     ] [Ghost Rider] {elasticsearch/0.9.1-SNAPSHOT/2010-08-17T06:58:53}[5613]: starting ...
[10:09:35,217][INFO ][transport                ] [Ghost Rider] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/10.2.20.160:9300]}
[10:09:38,234][INFO ][cluster.service          ] [Ghost Rider] new_master [Ghost Rider][8904a35d-b4ca-474f-a52d-11563b817edd][inet[/10.2.20.160:9300]], reason: zen-disco-join (elected_as_master)
[10:09:38,234][INFO ][discovery                ] [Ghost Rider] elasticsearch/8904a35d-b4ca-474f-a52d-11563b817edd
[10:09:38,286][INFO ][http                     ] [Ghost Rider] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/10.2.20.160:9200]}
[10:09:38,598][INFO ][jmx                      ] [Ghost Rider] bound_address {service:jmx:rmi:///jndi/rmi://:9400/jmxrmi}, publish_address {service:jmx:rmi:///jndi/rmi://10.2.20.160:9400/jmxrmi}
[10:09:38,598][INFO ][node                     ] [Ghost Rider] {elasticsearch/0.9.1-SNAPSHOT/2010-08-17T06:58:53}[5613]: started

(Log truncated here; the full file is in the gist linked above.)

I am running master from last night.

There is nothing in the log that indicates anything bad has occurred
on the cluster shutdown.

Any ideas on how to prevent this or at least work around it? The
"corrupted" files get blown away before I have a chance to crack them
open.

Thanks,
Paul

ppearcy · August 17, 2010, 11:04pm

So, I have managed to reproduce this, and before shutting down I saved
a copy of the gateway data to have something to look at for the
"corrupted" translogs. I also have a full DEBUG log capturing this
shutdown and start up.

Unfortunately, I can't publicly post this data, but can make it
available via any private mechanism you like. Only around 2MB.

When I look at these translogs, I do find some EOF characters (0x1A)
in them at a couple of spots. The files seem to be a semi-binary
format, so maybe that is OK.
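
For anyone wanting to repeat this check, here is a minimal sketch in Python that lists the offsets of every 0x1A byte in a file. The function names and the idea of passing in a translog path are illustrative assumptions, not part of elasticsearch. Note that in a binary format a 0x1A byte is an ordinary value, so finding one is not by itself evidence of truncation.

```python
from pathlib import Path


def find_byte_offsets(data, needle=0x1A):
    """Return every offset in `data` (a bytes object) where `needle` occurs."""
    # Iterating over bytes yields integers, so a plain comparison works.
    return [offset for offset, value in enumerate(data) if value == needle]


def scan_file(path, needle=0x1A):
    """Read the whole file at `path` and report where `needle` occurs.

    Reading everything into memory is fine for files of a few MB,
    which is the size mentioned in this thread.
    """
    return find_byte_offsets(Path(path).read_bytes(), needle)
```

Calling scan_file() on a known-good translog and on a suspect one, then comparing the two lists, shows whether the 0x1A bytes are specific to the failing file.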

The file does seem to be complete and doesn't have a chopped off
record at the end. If I knew for sure which characters separate
translog operations, I'd be able to home in on the specific one that
is reported as failing.

Thanks,
Paul

ppearcy · August 18, 2010, 7:27am

Comparing the "corrupted" translog to a good one, I don't see any
substantial difference or obvious truncation.

An easy workaround for me is to flush the translogs to the indexes
before cluster shutdown, but I need to ensure that I'm not actively
indexing when I do.
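
A rough sketch of that workaround in Python, under stated assumptions: the shutdown URL is the one used earlier in this thread, while the `/_flush` endpoint for flushing all indices is an assumption about this version and should be checked against its docs. The helper names (`http_post`, `flush_then_shutdown`) are made up for illustration.

```python
import urllib.request


def http_post(url):
    """Send a POST with an empty body and return the response text."""
    request = urllib.request.Request(url, data=b"", method="POST")
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


def flush_then_shutdown(base_url="http://localhost:9200", post=http_post):
    """Flush all indices, then ask the cluster to shut down.

    Indexing must already be stopped by the caller; this function does
    not check for it. `post` can be swapped out, e.g. for a dry run.
    """
    steps = [
        base_url + "/_flush",                    # assumed flush-all endpoint
        base_url + "/_cluster/nodes/_shutdown",  # shutdown URL from this thread
    ]
    # Run the steps strictly in order and keep each response for inspection.
    return [(url, post(url)) for url in steps]
```

Calling flush_then_shutdown() with no arguments targets the local node from the original post. If the flush request fails, urlopen raises and the shutdown request is never sent.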

Thanks

kimchy · August 18, 2010, 11:30am

I think I managed to find the problem, and pushed a fix for it:
issue #328, "Gateway: Failure to read full translog from the gateway"
(elastic/elasticsearch on GitHub). Note, this means that the upcoming
0.9.1 will require a flush then shutdown operation to clean the
translog when upgrading.

-shay.banon

ppearcy · August 18, 2010, 3:32pm

Awesome! I will hammer away on this and see if the issue persists.

Thank you!
