The recent ES 1.3.1 announcement around the ning/compression library bug
had me excited, because we have had a lingering replica shard inconsistency
issue for a long time (on a very old ES version, which we desperately want
to upgrade but can't just yet, for reasons on our side).
Anyway, we've tried the trick of disabling compression on the recovery
channel via the recovery setting, and soaked the change for a few days, but
we continue to see replica issues, and wanted to report this back.
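In case it helps anyone reproduce, the change we applied was essentially
the dynamic cluster settings update below (setting name per the 1.x docs;
host and port are placeholders for your cluster):

    # dynamically disable compression on the recovery channel
    curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
      "transient": { "indices.recovery.compress": false }
    }'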
We regularly 'clean' these up by shunting replica shards around so they get
rebuilt, so we are confident they were clean for a day, yet our check tool
still finds small numbers of inconsistencies coming through each day.
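By 'shunting' I mean forcing a replica copy to be rebuilt from its primary,
either by relocating it or by dropping replicas to zero and back. A rough
sketch using the reroute API (index, shard and node names here are made up):

    # relocating a replica forces it to recover fresh from the primary
    curl -XPOST 'http://localhost:9200/_cluster/reroute' -d '{
      "commands": [
        { "move": { "index": "myindex", "shard": 0,
                    "from_node": "nodeA", "to_node": "nodeB" } }
      ]
    }'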
I heard privately from another ES user about this same change: 1.3.1 does
fix the issue for him, but the recovery setting trick on its own doesn't
help him either.
I'm not sure if a full cluster stop/start is required for the recovery
channel to switch to non-compressed. Reading the release notes, it looked
like a runtime setting that could be changed immediately, but changing it
isn't helping with these types of issues here.
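If anyone else tries this, it's worth at least confirming the cluster
accepted the transient setting before soaking the change:

    # the transient block should list indices.recovery.compress
    curl 'http://localhost:9200/_cluster/settings?pretty'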
I can't yet report whether ES 1.3.1 fixes the underlying issue, because of
the aforementioned upgrade constraints (all on our side, not an ES problem).
No, we haven't looked at manually comparing the shard segments. We would
have to snapshot the data manually first, since it's constantly under
change.
And no, I wasn't sure our issue is related to the compression bug, but it
sure sounded like it.
What we see is a small volume of changes relating to deletes that are not
properly applied on the replica. The records are deleted from the primary,
but sometimes the replica does not seem to get this change.
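A rough way to spot this symptom, assuming a 1.x-era cluster with the _cat
API available ('myindex' is a placeholder, and counts can legitimately
differ for a moment while indexing is in flight):

    # live docs per shard copy; a replica showing more docs than its
    # primary suggests deletes that never made it across
    curl 'http://localhost:9200/_cat/shards/myindex?v'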
It's interesting that the other ES user reports the 1.3.1 release does fix
the issue while the setting change alone doesn't. Since the new release is
mostly limited to this library change, the problem does seem restricted to
this area.
The only real way to confirm is for us to upgrade, which is sadly not
straightforward right now.
Do you observe the replica shard inconsistency only by checksum after
network transport?
In other words, are you sure the inconsistency you observe is caused by a
compression issue in LZF?
Jörg
On Thu, Aug 21, 2014 at 5:52 AM, Paul Smith <tallpsmith@gmail.com> wrote: