Replica Shard inconsistencies & disabling compression don't appear to help

Paul_Smith · August 21, 2014, 3:52am

Hi all,

The recent ES 1.3.1 announcement about the ning/compression library bug
had me excited, because we have had a lingering replica shard inconsistency
issue for a long time (on a very old ES version, which we desperately
want to upgrade, but there are reasons why we can't just yet).

Anyway, we've tried the trick of disabling compression on recovery via the
setting and soaked the change for a few days, but we continue to see replica
issues, and I wanted to report this back.
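For anyone wanting to try the same workaround, here is a minimal sketch of building and sending the request. It assumes the setting name `indices.recovery.compress` from the 1.3.x-era release notes and the standard `PUT /_cluster/settings` endpoint; the helper names are hypothetical, and whether a transient or persistent setting suits your cluster is your call.

```python
import json
import urllib.request


def recovery_compression_body(enabled: bool, persistent: bool = False) -> str:
    """Build a JSON body for PUT /_cluster/settings that toggles compression
    of shard-recovery traffic (assumed setting: indices.recovery.compress)."""
    scope = "persistent" if persistent else "transient"
    return json.dumps({scope: {"indices.recovery.compress": enabled}})


def apply_cluster_settings(base_url: str, body: str) -> dict:
    """Send the body to /_cluster/settings and return the parsed response.
    Hypothetical helper; not called here, so nothing touches the network."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/_cluster/settings",
        data=body.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage would be something like `apply_cluster_settings("http://localhost:9200", recovery_compression_body(False))`, or the equivalent `curl -XPUT localhost:9200/_cluster/settings -d '...'` with the same body.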

We regularly 'clean' these by shunting replica shards around to have them
rebuilt, so we're confident they were clean for a day, yet our check tool
still finds a small number of new inconsistencies coming through each day.
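"Shunting" a replica around can be done with a cluster reroute `move` command, which relocates that shard copy to another node so it gets rebuilt there by recovery. A minimal sketch of the request body for `POST /_cluster/reroute`, assuming the `commands`/`move` format of the reroute API; the index and node names in the usage are hypothetical placeholders.

```python
import json


def move_shard_command(index: str, shard: int, from_node: str, to_node: str) -> dict:
    """One reroute 'move' command: relocate the copy of `shard` that
    currently lives on `from_node` over to `to_node`."""
    return {
        "move": {
            "index": index,
            "shard": shard,
            "from_node": from_node,
            "to_node": to_node,
        }
    }


def reroute_body(commands: list) -> str:
    """Wrap one or more commands into a POST /_cluster/reroute body."""
    return json.dumps({"commands": commands})
```

For example, `reroute_body([move_shard_command("my_index", 0, "node_a", "node_b")])` produces a body that moves the copy of shard 0 off `node_a`; make sure you pick the node holding the replica, not the primary.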

I also heard privately from another ES user about this same change:
upgrading to 1.3.1 does fix the issue for him, but the recovery setting trick
doesn't help him either.

I'm not sure whether a full cluster stop/start is required for the recovery
channel to switch to uncompressed transfers. Reading the release notes, it
sounded like a runtime setting that takes effect immediately, but it isn't
helping with this type of issue here.
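One way to at least confirm the runtime setting was accepted is to read it back from `GET /_cluster/settings?flat_settings=true`. A minimal sketch, assuming the flat response format and the `indices.recovery.compress` setting name; the helper is hypothetical, and a value set only in `elasticsearch.yml` on each node would not appear in this response at all.

```python
import json
from typing import Optional


def recovery_compress_setting(settings_json: str) -> Optional[bool]:
    """Return the cluster-level value of indices.recovery.compress from a
    flat_settings response. Transient settings take precedence over
    persistent ones; None means it was never set at cluster level."""
    settings = json.loads(settings_json)
    for scope in ("transient", "persistent"):
        value = settings.get(scope, {}).get("indices.recovery.compress")
        if value is not None:
            # Values may come back as strings ("false") or booleans.
            return str(value).lower() == "true"
    return None
```

If this returns `False`, the cluster has accepted the change; it still says nothing about whether recoveries already in flight picked it up.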

I can't yet report whether ES 1.3.1 fixes the underlying issue, though,
because of said upgrade issues (all on our side, not an ES problem).

Anyway, I thought I would pass that data point on.

regards,

Paul Smith

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAHfYWB6FpH-i7kDs9m_v4HgwPK-DWK75mATO_7DZNQjpDh6B%3Dw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

jprante · August 21, 2014, 8:25am

Do you observe the replica shard inconsistency only by checksum after
network transport?

In other words, are you sure the inconsistency you observe is caused by a
compression issue in LZF?
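The distinction behind this question: corruption in transit changes the bytes of a copied file, so a checksum taken before and after the transfer will disagree, whereas a logical divergence (for example a delete that never reached the replica) leaves every file individually intact. A minimal sketch of the first kind of check, using `zlib.crc32` purely as an illustration rather than the checksum code ES itself uses.

```python
import zlib


def crc32_of(data: bytes) -> int:
    """Unsigned CRC32 of a byte string."""
    return zlib.crc32(data) & 0xFFFFFFFF


def copy_is_intact(source: bytes, received: bytes) -> bool:
    """True if the received copy has the same CRC32 as the source.
    Catches in-transit corruption, not missing operations."""
    return crc32_of(source) == crc32_of(received)
```

CRC32 detects any single corrupted byte, so a flipped byte in a transferred segment file would show up here, while a replica that simply missed a delete would pass.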

Jörg

Paul_Smith · August 21, 2014, 9:21am

No, I haven't tried manually comparing the shard segments. We would have to
manually snapshot the data, which is constantly changing.

And no, I wasn't sure our issue is related to the compression bug, but it
sure sounded like it.

What we see is a small number of deletes that are not properly applied on
the replica. The records are deleted from the primary, but sometimes the
replica does not seem to get the change.
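A check for this symptom boils down to a set difference between the document IDs held by the primary and by the replica. A minimal sketch, assuming you have already collected the IDs from each copy (for example by scanning with `preference=_primary` versus `preference=_only_node:<replica node>`); the collection step and the helper names are hypothetical, and with constant indexing the two lists must be taken at a quiet point or the diff will include in-flight changes.

```python
from typing import Iterable, List


def missed_deletes(primary_ids: Iterable[str], replica_ids: Iterable[str]) -> List[str]:
    """IDs still on the replica but gone from the primary: candidates for
    deletes that were applied to the primary but never reached the replica."""
    return sorted(set(replica_ids) - set(primary_ids))


def missing_on_replica(primary_ids: Iterable[str], replica_ids: Iterable[str]) -> List[str]:
    """IDs on the primary that the replica never received."""
    return sorted(set(primary_ids) - set(replica_ids))
```

For example, a primary holding `1, 2, 4` and a replica holding `1, 2, 3, 4` gives `missed_deletes(...) == ["3"]` and nothing missing on the replica.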

Interesting that the other ES user's report shows the 1.3.1 release does
fix the issue but the setting change doesn't. Since the new release is
mostly limited to this library change, the fix seems restricted to this
area.

The only real way to know is for us to upgrade, which sadly is not
straightforward right now.

Paul

Related topics

Topic | Category | Replies | Views | Activity
Shard data inconsistencies | Elasticsearch | 2 | 925 | July 6, 2017
Replicas out of sync | Elasticsearch | 19 | 4777 | February 28, 2018
Forcing sync of replicas | Elasticsearch | 5 | 2631 | July 6, 2017
Primary vs. replica shard inconsistencies? | Elasticsearch | 8 | 1054 | July 6, 2017
Constant Recovering and Unassigned shards for an index | Elasticsearch | 12 | 968 | July 6, 2017