Inconsistent data among nodes for one (or more) indexes (ES 0.11)

Here's one of the problems I run into frequently.

I have a 5-node ES cluster (using local storage) running 8 indexes, each with
thousands of documents (one of them with millions), and every now and then
replication seems to break:

for node in n1 n2 n3 n4 n5; do
  curl "http://${node}:9200/index/_count?q=*"
done

{"count":22408,"_shards":{"total":1,"successful":1,"failed":0}}
{"count":22408,"_shards":{"total":1,"successful":1,"failed":0}}
{"count":20230,"_shards":{"total":1,"successful":1,"failed":0}}
{"count":20230,"_shards":{"total":1,"successful":1,"failed":0}}
{"count":20230,"_shards":{"total":1,"successful":1,"failed":0}}

The only solution I've found so far is to recreate the index and reindex.

Any help will be really appreciated! :smiley:

Have you seen the problem with recovery and the local gateway with more than
one index I posted? Also, do you see any exceptions in the log? The index seems
to have just one shard; how many replicas does it have?
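To check the shard and replica settings being asked about here, one option is to query the index settings over HTTP. This is a minimal sketch: the node name `n1` is taken from the thread, and the availability of the `_settings` endpoint on 0.11 is an assumption.

```shell
# Hedged sketch: fetch the index settings, which include
# index.number_of_shards and index.number_of_replicas.
# "n1" and the index name "index" come from the thread; the
# _settings endpoint on ES 0.11 is assumed here.
settings_url() {
  # $1 = node hostname
  echo "http://$1:9200/index/_settings"
}

# Against a live cluster:
#   curl -s "$(settings_url n1)"
settings_url n1
```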

-shay.banon

On Fri, Oct 8, 2010 at 9:06 PM, Pablo Borges pablort@gmail.com wrote:


Also, a few more points on the count:

  1. Going to different nodes is irrelevant. The count will round-robin
    between the shards and be directed to the appropriate node(s).

  2. Do you index while you execute the count? Have you changed the refresh
    interval?
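If you do want to interrogate each node's own shard copy rather than let the count round-robin, later ES versions expose a `preference` query parameter. A hedged sketch follows; note that `preference` was added after this era, so it is likely not available on 0.11 and is purely an assumption about newer versions.

```shell
# Hedged sketch: pin the count to the local shard copy on each node via
# "preference=_local". NOTE: the preference parameter is from later ES
# versions and is assumed NOT to exist on 0.11.
local_count_url() {
  # $1 = node hostname
  echo "http://$1:9200/index/_count?q=*&preference=_local"
}

for node in n1 n2 n3 n4 n5; do
  # curl -s "$(local_count_url "$node")"   # uncomment against a live cluster
  local_count_url "$node"
done
```

With `preference=_local`, a diverging count would identify which node holds the stale copy, instead of depending on which shard the round-robin happens to hit.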

-shay.banon

On Fri, Oct 8, 2010 at 9:17 PM, Shay Banon shay.banon@elasticsearch.com wrote:


About 1: If you notice, I use only one shard per node, which means at least
one of them is out of sync.
About 2: No. The data was indexed some time ago and updates are not
frequent.

On Fri, Oct 8, 2010 at 4:23 PM, Shay Banon shay.banon@elasticsearch.com wrote:


But do they still happen?

On Fri, Oct 8, 2010 at 10:04 PM, Pablo Borges pablort@gmail.com wrote:


No exceptions, but I needed to reduce my log level so as not to fill the
partition the logs are written to (there were several search exceptions, which
I'd like to know if there's a way to disable).

Here's my logging.yml

rootLogger: INFO, console, file
logger:
  # log action execution errors for easier debugging
  action: INFO

appender:
  console:
    type: console
    layout:
      type: consolePattern
      conversionPattern: "[%d{ABSOLUTE}][%-5p][%-25c] %m%n"

  file:
    type: dailyRollingFile
    file: /var/log/elasticsearch/${cluster.name}.log
    datePattern: "'.'yyyy-MM-dd"
    layout:
      type: pattern
      conversionPattern: "[%d{ABSOLUTE}][%-5p][%-25c] %m%n"

On Fri, Oct 8, 2010 at 4:17 PM, Shay Banon shay.banon@elasticsearch.com wrote:


Yes:

{"count":22408,"_shards":{"total":1,"successful":1,"failed":0}}
{"count":20230,"_shards":{"total":1,"successful":1,"failed":0}}
{"count":20230,"_shards":{"total":1,"successful":1,"failed":0}}
{"count":22408,"_shards":{"total":1,"successful":1,"failed":0}}
{"count":22408,"_shards":{"total":1,"successful":1,"failed":0}}

On Fri, Oct 8, 2010 at 5:05 PM, Shay Banon shay.banon@elasticsearch.com wrote:


I suggest you keep action on DEBUG; I can help with the search exceptions as
well.
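A minimal sketch of the corresponding change to the logging.yml posted above (only the logger section is shown; the rest of the file stays as posted):

```yaml
logger:
  # keep action execution errors visible while debugging this issue
  action: DEBUG
```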

On Fri, Oct 8, 2010 at 10:06 PM, Pablo Borges pablort@gmail.com wrote:


If they still happen, you might be running into the near-real-time aspect of
elasticsearch. The question is whether the counts are still different once
there is no indexing going on. Also, note that there is a problem with
multiple indices and the local gateway.
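One way to rule out the near-real-time explanation is to force a refresh first, so any buffered segments become searchable, and then re-run the counts. A hedged sketch, using the node names from the thread:

```shell
# Hedged sketch: force a refresh (making buffered, near-real-time segments
# searchable), then count again on every node. Endpoints as understood for
# this era of ES; node/index names taken from the thread.
refresh_url() { echo "http://$1:9200/index/_refresh"; }
count_url()   { echo "http://$1:9200/index/_count?q=*"; }

# Against a live cluster:
#   curl -s -XPOST "$(refresh_url n1)"
#   for node in n1 n2 n3 n4 n5; do curl -s "$(count_url "$node")"; done
refresh_url n1
count_url n1
```

If the counts still disagree after a refresh with no indexing in flight, near-real-time lag is ruled out and a stale replica becomes the likely cause.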

-shay.banon

On Fri, Oct 8, 2010 at 10:07 PM, Pablo Borges pablort@gmail.com wrote:
