Problems with 0.18.X


(Grant) #1

Today we attempted an upgrade from 0.17.6 to 0.18.4. We run a 4 node
cluster with lots of small indexes (roughly 0-800mb each) each with 2
replicas, no sharding. We had about 3gb of data on each node at the
time.

After a full restart, the cluster failed to recover and remained in a
red state until nodes would start to become unresponsive. This
happened as soon as the processes resident size equaled its JVM heap
size (2gb initially). Raising the heap to 8gb had no effect. ES would
just consume all the available heap then become unresponsive after
having done so.

We eventually decided to re-index (starting with an empty data
directory on all nodes), still on 0.18.4, and discovered that we could
create about 10 (empty) indexes in rapid succession and then ES would
become unresponsive. It was also very slow to move out of red or
yellow states.

Having lost most hope by this point, we downgraded to 0.17.6 and began
indexing again, and everything flew along in typical fashion.

Anyone have any thoughts as to what gives?

The only useful log messages I turned up was when we were trying to re-
index from scratch on 0.18.4... these streamed continuously:

[2011-11-21 21:46:19,421][WARN ][cluster.metadata ] [prod-
elasticsearch-r02] [contact_documents-7-0] re-syncing mappings with
cluster state for types [[contact_document]]
[2011-11-21 21:46:19,485][WARN ][cluster.metadata ] [prod-
elasticsearch-r02] [contact_documents-3-0] re-syncing mappings with
cluster state for types [[contact_document]]
[2011-11-21 21:46:19,544][WARN ][cluster.metadata ] [prod-
elasticsearch-r02] [contact_documents-3-0] re-syncing mappings with
cluster state for types [[contact_document]]
[2011-11-21 21:46:19,605][WARN ][cluster.metadata ] [prod-
elasticsearch-r02] [contact_documents-7-0] re-syncing mappings with
cluster state for types [[contact_document]]
[2011-11-21 21:46:19,670][WARN ][cluster.metadata ] [prod-
elasticsearch-r02] [contact_documents-3-0] re-syncing mappings with
cluster state for types [[contact_document]]
[2011-11-21 21:46:19,730][WARN ][cluster.metadata ] [prod-
elasticsearch-r02] [contact_documents-11-0] re-syncing mappings with
cluster state for types [[contact_document]]
[2011-11-21 21:46:19,792][WARN ][cluster.metadata ] [prod-
elasticsearch-r02] [contact_documents-7-0] re-syncing mappings with
cluster state for types [[contact_document]]
[2011-11-21 21:46:19,857][WARN ][cluster.metadata ] [prod-
elasticsearch-r02] [contact_documents-7-0] re-syncing mappings with
cluster state for types [[contact_document]]
[2011-11-21 21:46:19,916][WARN ][cluster.metadata ] [prod-
elasticsearch-r02] [contact_documents-11-0] re-syncing mappings with
cluster state for types [[contact_document]]
[2011-11-21 21:46:19,975][WARN ][cluster.metadata ] [prod-
elasticsearch-r02] [contact_documents-11-0] re-syncing mappings with
cluster state for types [[contact_document]]
[2011-11-21 21:46:20,043][WARN ][cluster.metadata ] [prod-
elasticsearch-r02] [contact_documents-7-0] re-syncing mappings with
cluster state for types [[contact_document]]
[2011-11-21 21:46:20,102][WARN ][cluster.metadata ] [prod-
elasticsearch-r02] [contact_documents-11-0] re-syncing mappings with
cluster state for types [[contact_document]]
[2011-11-21 21:46:20,163][WARN ][cluster.metadata ] [prod-
elasticsearch-r02] [contact_documents-7-0] re-syncing mappings with
cluster state for types [[contact_document]]
[2011-11-21 21:46:20,224][WARN ][cluster.metadata ] [prod-
elasticsearch-r02] [contact_documents-11-0] re-syncing mappings with
cluster state for types [[contact_document]]
[2011-11-21 21:46:20,292][WARN ][cluster.metadata ] [prod-
elasticsearch-r02] [contact_documents-3-0] re-syncing mappings with
cluster state for types [[contact_document]]
[2011-11-21 21:46:20,351][WARN ][cluster.metadata ] [prod-
elasticsearch-r02] [contact_documents-3-0] re-syncing mappings with
cluster state for types [[contact_document]]
[2011-11-21 21:46:20,411][WARN ][cluster.metadata ] [prod-
elasticsearch-r02] [contact_documents-7-0] re-syncing mappings with
cluster state for types [[contact_document]]
[2011-11-21 21:46:20,474][WARN ][cluster.metadata ] [prod-
elasticsearch-r02] [contact_documents-11-0] re-syncing mappings with
cluster state for types [[contact_document]]
[2011-11-21 21:46:20,534][WARN ][cluster.metadata ] [prod-
elasticsearch-r02] [contact_documents-11-0] re-syncing mappings with
cluster state for types [[contact_document]]
[2011-11-21 21:46:20,594][WARN ][cluster.metadata ] [prod-
elasticsearch-r02] [contact_documents-11-0] re-syncing mappings with
cluster state for types [[contact_document]]


(Matt) #2

I was having the same problem, which Shay replied:

https://groups.google.com/d/topic/elasticsearch/cYmNcpV6g98/discussion


(Shay Banon) #3

Hey,

You should not see the "re-sync" message when indexing into a fresh
0.18.4 cluster, are you sure you saw it then and not when you upgraded from
0.17 to 0.18? You might see this message when upgrading to 0.18 from 0.17.

You said that you created indices and eventually it got stuck on .18
cluster, can you gist the settings / mappings for those indices that you
created. I would like to try and simulate it here.

-shay.banon

On Tue, Nov 22, 2011 at 12:07 AM, Grant grant@brewster.com wrote:

Today we attempted an upgrade from 0.17.6 to 0.18.4. We run a 4 node
cluster with lots of small indexes (roughly 0-800mb each) each with 2
replicas, no sharding. We had about 3gb of data on each node at the
time.

After a full restart, the cluster failed to recover and remained in a
red state until nodes would start to become unresponsive. This
happened as soon as the processes resident size equaled its JVM heap
size (2gb initially). Raising the heap to 8gb had no effect. ES would
just consume all the available heap then become unresponsive after
having done so.

We eventually decided to re-index (starting with an empty data
directory on all nodes), still on 0.18.4, and discovered that we could
create about 10 (empty) indexes in rapid succession and then ES would
become unresponsive. It was also very slow to move out of red or
yellow states.

Having lost most hope by this point, we downgraded to 0.17.6 and began
indexing again, and everything flew along in typical fashion.

Anyone have any thoughts as to what gives?

The only useful log messages I turned up was when we were trying to re-
index from scratch on 0.18.4... these streamed continuously:

[2011-11-21 21:46:19,421][WARN ][cluster.metadata ] [prod-
elasticsearch-r02] [contact_documents-7-0] re-syncing mappings with
cluster state for types [[contact_document]]
[2011-11-21 21:46:19,485][WARN ][cluster.metadata ] [prod-
elasticsearch-r02] [contact_documents-3-0] re-syncing mappings with
cluster state for types [[contact_document]]
[2011-11-21 21:46:19,544][WARN ][cluster.metadata ] [prod-
elasticsearch-r02] [contact_documents-3-0] re-syncing mappings with
cluster state for types [[contact_document]]
[2011-11-21 21:46:19,605][WARN ][cluster.metadata ] [prod-
elasticsearch-r02] [contact_documents-7-0] re-syncing mappings with
cluster state for types [[contact_document]]
[2011-11-21 21:46:19,670][WARN ][cluster.metadata ] [prod-
elasticsearch-r02] [contact_documents-3-0] re-syncing mappings with
cluster state for types [[contact_document]]
[2011-11-21 21:46:19,730][WARN ][cluster.metadata ] [prod-
elasticsearch-r02] [contact_documents-11-0] re-syncing mappings with
cluster state for types [[contact_document]]
[2011-11-21 21:46:19,792][WARN ][cluster.metadata ] [prod-
elasticsearch-r02] [contact_documents-7-0] re-syncing mappings with
cluster state for types [[contact_document]]
[2011-11-21 21:46:19,857][WARN ][cluster.metadata ] [prod-
elasticsearch-r02] [contact_documents-7-0] re-syncing mappings with
cluster state for types [[contact_document]]
[2011-11-21 21:46:19,916][WARN ][cluster.metadata ] [prod-
elasticsearch-r02] [contact_documents-11-0] re-syncing mappings with
cluster state for types [[contact_document]]
[2011-11-21 21:46:19,975][WARN ][cluster.metadata ] [prod-
elasticsearch-r02] [contact_documents-11-0] re-syncing mappings with
cluster state for types [[contact_document]]
[2011-11-21 21:46:20,043][WARN ][cluster.metadata ] [prod-
elasticsearch-r02] [contact_documents-7-0] re-syncing mappings with
cluster state for types [[contact_document]]
[2011-11-21 21:46:20,102][WARN ][cluster.metadata ] [prod-
elasticsearch-r02] [contact_documents-11-0] re-syncing mappings with
cluster state for types [[contact_document]]
[2011-11-21 21:46:20,163][WARN ][cluster.metadata ] [prod-
elasticsearch-r02] [contact_documents-7-0] re-syncing mappings with
cluster state for types [[contact_document]]
[2011-11-21 21:46:20,224][WARN ][cluster.metadata ] [prod-
elasticsearch-r02] [contact_documents-11-0] re-syncing mappings with
cluster state for types [[contact_document]]
[2011-11-21 21:46:20,292][WARN ][cluster.metadata ] [prod-
elasticsearch-r02] [contact_documents-3-0] re-syncing mappings with
cluster state for types [[contact_document]]
[2011-11-21 21:46:20,351][WARN ][cluster.metadata ] [prod-
elasticsearch-r02] [contact_documents-3-0] re-syncing mappings with
cluster state for types [[contact_document]]
[2011-11-21 21:46:20,411][WARN ][cluster.metadata ] [prod-
elasticsearch-r02] [contact_documents-7-0] re-syncing mappings with
cluster state for types [[contact_document]]
[2011-11-21 21:46:20,474][WARN ][cluster.metadata ] [prod-
elasticsearch-r02] [contact_documents-11-0] re-syncing mappings with
cluster state for types [[contact_document]]
[2011-11-21 21:46:20,534][WARN ][cluster.metadata ] [prod-
elasticsearch-r02] [contact_documents-11-0] re-syncing mappings with
cluster state for types [[contact_document]]
[2011-11-21 21:46:20,594][WARN ][cluster.metadata ] [prod-
elasticsearch-r02] [contact_documents-11-0] re-syncing mappings with
cluster state for types [[contact_document]]


(Pierre Baillet-2) #4

Hi,

On Nov 22, 8:56 am, Shay Banon kim...@gmail.com wrote:

Hey,

You should not see the "re-sync" message when indexing into a fresh
0.18.4 cluster, are you sure you saw it then and not when you upgraded from
0.17 to 0.18? You might see this message when upgrading to 0.18 from 0.17.

Having the same problem even after our indices were recreated from
scratch, I debugged the code in the
MetaDataMappingService#refreshMapping on version 0.18.2
(864d95d1543fb712e2b1b3a11be5a5c935b93624) and found out that:

  • at line 127, mapper.mappingSource() and
    indexMetaData.mappings().get(type).source()) differ in my case.
  • once split by line at }, the diff of the two sources is:

--- a 2011-11-28 14:32:54.000000000 +0100
+++ b 2011-11-28 14:33:05.000000000 +0100
@@ -16,8 +16,8 @@
}
,"location":{"type":"string","store":"yes","include_in_all":true}
,"name":{"type":"multi_field","fields":{"name":
{"type":"string","boost":
2.1,"store":"yes","omit_term_freq_and_positions":true,"include_in_all":true}
-,"fuzzy":
{"type":"string","analyzer":"fuzzy","omit_term_freq_and_positions":true,"include_in_all":false}
,"alphaSort":
{"type":"string","analyzer":"alphasort","omit_term_freq_and_positions":true,"include_in_all":false}
+,"fuzzy":
{"type":"string","analyzer":"fuzzy","omit_term_freq_and_positions":true,"include_in_all":false}
,"autocomplete":
{"type":"string","index_analyzer":"ngram","search_analyzer":"auto_complete_search_analyser","include_in_all":false}
}
}

As you can see, it's a simple reorder of the mapping that causes the
difference and endless synchronize the mappings.

The gist of the two original mappings is available at https://gist.github.com/1400435
.

This issue eats 100% of the cpu of our master which make ugly graphs
in our monitoring tool and made me worry a bit when I discovered the
issue at first.

Cheers,
Pierre.


(Shay Banon) #5

Hey,

Yea, another user has hit that, with multi field mapping and multiple
additional fields mapped:
https://github.com/elasticsearch/elasticsearch/issues/1487. It will be part
of 0.18.5.

-shay.banon

On Mon, Nov 28, 2011 at 3:48 PM, Pierre Baillet octplane@gmail.com wrote:

Hi,

On Nov 22, 8:56 am, Shay Banon kim...@gmail.com wrote:

Hey,

You should not see the "re-sync" message when indexing into a fresh
0.18.4 cluster, are you sure you saw it then and not when you upgraded
from
0.17 to 0.18? You might see this message when upgrading to 0.18 from
0.17.

Having the same problem even after our indices were recreated from
scratch, I debugged the code in the
MetaDataMappingService#refreshMapping on version 0.18.2
(864d95d1543fb712e2b1b3a11be5a5c935b93624) and found out that:

  • at line 127, mapper.mappingSource() and
    indexMetaData.mappings().get(type).source()) differ in my case.
  • once split by line at }, the diff of the two sources is:

--- a 2011-11-28 14:32:54.000000000 +0100
+++ b 2011-11-28 14:33:05.000000000 +0100
@@ -16,8 +16,8 @@
}
,"location":{"type":"string","store":"yes","include_in_all":true}
,"name":{"type":"multi_field","fields":{"name":
{"type":"string","boost":

2.1,"store":"yes","omit_term_freq_and_positions":true,"include_in_all":true}
-,"fuzzy":

{"type":"string","analyzer":"fuzzy","omit_term_freq_and_positions":true,"include_in_all":false}
,"alphaSort":

{"type":"string","analyzer":"alphasort","omit_term_freq_and_positions":true,"include_in_all":false}
+,"fuzzy":

{"type":"string","analyzer":"fuzzy","omit_term_freq_and_positions":true,"include_in_all":false}
,"autocomplete":

{"type":"string","index_analyzer":"ngram","search_analyzer":"auto_complete_search_analyser","include_in_all":false}
}
}

As you can see, it's a simple reorder of the mapping that causes the
difference and endless synchronize the mappings.

The gist of the two original mappings is available at
https://gist.github.com/1400435
.

This issue eats 100% of the cpu of our master which make ugly graphs
in our monitoring tool and made me worry a bit when I discovered the
issue at first.

Cheers,
Pierre.


(system) #6