ES fails to update dynamic mapping; mapping and index creation times out afterwards


(Chris Hairfield) #1

Hello,

Has anyone encountered a "hung" ElasticSearch instance where mapping
creation/updates and index creations timed out? Here are the details...

ElasticSearch 1.0.0.RC1, default configuration (we've seen it on 0.90.5 as
well)
Windows Server 2008/12 and 7/8
Java SE Runtime Environment (1.7.0-b147)
HotSpot 64-Bit Server VM (build 21.0-b17, mixed mode)

We encounter this when inserting via the bulk api. When we have the first
batch of up to 100 documents of a particular type to index, we attempt to
ensure that it has the proper mapping. That mapping will look like this:
{
  "com_latitudeqa_qaservices3:http_arcgis_rest_services_qa_sde_mapserver_16": {
    "type": "object",
    "_all": {
      "enabled": true
    },
    "dynamic": true,
    "_id": {
      "path": "attributes.OBJECTID"
    },
    "properties": {
      "location": {
        "type": "geo_shape",
        "tree": "quadtree",
        "precision": "50m",
        "validate": "true",
        "lat_lon": "true"
      }
    }
  }
}
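
For readers who want to reproduce the sequence described above (put the mapping first, then send batches of up to 100 documents through the bulk API), here is a minimal Python sketch. It only builds the request path and bodies and does not contact a cluster. The helper names, the sample documents, and the `PUT /{index}/_mapping/{type}` path form (as documented for 1.x) are assumptions for illustration and are not taken from the original post.

```python
import json
from urllib.parse import quote

def mapping_request(index, doc_type, mapping):
    """Return (method, path, body) for a put-mapping call (1.x-style path, assumed)."""
    # Percent-encode the type name, since it may contain characters such as ':'.
    path = "/{0}/_mapping/{1}".format(quote(index, safe=""), quote(doc_type, safe=""))
    return "PUT", path, json.dumps({doc_type: mapping})

def bulk_bodies(index, doc_type, docs, batch_size=100):
    """Yield newline-delimited bulk bodies with at most batch_size documents each."""
    for start in range(0, len(docs), batch_size):
        lines = []
        for doc in docs[start:start + batch_size]:
            # Each document is preceded by an action line naming index and type.
            lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
            lines.append(json.dumps(doc))
        # The bulk API expects the body to end with a newline.
        yield "\n".join(lines) + "\n"
```

Each yielded body would be sent as one bulk request after the put-mapping request has been acknowledged; how the requests are sent (HTTP client, timeouts, retries) is left open here.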

When it hangs, it always hangs before updating a mapping it just
successfully created. Here is a snippet of the log file at INFO:
... some successful mapping creations and updates ...
[2014-01-17 17:02:07,263][INFO ][cluster.metadata ] [CHAIRFIELD2-PC] [onprem] create_mapping [com_latitudeqa_qaservices3:http_arcgis_rest_services_qa_sde_mapserver_13]
[2014-01-17 17:02:08,918][INFO ][cluster.metadata ] [CHAIRFIELD2-PC] [onprem] update_mapping [com_latitudeqa_qaservices3:http_arcgis_rest_services_qa_sde_mapserver_13] (dynamic)
[2014-01-17 17:02:09,156][INFO ][cluster.metadata ] [CHAIRFIELD2-PC] [onprem] create_mapping [com_latitudeqa_qaservices3:http_arcgis_rest_services_qa_sde_mapserver_15]
[2014-01-17 17:02:09,401][INFO ][cluster.metadata ] [CHAIRFIELD2-PC] [onprem] update_mapping [com_latitudeqa_qaservices3:http_arcgis_rest_services_qa_sde_mapserver_15] (dynamic)
[2014-01-17 17:02:09,551][INFO ][cluster.metadata ] [CHAIRFIELD2-PC] [onprem] create_mapping [com_latitudeqa_qaservices3:http_arcgis_rest_services_qa_sde_mapserver_16]

And it's here that everything freezes in place.

When I came back to shut down the ElasticSearch process after the weekend,
it created a new log file with this information:
[2014-01-17 17:02:09,968][INFO ][cluster.metadata ] [CHAIRFIELD2-PC] [onprem] update_mapping [com_latitudeqa_qaservices3:http_arcgis_rest_services_qa_sde_mapserver_16] (dynamic)
[2014-01-20 09:57:41,458][DEBUG][action.admin.indices.mapping.put] [CHAIRFIELD2-PC] failed to put mappings on indices [[onprem]], type [com_latitudeqa_qaservices4:http_arcgis_rest_services_charlottecomplete_mapserver_1]
org.elasticsearch.cluster.metadata.ProcessClusterEventTimeoutException: failed to process cluster event (put-mapping [com_latitudeqa_qaservices4:http_arcgis_rest_services_charlottecomplete_mapserver_1]) within 30s
    at org.elasticsearch.cluster.service.InternalClusterService$2$1.run(InternalClusterService.java:247)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
[2014-01-20 09:56:20,060][INFO ][action.admin.cluster.node.shutdown] [CHAIRFIELD2-PC] [partial_cluster_shutdown]: requested, shutting down [[h9kmEriwTJ-Ln3Iyjghurw]] in [1s]
[2014-01-20 09:56:01,349][DEBUG][action.admin.indices.mapping.put] [CHAIRFIELD2-PC] failed to put mappings on indices [[onprem]], type [com_latitudeqa_qaservices4:http_arcgis_rest_services_charlottecomplete_mapserver_1]
org.elasticsearch.cluster.metadata.ProcessClusterEventTimeoutException: failed to process cluster event (put-mapping [com_latitudeqa_qaservices4:http_arcgis_rest_services_charlottecomplete_mapserver_1]) within 30s
    at org.elasticsearch.cluster.service.InternalClusterService$2$1.run(InternalClusterService.java:247)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

... many hundreds of these failures at roughly 2 minute intervals ...

[2014-01-17 17:04:20,122][DEBUG][action.admin.indices.mapping.put] [CHAIRFIELD2-PC] failed to put mappings on indices [[onprem]], type [com_latitudeqa_qaservices3:http_arcgis_rest_services_qa_sde_mapserver_17]
org.elasticsearch.cluster.metadata.ProcessClusterEventTimeoutException: failed to process cluster event (put-mapping [com_latitudeqa_qaservices3:http_arcgis_rest_services_qa_sde_mapserver_17]) within 30s
    at org.elasticsearch.cluster.service.InternalClusterService$2$1.run(InternalClusterService.java:247)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
[2014-01-17 17:02:39,979][DEBUG][action.admin.indices.mapping.put] [CHAIRFIELD2-PC] failed to put mappings on indices [[onprem]], type [com_latitudeqa_qaservices3:http_arcgis_rest_services_qa_sde_mapserver_17]
org.elasticsearch.cluster.metadata.ProcessClusterEventTimeoutException: failed to process cluster event (put-mapping [com_latitudeqa_qaservices3:http_arcgis_rest_services_qa_sde_mapserver_17]) within 30s
    at org.elasticsearch.cluster.service.InternalClusterService$2$1.run(InternalClusterService.java:247)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
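
To see how many put-mapping timeouts a log like this contains and which mapping types they affect, a short script can tally them. The sketch below is only illustrative: the function name and the regular expression are assumptions based on the message format shown above, and the pattern tolerates line wrapping inside the message. It also assumes type names contain no whitespace of their own.

```python
import re
from collections import Counter

# Matches "failed to process cluster event (put-mapping [<type>]) within <n>s".
# The \s+ and \s* parts tolerate log text that was wrapped across lines.
TIMEOUT_RE = re.compile(
    r"failed to process cluster\s+event \(put-mapping \[\s*([^\]]+?)\s*\]\) within (\d+)s"
)

def count_put_mapping_timeouts(log_text):
    """Return a Counter mapping each type name to its number of put-mapping timeouts."""
    counts = Counter()
    for match in TIMEOUT_RE.finditer(log_text):
        # Drop whitespace that wrapping may have inserted inside the type name.
        mapping_type = re.sub(r"\s+", "", match.group(1))
        counts[mapping_type] += 1
    return counts
```

Calling `count_put_mapping_timeouts(open(path).read())` on a log file (the path is up to the reader) returns the per-type counts, which can help show whether the timeouts are tied to one type or spread across all of them.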

What's particularly tricky is that this issue is very intermittent. We're
actively trying to get a repro with DEBUG logging enabled, and will post
any new information here as we get it.

Any thoughts?

Thanks for reading!

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/93ab14db-00fa-4991-b6e7-932b86274c89%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(lukas) #2

Hello,

We are experiencing the same problem. We use Elasticsearch for Graylog2,
and after a mapping update Elasticsearch sometimes hangs, but not always.
That is why it is really hard to replicate in our scenario (it happens
maybe once per month); otherwise all mapping updates are performed properly.

When it hangs, ES takes all the CPU power of the server and fills the swap.

elasticsearch-0.20.6
Ubuntu 11.10
java-6-openjdk

Gap of 3 hours after update_mapping (we normally get several messages per second):
[2014-01-30 04:07:10,630][INFO ][cluster.metadata ] [Fury] [graylog2_2] update_mapping [message] (dynamic)
[2014-01-30 07:11:25,284][WARN ][transport ] [Fury] Received response for a request that has timed out, sent [46167ms] ago, timed out [15395ms] ago, action [discovery/zen/fd/ping], node [[graylog2-server][KrHdHQJoQ1autWMxvx4HUA][inet[/192.168.17.161:9301]]{client=true, data=false}], id [219156]
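
A silent gap like the one above can be located automatically by comparing consecutive log timestamps. The following Python sketch is one possible way to do it; the function name, the 5-minute default threshold, and the assumption that every log entry starts with a `[YYYY-MM-DD HH:MM:SS,mmm]` timestamp are illustrative choices, not something taken from the posts.

```python
import re
from datetime import datetime, timedelta

# Timestamp prefix as it appears in the log excerpts, e.g. [2014-01-30 04:07:10,630]
TIMESTAMP_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3})\]")

def find_gaps(lines, threshold=timedelta(minutes=5)):
    """Return (previous, current) timestamp pairs that are more than threshold apart."""
    gaps = []
    previous = None
    for line in lines:
        match = TIMESTAMP_RE.match(line)
        if not match:
            continue  # continuation line (stack trace or wrapped text)
        current = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        current += timedelta(milliseconds=int(match.group(2)))
        if previous is not None and current - previous > threshold:
            gaps.append((previous, current))
        previous = current
    return gaps
```

Running it over the lines of a log file returns the timestamps on either side of each long silence, which narrows down when the node stopped processing events. Entries that are out of chronological order are simply not reported as gaps.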

If you have any update, please let me know :-). Apparently it has nothing
to do with the version.

Thank you.



(Brian Yoder) #3

Not sure if this is your problem, but OpenJDK 6 is very bad. OpenJDK 7 has
been known to work well. Oracle Java 7 is best. And there's no correlation
between version numbers across Oracle Java and OpenJDK; 7 is just a
coincidence.

This is based on what I've read on this newsgroup, on blogs, and in various
video presentations.

Brian

On Thursday, January 30, 2014 3:45:34 AM UTC-5, lu...@alatest.com wrote:

elasticsearch-0.20.6
Ubuntu 11.10
java-6-openjdk



(lukas) #4

On Friday I changed the Java version to Oracle Java 7, and during the
weekend Elasticsearch froze on a mapping update again :-/


