Hello,
Has anyone encountered a "hung" Elasticsearch instance where mapping
creation/updates and index creations timed out? Here are the details...
Elasticsearch 1.0.0.RC1, default configuration (we've seen it on 0.90.5 as
well)
Windows Server 2008/2012 and Windows 7/8
Java SE Runtime Environment (1.7.0-b147)
HotSpot 64-Bit Server VM (build 21.0-b17, mixed mode)
We encounter this when inserting via the bulk API. Before indexing the first
batch of up to 100 documents of a particular type, we attempt to ensure that
the type has the proper mapping. That mapping looks like this:
{
  "com_latitudeqa_qaservices3:http_arcgis_rest_services_qa_sde_mapserver_16": {
    "type": "object",
    "_all": {
      "enabled": true
    },
    "dynamic": true,
    "_id": {
      "path": "attributes.OBJECTID"
    },
    "properties": {
      "location": {
        "type": "geo_shape",
        "tree": "quadtree",
        "precision": "50m",
        "validate": "true",
        "lat_lon": "true"
      }
    }
  }
}
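For anyone trying to reproduce this, here is a minimal Python sketch of the sequence described above: put the mapping for a type, then send the first batch of up to 100 documents through the bulk API. It only builds the request path and bodies and sends nothing. The function names are hypothetical, and percent-encoding the type name in the URL is an assumption on my part, not something taken from our client code.

```python
import json
from urllib.parse import quote


def put_mapping_request(index, doc_type, mapping):
    """Build (method, path, body) for PUT /{index}/{type}/_mapping.

    The type name is percent-encoded because ours contain ':' characters
    (an assumption; adjust if your client already encodes paths).
    """
    path = "/{0}/{1}/_mapping".format(quote(index, safe=""), quote(doc_type, safe=""))
    body = json.dumps({doc_type: mapping})
    return "PUT", path, body


def bulk_body(index, doc_type, docs, batch_size=100):
    """Build the newline-delimited body for POST /_bulk.

    Each document becomes an action line followed by a source line.
    No _id is set here because the mapping derives it from a path.
    """
    lines = []
    for doc in docs[:batch_size]:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    # The bulk API requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"
```

The put-mapping request would be sent first, and the bulk body posted only after it is acknowledged.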
When it hangs, it always hangs before updating a mapping it just
successfully created. Here is a snippet of the log file at INFO:
... some successful mapping creations and updates ...
[2014-01-17 17:02:07,263][INFO ][cluster.metadata ] [CHAIRFIELD2-PC] [onprem] create_mapping [com_latitudeqa_qaservices3:http_arcgis_rest_services_qa_sde_mapserver_13]
[2014-01-17 17:02:08,918][INFO ][cluster.metadata ] [CHAIRFIELD2-PC] [onprem] update_mapping [com_latitudeqa_qaservices3:http_arcgis_rest_services_qa_sde_mapserver_13] (dynamic)
[2014-01-17 17:02:09,156][INFO ][cluster.metadata ] [CHAIRFIELD2-PC] [onprem] create_mapping [com_latitudeqa_qaservices3:http_arcgis_rest_services_qa_sde_mapserver_15]
[2014-01-17 17:02:09,401][INFO ][cluster.metadata ] [CHAIRFIELD2-PC] [onprem] update_mapping [com_latitudeqa_qaservices3:http_arcgis_rest_services_qa_sde_mapserver_15] (dynamic)
[2014-01-17 17:02:09,551][INFO ][cluster.metadata ] [CHAIRFIELD2-PC] [onprem] create_mapping [com_latitudeqa_qaservices3:http_arcgis_rest_services_qa_sde_mapserver_16]
And it's here that everything freezes in place.
When I came back after the weekend and shut down the Elasticsearch process,
it wrote a new log file with this information:
[2014-01-17 17:02:09,968][INFO ][cluster.metadata ] [CHAIRFIELD2-PC] [onprem] update_mapping [com_latitudeqa_qaservices3:http_arcgis_rest_services_qa_sde_mapserver_16] (dynamic)
[2014-01-20 09:57:41,458][DEBUG][action.admin.indices.mapping.put] [CHAIRFIELD2-PC] failed to put mappings on indices [[onprem]], type [com_latitudeqa_qaservices4:http_arcgis_rest_services_charlottecomplete_mapserver_1]
org.elasticsearch.cluster.metadata.ProcessClusterEventTimeoutException: failed to process cluster event (put-mapping [com_latitudeqa_qaservices4:http_arcgis_rest_services_charlottecomplete_mapserver_1]) within 30s
    at org.elasticsearch.cluster.service.InternalClusterService$2$1.run(InternalClusterService.java:247)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
[2014-01-20 09:56:20,060][INFO ][action.admin.cluster.node.shutdown] [CHAIRFIELD2-PC] [partial_cluster_shutdown]: requested, shutting down [[h9kmEriwTJ-Ln3Iyjghurw]] in [1s]
[2014-01-20 09:56:01,349][DEBUG][action.admin.indices.mapping.put] [CHAIRFIELD2-PC] failed to put mappings on indices [[onprem]], type [com_latitudeqa_qaservices4:http_arcgis_rest_services_charlottecomplete_mapserver_1]
org.elasticsearch.cluster.metadata.ProcessClusterEventTimeoutException: failed to process cluster event (put-mapping [com_latitudeqa_qaservices4:http_arcgis_rest_services_charlottecomplete_mapserver_1]) within 30s
    at org.elasticsearch.cluster.service.InternalClusterService$2$1.run(InternalClusterService.java:247)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
... many hundreds of these failures at roughly 2 minute intervals ...
[2014-01-17 17:04:20,122][DEBUG][action.admin.indices.mapping.put] [CHAIRFIELD2-PC] failed to put mappings on indices [[onprem]], type [com_latitudeqa_qaservices3:http_arcgis_rest_services_qa_sde_mapserver_17]
org.elasticsearch.cluster.metadata.ProcessClusterEventTimeoutException: failed to process cluster event (put-mapping [com_latitudeqa_qaservices3:http_arcgis_rest_services_qa_sde_mapserver_17]) within 30s
    at org.elasticsearch.cluster.service.InternalClusterService$2$1.run(InternalClusterService.java:247)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
[2014-01-17 17:02:39,979][DEBUG][action.admin.indices.mapping.put] [CHAIRFIELD2-PC] failed to put mappings on indices [[onprem]], type [com_latitudeqa_qaservices3:http_arcgis_rest_services_qa_sde_mapserver_17]
org.elasticsearch.cluster.metadata.ProcessClusterEventTimeoutException: failed to process cluster event (put-mapping [com_latitudeqa_qaservices3:http_arcgis_rest_services_qa_sde_mapserver_17]) within 30s
    at org.elasticsearch.cluster.service.InternalClusterService$2$1.run(InternalClusterService.java:247)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
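In case it helps anyone compare against their own logs, below is a rough Python sketch for measuring the spacing between these put-mapping failures. It assumes each failure starts with a single-line header in the timestamp/level/logger format shown above; the function names are hypothetical, and it only looks at DEBUG lines from action.admin.indices.mapping.put.

```python
import re
from datetime import datetime

# Matches the header of a put-mapping failure entry, e.g.
# [2014-01-17 17:04:20,122][DEBUG][action.admin.indices.mapping.put] ...
FAILURE_HEADER = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3})\]"
    r"\[DEBUG\]\[action\.admin\.indices\.mapping\.put\]"
)


def put_mapping_failure_times(log_text):
    """Return the timestamps of put-mapping failure headers, oldest first."""
    times = []
    for line in log_text.splitlines():
        match = FAILURE_HEADER.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            millis = int(match.group(2))
            times.append(stamp.replace(microsecond=millis * 1000))
    # The log above is not in chronological order, so sort before diffing.
    return sorted(times)


def gaps_in_seconds(times):
    """Return the gap in seconds between each consecutive pair of timestamps."""
    return [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]
```

Calling gaps_in_seconds(put_mapping_failure_times(text)) on the contents of a log file gives the list of intervals between failures.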
What's particularly tricky is that this issue is very intermittent. We're
actively trying to get a repro with DEBUG logging enabled, and will post
any new information here as we get it.
Any thoughts?
Thanks for reading!