0.19.11 client node in Solaris WebLogic web app crashes JVM after creating hundreds of transport_client_worker threads

Hi,

  1. I am upgrading from 0.17.x to 0.19.11 and have deleted the old index files.
    The standalone JVM Elasticsearch server starts fine. It connects to a
    standalone JVM client node and indexes data without any problem.

  2. A 0.19.11 client node inside a Solaris WebLogic web app connecting to the
    above server node ends up creating hundreds of transport_client_worker
    threads and crashes WebLogic.

  3. I see hundreds of these lines in the crash log:
    "0x00000001040d1000 JavaThread "elasticsearch[Oneg the
    Prober][transport_server_worker][T#255]{New I/O worker #511}" daemon
    [_thread_in_native, id=797, stack(0xfffffffe9f400000,0xfffffffe9f500000)]"
    followed by hundreds of these:
    "0x0000000103def000 JavaThread "ExecuteThread: '126' for queue:
    'weblogic.socket.Muxer'" daemon [_thread_blocked, id=272,
    stack(0xfffffffee2400000,0xfffffffee2500000)]"

  4. My WebLogic log shows the following:
    [INFO ][10-Dec 10:00:10,717][][node][Oneg the Prober] {0.19.11}[13351]:
    starting ...
    [INFO ][10-Dec 10:00:12,984][][transport][Oneg the Prober] bound_address
    {inet[/169.49.110.160:9403]}, publish_address {inet[/169.49.110.160:9403]}

An unexpected error has been detected by Java Runtime Environment:

SIGBUS (0xa) at pc=0xffffffff7e2fe008, pid=13351, tid=560

Java VM: Java HotSpot(TM) 64-Bit Server VM (10.0-b22 mixed mode solaris-sparc)

Problematic frame:
V [libjvm.so+0x6fe008]

An error report file with more information is saved as:
/app/securities/rapport/bin/hs_err_pid13351.log

If you would like to submit a bug report, please visit:
http://java.sun.com/webapps/bugreport/crash.jsp

Stack: [0xfffffffebce00000,0xfffffffebcf00000], sp=0xfffffffebcefc910, free space=1010k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0x6fe008]

Any help would be appreciated.

--

In point (1) of my original post above, I am also using "compress.lzf.decoder: safe" on the
standalone ES server.


--

Which Java do you run on the ES server, and which on the ES client side?

JVM 10.0-b22 is Java 1.6.0_06 (April 2008), more than four years old. Such
old Java versions have many bugs that will stop you from running ES
successfully.

Please update to the latest Java 7. Note that Oracle has scheduled Java 6
for end of life in February 2013.

Be aware that you cannot mix Java 6 and Java 7 in ES client/server
configurations due to JVM object serialization issues.

If you are bound to an obsolete Java version, you could try to tweak the
JVM parameters so they do not trigger subtle bugs, but YMMV. It will be
hard and frustrating.
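
As a quick sanity check, a small throwaway snippet like the one below (the
class name is just an example) can be run once inside the WebLogic app and
once on the standalone server to confirm which runtime each side actually uses:

// Minimal sketch: print the details of the JVM this code is running in.
public class WhichJvm {
    public static void main(String[] args) {
        System.out.println("java.version    = " + System.getProperty("java.version"));
        System.out.println("java.vm.version = " + System.getProperty("java.vm.version"));
        System.out.println("java.vm.name    = " + System.getProperty("java.vm.name"));
        System.out.println("os.name/os.arch = " + System.getProperty("os.name")
                + "/" + System.getProperty("os.arch"));
    }
}

Inside WebLogic you would of course log these properties from the deployed
code instead of a main method.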

There are many transport client threads because the thread pools have been
enlarged in ES since 0.17. Hundreds of threads are far too many, of course.
I assume you either have many transport client instances open, or the
transport client has difficulty connecting and keeps starting threads while
retrying. But that is a secondary error; the primary error is that you can't
connect at all.

Best regards,

Jörg

--

Thanks Jörg,

I will try to get the JVM upgraded. Is it possible to limit the number of
threads the client node will create while trying to connect to the server? I
am assuming the thread pool settings
(http://www.elasticsearch.org/guide/reference/modules/threadpool.html) apply
only to server nodes and not to client nodes. Please tell me if I am wrong.


--

Is there a way to limit the number of threads used by a client node? I am
assuming the thread pool settings
(http://www.elasticsearch.org/guide/reference/modules/threadpool.html) apply
only to server nodes and not to client nodes.
Another observation is that the client node connects to the server and then
immediately fails. The ES server node logs show:

[Threnody] added {[The Stepford Cuckoos][-VIITrjqTPKRlNF9hz218Q][inet[/169.49.110.160:9402]]{client=true,
data=false},}, reason: zen-disco-receive(join from node[[The Stepford
Cuckoos][-VIITrjqTPKRlNF9hz218Q][inet[/169.49.110.160:9402]]{client=true,
data=false}])
[2012-12-12 07:12:13,727][INFO ][cluster.service ] [Threnody]
removed {[The Stepford Cuckoos][-VIITrjqTPKRlNF9hz218Q][inet[/169.49.110.160:9402]]{client=true,
data=false},}, reason: zen-disco-node_failed([The Stepford
Cuckoos][-VIITrjqTPKRlNF9hz218Q][inet[/169.49.110.160:9402]]{client=true,
data=false}), reason transport disconnected (with verified connect)

--

Hi,

Server and client nodes share the threadpool settings, for simplicity of
code design. A TransportClient does not use all thread pool types, only a
few.

You can limit the thread pools for a TransportClient that is only indexing
data like this. Assume you want ten Netty connections and ten threads for
index or bulk:

Settings settings = ImmutableSettings.settingsBuilder()
    .put("cluster.name", "mycluster")
    .put("client.transport.sniff", true)
    .put("transport.netty.connections_per_node.low", 0)
    .put("transport.netty.connections_per_node.med", 0)
    .put("transport.netty.connections_per_node.high", 10)
    .put("threadpool.search.type", "fixed")
    .put("threadpool.search.size", "1")
    .put("threadpool.get.type", "fixed")
    .put("threadpool.get.size", "1")
    .put("threadpool.index.type", "fixed")
    .put("threadpool.index.size", "10")
    .put("threadpool.bulk.type", "fixed")
    .put("threadpool.bulk.size", "10")
    .put("threadpool.refresh.type", "fixed")
    .put("threadpool.refresh.size", "1")
    .put("threadpool.percolate.type", "fixed")
    .put("threadpool.percolate.size", "1")
    .build();
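
For completeness, here is a sketch of how these settings could then be passed
to the TransportClient; "es-server-host" and port 9300 are placeholders for
your actual server address:

import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.transport.InetSocketTransportAddress;

// Build the client with the settings from above and point it at the server.
TransportClient client = new TransportClient(settings);
client.addTransportAddress(new InetSocketTransportAddress("es-server-host", 9300));

// ... index or bulk as usual, then close the client on shutdown
// so that the Netty worker threads are released.
client.close();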

Immediate disconnects sometimes have obscure reasons. Please enable the
DEBUG level in the logging; this can reveal more information.

For example, it may happen because of different JVM versions between client
and server.

Jörg

--

Jörg,

The final solution to my problem was:

  1. Run both the client and server nodes on the same JVM version.
  2. Add the "compress.lzf.decoder: safe" setting to both client and server.

Without both of the above, the client continued to crash.

Without these two settings, the Solaris client node uses up and blocks all
available threads. Surely this must be a bug that needs to be addressed. I
will add the client thread settings you mentioned as a safety net.
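
For anyone else who hits this, here is a rough sketch of the client-node
setup, with the cluster name and thread pool values as illustrative
placeholders rather than my exact configuration:

import org.elasticsearch.client.Client;
import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.node.Node;
import org.elasticsearch.node.NodeBuilder;

// Client-node settings: the safe LZF decoder (matching the server) plus
// thread pool limits along the lines Jörg suggested.
Settings settings = ImmutableSettings.settingsBuilder()
    .put("cluster.name", "mycluster")        // placeholder cluster name
    .put("compress.lzf.decoder", "safe")     // must also be set on the server
    .put("threadpool.index.type", "fixed")
    .put("threadpool.index.size", "10")
    .build();

// Start a client-only node inside the web app and get a Client from it.
Node node = NodeBuilder.nodeBuilder()
    .client(true)
    .data(false)
    .settings(settings)
    .node();
Client client = node.client();
// ... use the client; call node.close() when the web app is undeployed.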

Thank you! Your suggestions were really useful!


--