TransportClient failures with 0.90.3 cluster, but NodeClient works without failures


(Brian Yoder) #1

I reported this a while ago, but have new information that may help. I have
a cluster of 3 Linux VMs with ES 0.90.3 running and an older Oracle Java 6.
My driver creates a TransportClient. Then a writer thread pool creates a
series of unique objects, serializes each to JSON, sends an update to the
cluster, then queues it. A separate reader thread pool reads each unique
object and attempts to query by ID to verify that it's in the database.
This test driver is run on a laptop; the 3-node cluster is remote (in the
lab).
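
For reference, the client setup looks roughly like this (a minimal sketch
against the 0.90.x Java API; the cluster name is a placeholder, the host
names are taken from the log below):

```java
// Sketch of the 0.90.x TransportClient setup described above.
// The cluster name is an illustrative placeholder.
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;

public class TransportClientSetup {
    public static TransportClient connect() {
        Settings settings = ImmutableSettings.settingsBuilder()
                .put("cluster.name", "my-cluster")  // placeholder
                .build();
        TransportClient client = new TransportClient(settings);
        // Adding all three nodes triggers the failure described below;
        // adding only two of them works.
        client.addTransportAddress(new InetSocketTransportAddress("projdev12", 9300));
        client.addTransportAddress(new InetSocketTransportAddress("projdev29", 9300));
        client.addTransportAddress(new InetSocketTransportAddress("projdev33", 9300));
        return client;
    }
}
```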

Consistently, I get failures if I add all 3 of the host names to the
TransportClient. It works OK if I add only two of them. Here is the
relevant area of the log, along with a message that the driver pulls out
and writes to stdout:

434 [main] DEBUG org.elasticsearch.client.transport - [Martinex]
node_sampler_interval[5s]
453 [elasticsearch[Martinex][transport_client_worker][T#1]{New I/O worker
#1}] DEBUG netty.channel.socket.nio.SelectorUtil - Using select timeout of
500
453 [elasticsearch[Martinex][transport_client_worker][T#1]{New I/O worker
#1}] DEBUG netty.channel.socket.nio.SelectorUtil - Epoll-bug workaround
enabled = false
476 [main] DEBUG org.elasticsearch.client.transport - [Martinex] adding
address [[#transport#-1][inet[projdev12/192.168.200.222:9300]]]
587 [main] DEBUG org.elasticsearch.transport.netty - [Martinex] connected
to node [[#transport#-1][inet[projdev12/192.168.200.222:9300]]]
701 [main] DEBUG org.elasticsearch.client.transport - [Martinex] adding
address [[#transport#-2][inet[projdev29/192.168.200.241:9300]]]
871 [main] DEBUG org.elasticsearch.transport.netty - [Martinex] connected
to node [[#transport#-2][inet[projdev29/192.168.200.241:9300]]]
955 [main] DEBUG org.elasticsearch.client.transport - [Martinex] adding
address [[#transport#-3][inet[projdev33/192.168.200.133:9300]]]
1209 [main] DEBUG org.elasticsearch.transport.netty - [Martinex] connected
to node [[#transport#-3][inet[projdev33/192.168.200.133:9300]]]
Cluster available...
1396 [main] DEBUG com.leansoft.bigqueue.page.MappedPageFactoryImpl -
Mapped page for /tmp/rtcomm/rtc-driver/meta_data/page-0.dat was just
created and cached.
1437 [main] DEBUG com.leansoft.bigqueue.page.MappedPageFactoryImpl -
Mapped page for /tmp/rtcomm/rtc-driver/index/page-0.dat was just created
and cached.
1452 [main] DEBUG com.leansoft.bigqueue.page.MappedPageFactoryImpl -
Mapped page for /tmp/rtcomm/rtc-driver/front_index/page-0.dat was just
created and cached.
1455 [main] DEBUG com.leansoft.bigqueue.page.MappedPageImpl - Mapped page
for /tmp/rtcomm/rtc-driver/index/page-0.dat was just unmapped and closed.
1456 [main] INFO com.leansoft.bigqueue.page.MappedPageFactoryImpl - Page
file /tmp/rtcomm/rtc-driver/index/page-0.dat was just deleted.
1456 [main] DEBUG com.leansoft.bigqueue.page.MappedPageFactoryImpl - All
page files in dir /tmp/rtcomm/rtc-driver/index/ have been deleted.
1456 [main] INFO com.leansoft.bigqueue.page.MappedPageFactoryImpl - Page
file /tmp/rtcomm/rtc-driver/data/page-0.dat was just deleted.
1456 [main] DEBUG com.leansoft.bigqueue.page.MappedPageFactoryImpl - All
page files in dir /tmp/rtcomm/rtc-driver/data/ have been deleted.
1457 [main] DEBUG com.leansoft.bigqueue.page.MappedPageImpl - Mapped page
for /tmp/rtcomm/rtc-driver/meta_data/page-0.dat was just unmapped and
closed.
1457 [main] INFO com.leansoft.bigqueue.page.MappedPageFactoryImpl - Page
file /tmp/rtcomm/rtc-driver/meta_data/page-0.dat was just deleted.
1457 [main] DEBUG com.leansoft.bigqueue.page.MappedPageFactoryImpl - All
page files in dir /tmp/rtcomm/rtc-driver/meta_data/ have been deleted.
1458 [main] DEBUG com.leansoft.bigqueue.page.MappedPageFactoryImpl -
Mapped page for /tmp/rtcomm/rtc-driver/meta_data/page-0.dat was just
created and cached.
1458 [main] DEBUG com.leansoft.bigqueue.page.MappedPageFactoryImpl - Hit
mapped page /tmp/rtcomm/rtc-driver/front_index/page-0.dat in cache.
STARTING: run=1s threads=1 connection-limit=1 refresh=eventual
1606 [Thread-3] DEBUG com.acme.proj.database.UpdateAction -
UpdateAction.index:
{"index":{"_index":"rtctest","_type":"connection","_id":"30303230303030303032407777772E63656C65626F726E2E636F6D"}}
::
{"onet":"celeborn","orig":"0010000001@celeborn.com","term":"0020000002@www.celeborn.com"}
FAILURE[1] when writing connnection[1]:
{"index":{"_index":"rtctest","_type":"connection","_id":"30303230303030303032407777772E63656C65626F726E2E636F6D"}}
->
{"onet":"celeborn","orig":"0010000001@celeborn.com","term":"0020000002@www.celeborn.com"}
:: class com.acme.proj.database.DatabaseException ElasticSearch index
request:
{"index":{"_index":"rtctest","_type":"connection","_id":"30303230303030303032407777772E63656C65626F726E2E636F6D"}}
::
{"onet":"celeborn","orig":"0010000001@celeborn.com","term":"0020000002@www.celeborn.com"}
: org.elasticsearch.transport.TransportSerializationException: Failed to
deserialize exception response from stream

This was originally chalked up to using an older Java version. But
recently, I updated my driver to optionally create a client-only NodeClient
instead of a TransportClient. Using Zen unicast, I add all 3 host names to
the unicast host list, create a client-only data-less NodeClient, and the
rest of the code is the same from there on. Start-up takes a little longer,
as the NodeClient has to join the cluster and whatever else it does. But
updates are twice as fast, and... there are NO ERRORS!
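
For comparison, the client-only NodeClient setup looks roughly like this (a
sketch against the 0.90.x API; the cluster name is a placeholder):

```java
// Sketch of a data-less, client-only NodeClient with Zen unicast (0.90.x).
import org.elasticsearch.client.Client;
import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.node.Node;
import static org.elasticsearch.node.NodeBuilder.nodeBuilder;

public class NodeClientSetup {
    public static Client connect() {
        Settings settings = ImmutableSettings.settingsBuilder()
                .put("cluster.name", "my-cluster")                  // placeholder
                .put("discovery.zen.ping.multicast.enabled", false)
                .put("discovery.zen.ping.unicast.hosts",
                     "projdev12,projdev29,projdev33")
                .build();
        Node node = nodeBuilder().settings(settings)
                .client(true)   // no shards, no data stored locally
                .node();        // joins the cluster here (the slow part)
        return node.client();
    }
}
```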

I am just guessing, but it appears from the outside looking in as if the
NodeClient is the primary way that ES is tested, and perhaps the
TransportClient isn't so extensively tested? I'm not sure what else I can
offer, but the fact that the NodeClient doesn't fail seems to point to the
TransportClient and not the underlying Lucene/Java interaction; the fact
that the NodeClient performs better is icing on the cake.

Regards,
Brian

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/331ec0b9-ddba-4a6c-80dd-1dbbddd8655c%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Jörg Prante) #2

The reason why the TransportClient is stumbling over the
TransportSerializationException is a bug fix by Oracle. Internal JVM
network addresses (the IPv4 addresses) are encoded by the JVM when messages
go over the wire, and the TransportClient must carry such a network
address as an internal field. Oracle discovered a security flaw and decided
to change the internal encoding of network addresses in 7u21, so any Java
application will fail if messages containing network addresses (the
InetAddress class) are transmitted between JVMs on opposite sides of the
7u21 version boundary. It is not a TransportClient-only issue.

Here is the commit that is responsible for this:
http://hg.openjdk.java.net/jdk7u/jdk7u-dev/jdk/rev/7ca8a40795d8

RHEL bug report: https://bugzilla.redhat.com/show_bug.cgi?id=952657

Similar changes made it impossible to transmit Java objects between a Java
6 JVM and a Java 7 JVM.

The NodeClient does not use the transport protocol between JVMs and does
not have to handle InetAddress at all, so it is not affected by the
TransportSerializationException issue.

The TransportClient is extensively tested (at least by myself; I use it
exclusively). It is one of the cases where ES cannot fix it, because it's
the JVM change that pulls the rug out from under ES.

Jörg



(Brian Yoder) #3

Jörg,

Thanks so much for the insightful response.

*The reason why the TransportClient is stumbling over the
TransportSerializationException is a bug fix by Oracle. Internal JVM
network addresses (the IPv4 addresses) are encoded by the JVM when messages
go over the wire, and the TransportClient must carry such a network
address as an internal field. Oracle discovered a security flaw and decided
to change the internal encoding of network addresses in 7u21, so any Java
application will fail if messages containing network addresses (the
InetAddress class) are transmitted between JVMs on opposite sides of the
7u21 version boundary. It is not a TransportClient-only issue.*

*Here is the commit that is responsible for this:
http://hg.openjdk.java.net/jdk7u/jdk7u-dev/jdk/rev/7ca8a40795d8*

*RHEL bug report: https://bugzilla.redhat.com/show_bug.cgi?id=952657*

*Similar changes made it impossible to transmit Java objects between a
Java 6 JVM and a Java 7 JVM.*

I checked again, and all 3 nodes in the cluster are running the exact same
Java version. Yes, it's an older version. But for now, it's what the
support folks are giving me:

$ java -version
java version "1.6.0_18"
Java(TM) SE Runtime Environment (build 1.6.0_18-b07)
Java HotSpot(TM) Client VM (build 16.0-b13, mixed mode, sharing)

And this cluster is actually running on 3 Solaris x86 VMs (not Linux):

$ uname -a
SunOS eciddev29 5.10 Generic_141445-09 i86pc i386 i86pc

So given the older Java version, should I assume that it's not the Java 7
change that Oracle introduced, but some other Java bug with the same
symptom? (This just helps me collect the data I need to make a case to
update Java!)

The NodeClient does not use the transport protocol between JVMs and does
not have to handle InetAddress at all, so it is not affected by the
TransportSerializationException issue.

Cool! That's why it works! But I've also found that updates are noticeably
quicker through the NodeClient. So using the NodeClient is a solid win, not
a hacky workaround.

*The TransportClient is extensively tested (at least by myself; I use it
exclusively). It is one of the cases where ES cannot fix it, because it's
the JVM change that pulls the rug out from under ES.*

Yes, I apologize. Now that I know more of the details, it all makes sense.
And I really didn't believe that any part of ES wasn't well tested!

Regards,
Brian



(Jörg Prante) #4

Besides the cluster node JVMs, you also have to take care of the client
JVM. Are you also accessing the cluster with Solaris x86 and Java 6u18?

Can you give more information about "noticeably quicker"? What do you test
and how much load? Searching or indexing?

The TransportClient does not have a copy of the cluster state and must
fetch it from remote. This may cause delays in mapping related operations.
Also if you have a slow network, it will add some latency. But that's it.

In my bulk indexing tests, TransportClient with specific net address
configuration is fast and more flexible, because I can start an extra JVM
on a non-ES machine, connect to multiple clusters, and the expensive task
of JSON doc construction is decoupled from the workload in ES data node
JVMs.

Jörg



(Brian Yoder) #5

Jörg,

*Besides the cluster node JVMs, you also have to take care of the client
JVM. Are you also accessing the cluster with Solaris x86 and Java 6u18?*

Oooh. Don't know why this didn't occur to me. Short answer: No.

Java on the MacBook (where the client / driver runs):

$ java -version
java version "1.6.0_65"
Java(TM) SE Runtime Environment (build 1.6.0_65-b14-462-10M4609)
Java HotSpot(TM) 64-Bit Server VM (build 20.65-b04-462, mixed mode)

Java on all 3 virtual Solaris hosts of the remote 3-node ES cluster:

$ java -version
java version "1.6.0_18"
Java(TM) SE Runtime Environment (build 1.6.0_18-b07)
Java HotSpot(TM) Client VM (build 16.0-b13, mixed mode, sharing)

Can you give more information about "noticeably quicker"? What do you
test and how much load? Searching or indexing?

The ES index and type are mapped to enable TTL with a default 10s TTL
value for all documents, and the default 60s TTL interval for the index.
I've disabled the indexing for all fields; documents are queried only by
their _id. Ad-hoc queries weren't necessary for this particular test case.
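
A mapping along those lines might look like this (a sketch, not the exact
mapping from the test; the field names are taken from the driver output
earlier in the thread, and `"index": "no"` disables indexing so documents
are reachable only by their _id):

```json
{
  "connection": {
    "_ttl": { "enabled": true, "default": "10s" },
    "properties": {
      "onet": { "type": "string", "index": "no" },
      "orig": { "type": "string", "index": "no" },
      "term": { "type": "string", "index": "no" }
    }
  }
}
```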

The remote far-away 3-node ES cluster is running on 3 Solaris x86-64 VMs
using Zen unicast discovery. There is also a single-node ES cluster running
on the MacBook.

The driver contains a writer thread pool, a reader thread pool, and
BigQueue in the middle. All of the runs below were configured with 8 writer
threads and 8 reader threads. For all tests, the driver was run on the
MacBook.

The writer threads obtain a unique object, serialize it to JSON, add it to
ES, and then add it to the queue. The reader threads read from the queue,
deserialize into the object, and query by index+type+id to verify it's
there.

The timing values are shown by the driver using the super-cool TimeValue
class. Nice touch!

1. Local single-node ES cluster and the driver, all running on the
MacBook. The TransportClient is a little bit faster than the NodeClient:

1a. Using the TransportClient:

generated-connections=1629365 elapsed=5m conn/sec=5430
[db-update: total=1629358 time=32.5m time/update=1.1ms]
[db-query: total=1629357 time=16.8m time/query=620.1micros]
[queue: current=0 max=373]

1b. Using the NodeClient:

generated-connections=1551379 elapsed=5m conn/sec=5171
[db-update: total=1551371 time=32.8m time/update=1.2ms]
[db-query: total=1551371 time=16.4m time/query=637.7micros]
[queue: current=0 max=383]

2. Driver running locally on the MacBook connected to the far-away 3-node
ES cluster running on Solaris x86-64 VMs. In this case, the NodeClient was
seen to be faster, particularly in the area of updates.

2a. The driver uses a TransportClient but only 2 of the 3 nodes are added
to its list of inet addresses:

generated-connections=13427 elapsed=5m conn/sec=44
[db-update: total=13419 time=39.9m time/update=178.4ms]
[db-query: total=13419 time=18.7m time/query=83.6ms]
[queue: current=0 max=7]

2b. The driver uses a client-only NodeClient with Zen unicast discovery and
all 3 nodes configured for it:

generated-connections=27592 elapsed=5m conn/sec=91
[db-update: total=27584 time=39.8m time/update=86.7ms]
[db-query: total=27584 time=38.7m time/query=84.2ms]
[queue: current=0 max=26]

Regards,
Brian



(Jörg Prante) #6

OK, I didn't know you use Java 6. The InetAddress change was in 6u45, so
you are affected when your 6u65 client talks to your 6u18 nodes.

Thanks for the numbers.

I'm not sure how to understand "generated-connections" and "conn/sec" -
do you close connections? You don't have to. It seems you are testing
discovery speed and not indexing?

You don't need to query index/type/id to verify "it's there" - this is
costly and kills performance because you force switching to IndexReaders
all the time. Just look into your BulkResponse object. If there is no
error, the doc was persisted just before the BulkResponse was sent back to
the client.
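
In code, that check is just a matter of inspecting the response (a sketch
against the 0.90.x API):

```java
// Sketch: trust the BulkResponse instead of re-querying each doc (0.90.x API).
import org.elasticsearch.action.bulk.BulkItemResponse;
import org.elasticsearch.action.bulk.BulkResponse;

public class BulkCheck {
    static void verify(BulkResponse response) {
        if (!response.hasFailures()) {
            return; // every doc in the batch was persisted
        }
        for (BulkItemResponse item : response.getItems()) {
            if (item.isFailed()) {
                System.err.println("id=" + item.getId()
                        + " failed: " + item.getFailureMessage());
            }
        }
    }
}
```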

The local TransportClient should be a bit slower than the NodeClient, so
that result is weird.

Also, TTL is a performance killer, I would like to suggest disabling this,
if possible.

Jörg



(Brian Yoder) #7

Jörg,

Thanks again for your insights and patience!

*OK, didn't know you use Java 6. The InetAddress change was in 6u45, so you

are affected when you use 6u65 with 6u18.*

That's very interesting. It threw me, since I have always been able to query
and update remote single-node clusters with the same version mismatch (Java
6, but different update numbers). And when connecting to the 3-node cluster,
only one of the nodes caused a problem: if I omitted that node and just
pointed the TransportClient at the other two, all was well.

I'm not sure how to understand "generated-connections" and "conn/sec" -
do you close connections? You don't have to. It seems you are testing
discovery speed and not indexing?

Ah, terminology overload. My apologies. These aren't ES client connections.
But it's only fair that if I question the TransportClient's testing, you
may question my design skills. :-)

But no worries! I learned long ago that there is one and only one Client in
an application, it exists for the life of the application, and it is closed
only at the end of the application (shutdown hook, for servers).

*You don't need to query index/type/id to verify "it's there" - this is
costly and kills performance because you force switching to IndexReaders
all the time. *

In this driver, a "connection" represents an email connection: from -> to.
The writer threads act as the outbound email gateway, and the reader threads
act as a remote inbound email gateway. The queue in the middle acts as an
email proxy that is passing the "connections" from the outbound gateway
across to the remote inbound gateway. ES is keeping the information around
long enough for the inbound gateway to query to see if the connection is
one that originated from the proxy.

Just look into your BulkResponse object. If there is no error, the doc
was persisted just before the BulkResonse was sent back to the client.

These aren't bulk loads: they are indexed as they happen, and are added
to ES via an IndexRequest.

Now, I just realized that if I aggregated inbound connections via an LMAX
Disruptor RingBuffer, then the consumers could detect a batch and issue a
BulkRequest instead of individual IndexRequest. But I'm getting ahead of
myself here...

But in the intended application, the processing done by the driver's writer
threads and the processing done by the driver's reader threads is
completely separate, connected only by a persistent store (which is likely
to be ES based on my testing).

The local TransportClient should be a bit slower than NodeClient, weird.

Also, TTL is a performance killer, I would like to suggest disabling
this, if possible.

The information about a "connection" from->to isn't needed very long, so a
TTL of 10s was chosen to see for myself how TTL performs. If it doesn't
meet our performance goals (the current 3-node cluster is a very
underpowered set of 3 VMs, so its raw numbers are as useful as its relative
numbers in the face of changes), then I will probably need to implement
Plan B: an index is updated for 10s, then a second index is updated for the
next 10s, then a third, and so on. After 2 x 10s, the oldest indices are
deleted. Aliases are used so the queries always look at the (up to) two most
recent indices. I believe I first saw this on the newsgroup somewhere. I
certainly don't mind writing the code, but only if it's absolutely
necessary and TTL is really unable to do the job.
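
The index-rotation bookkeeping in Plan B is mostly arithmetic. A minimal
sketch (the index names and the 10s bucket size are illustrative, and the
actual alias/delete calls against ES are omitted):

```java
import java.util.concurrent.TimeUnit;

// Sketch of "Plan B": write to a 10-second bucketed index; query through
// an alias covering the two most recent buckets; delete anything older.
public class RollingIndex {
    static final long BUCKET_MS = TimeUnit.SECONDS.toMillis(10);

    /** Index that receives writes at time t (ms since epoch). */
    static String writeIndex(long t) {
        return "rtc-" + (t / BUCKET_MS);
    }

    /** The (up to) two indices the query alias should point at. */
    static String[] readIndices(long t) {
        long bucket = t / BUCKET_MS;
        return new String[] { "rtc-" + bucket, "rtc-" + (bucket - 1) };
    }

    /** True if an index named "rtc-&lt;bucket&gt;" is old enough to delete. */
    static boolean expired(String index, long t) {
        long bucket = Long.parseLong(index.substring("rtc-".length()));
        return bucket < (t / BUCKET_MS) - 1;
    }
}
```

A background task would run each interval: create the next write index,
repoint the alias at the two newest buckets, and delete anything `expired`.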

Regards,
Brian



(Brian Yoder) #8

Jörg,

I hope my explanation cleared things up for you.

The local TransportClient should be a bit slower than NodeClient, weird.

I updated to version 0.90.8. Still on the same Java 6 update, but at least
it's real Oracle Java and not IBM / OpenJDK Java. Only one of my 119 Java
source files needed to change for the migration, and the change was indeed
minimal (though it took some time to discover the proper incantation for
iterating across an ImmutableOpenMap).

Using the TransportClient:

Connecting to cluster...
Connected to cluster[brian-exploration] on host[localhost] using
client[TRANSPORT]
Cluster available...
STARTING: run=5m threads=8 refresh=eventual
Shutting down...
DONE:
generated-connections=2599951 elapsed=5m conn/sec=8665
[db-update: total=2599945 time=30.9m time/update=714.4micros]
[db-query: total=2599945 time=17.4m time/query=401.7micros]
[queue: current=0 max=382]

Using the NodeClient:

Connecting to cluster...
Connected to cluster[brian-exploration] on host[localhost] using
client[NODE]
Cluster available...
STARTING: run=5m threads=8 refresh=eventual
Shutting down...
DONE:
generated-connections=2996410 elapsed=5m conn/sec=9987
[db-update: total=2996405 time=30.2m time/update=605.3micros]
[db-query: total=2996405 time=17.8m time/query=357.6micros]
[queue: current=0 max=387]

Note the jump in performance on the local laptop from 5430 "connections"
(aka unique objects) per second with 0.90.3 to 8665 per second with version
0.90.8.

And also note that the NodeClient shows up significantly faster than the
TransportClient. Perhaps the previous tests were done earlier in the day
when background processing such as Apple Mail or Time Machine was also
hitting the disk. Now that it is relatively late (EST, Florida) and my
laptop is quieter, I trust the numbers a little bit more.

And again, thank you for the help from you and others on this newsgroup! It
helped greatly to make the code migration very smooth. This performance
test hit most of the functions that I have been using, including mappings,
updates, and queries, and 0.90.8 has so far been flawless and faster.

Brian



(Jörg Prante) #9

The performance difference between NodeClient and TransportClient is
~10-15%? That is unbelievable. You are querying a local data node with the
NodeClient, aren't you?

Jörg

On Fri, Dec 20, 2013 at 12:26 AM, InquiringMind <brian.from.fl@gmail.com> wrote:

Using the TransportClient:

Connecting to cluster...
Connected to cluster[brian-exploration] on host[localhost] using
client[TRANSPORT]
Cluster available...
STARTING: run=5m threads=8 refresh=eventual
Shutting down...
DONE:
generated-connections=2599951 elapsed=5m conn/sec=8665
[db-update: total=2599945 time=30.9m time/update=714.4micros]
[db-query: total=2599945 time=17.4m time/query=401.7micros]
[queue: current=0 max=382]

Using the NodeClient:

Connecting to cluster...
Connected to cluster[brian-exploration] on host[localhost] using
client[NODE]
Cluster available...
STARTING: run=5m threads=8 refresh=eventual
Shutting down...
DONE:
generated-connections=2996410 elapsed=5m conn/sec=9987
[db-update: total=2996405 time=30.2m time/update=605.3micros]
[db-query: total=2996405 time=17.8m time/query=357.6micros]
[queue: current=0 max=387]



(Brian Yoder) #10

Jörg,

I am currently thinking this is just statistical noise. There are a lot of
other factors going on.

I re-ran my tests today. Without restarting ES, I first ran with the
NodeClient, then had breakfast, then came back and ran with the
TransportClient. Now the TransportClient is quicker by about the same
amount. But no big deal, as both runs showed ES updates and queries
occurring in about 600micros. Cool!

Then I re-ran using the NodeClient, and ES started getting OOM errors. I
guess I have been dancing close to the edge with all these threads and
everything on the same laptop!

The driver failed nearly all of its operations, and ES Head showed the
cluster status RED and its shard to be unassigned. Kinda scary.

Then a minute or so later, the ES cluster went back to GREEN, and the shard
was repaired and assigned to the index. The index contained over 1M
documents (to be expected, since OOM prevented TTL processing from deleting
them, and also prevented the status from getting back to the driver). After
some more minutes went by, the TTL processing kicked in and whittled away
at the index until it was finally empty.

I hadn't intended to test the ability of ES to get knocked down hard and
then get back up and dust itself off and (rather quickly) get back on-line
and happily running again.

ES rocks, and Version 0.90.8 is the most remarkable version yet! Thanks
all!!!

Brian

On Friday, December 20, 2013 4:40:26 AM UTC-5, Jörg Prante wrote:

The performance difference between NodeClient and TransportClient is
~10-15%? That is unbelievable. You are querying a local data node with the
NodeClient, aren't you?


