Transport client settings

(Avaneesh) #1

I am using Java to connect to my ES (v2.4.0) cluster using a Transport client.
I am seeing very bad query throughput performance. I have two questions:

  1. Does transport client also sort the query results, or is it done by one of the data nodes as in the case of node client?
  2. I see a lot of settings on the internet, such as "network.server", "client.transport.sniff", etc., used while setting up a client. Is there any documentation where all such settings are listed and explained?

Thank you.

(David Pilato) #2

The query is fully executed on the coordinating node.

You should see the same behavior when using the REST layer.

(Avaneesh) #3

Thanks for the quick reply David.
So am I to assume that transport client just distributes the role of coordinating node between the data nodes in a round robin fashion?

(David Pilato) #4

It does that between nodes you defined when you build the transport client.
If you use sniff, then more nodes can be added to this list.
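
To make the rotation described above concrete, here is a toy model in plain Java. It is not the Elasticsearch TransportClient code, and the class name, method names, and host names are all made up for illustration. It only shows the idea: requests rotate over the configured nodes, and nodes discovered later (as sniffing would do) join the same rotation.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model only: NOT the Elasticsearch TransportClient implementation.
final class RoundRobinNodes {
    private final CopyOnWriteArrayList<String> nodes = new CopyOnWriteArrayList<>();
    private final AtomicInteger counter = new AtomicInteger();

    // The nodes listed when the client is built.
    RoundRobinNodes(List<String> configuredNodes) {
        if (configuredNodes.isEmpty()) {
            throw new IllegalArgumentException("at least one node is required");
        }
        nodes.addAll(configuredNodes);
    }

    // Models sniffing: a node learned from the cluster joins the rotation.
    void addDiscovered(String node) {
        nodes.addIfAbsent(node);
    }

    // Picks the node that will act as coordinating node for the next request.
    String next() {
        int index = Math.floorMod(counter.getAndIncrement(), nodes.size());
        return nodes.get(index);
    }
}
```

Each call to `next()` stands for one search request; whichever node it returns does the coordinating work (scatter, gather, sort) for that request.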

(Avaneesh) #5

Thanks for the reply David. That was really helpful.
I have a major situation over here: an index with 5 million docs (web page content), spread over 3 shards (3 AWS instances as data nodes), performs very poorly when I increase the number of results in the search query from 10 to 20.
I use a transport client (in a Java webapp) with just the default settings, and specify all 3 data nodes while doing so. When benchmarking with JMeter, I notice that the load averages become very high (around 60 on a four-core machine!!) on a fourth node, which runs the Java webapp.

Since the transport client doesn't become the coordinating node for any query, can I safely assume that the increase in load average is not due to the gather phase of a query?

(David Pilato) #6

I agree. That said, you seem to index web pages which means a lot of text content.
Remember that all this data has to be sent from each shard to the coordinating node then to your application which has to load that in memory.
Try to ask for one single field instead of _source by default and see if it changes anything.
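
As a sketch of what "one single field instead of _source" could look like, the plain-Java helper below builds a search request body that uses source filtering to return only a `title` field. The field names `title` and `content`, the class name, and the method names are assumptions for illustration; the thread does not show the real mapping or query. The body is meant for the REST `_search` endpoint; with the Java client the same restriction would be set on the search request builder instead.

```java
// Sketch only: builds a JSON search body that returns just the "title" field.
// "title" and the queried field name are hypothetical; adapt them to your mapping.
final class SearchBodySketch {

    // Minimal JSON string escaping for quotes, backslashes and control characters.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == '"') {
                sb.append("\\\"");
            } else if (c == '\\') {
                sb.append("\\\\");
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // A match query on queryField, limited to `size` hits, with _source
    // filtered down to the title so large page bodies are not shipped back.
    static String titleOnlySearch(String queryField, String text, int size) {
        return "{\"size\":" + size
                + ",\"_source\":[\"title\"]"
                + ",\"query\":{\"match\":{\"" + escape(queryField) + "\":\""
                + escape(text) + "\"}}}";
    }
}
```

Comparing throughput with and without the `_source` restriction, at the same `size`, isolates how much of the cost comes from moving document bodies around.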

Your web app probably also needs some memory.

(Avaneesh) #7

Thanks David.
I'll come back soon with the results of querying a separate field instead of _source!

One other observation: my fourth node (which runs the webapp and has huge load averages) sees network throughput of 500 Mbps at 10 results. If I increase the result count to, say, 50, this value still stays at 500 Mbps, even though the machine supports 1 Gbps! This behavior is observed with both the node client and the transport client in the webapp.
Could this be due to a lack of fine-tuning of transport client settings, such as the Netty worker thread count?

(Jörg Prante) #8

Are you sure? Have you set up the machine and the cabling? You mentioned AWS. Do you control the hardware and devices of AWS? Even if a NIC is rated at 1 Gbps, that does not mean the machine can reach this. Besides the AWS setup, there are many limiting factors, such as CPU (enough cycles must be available for handling traffic), router, cabling (must be CAT6), etc.

(Avaneesh) #9

Hi Jorg! Thanks for the reply.
When I hit the data nodes directly (the REST API) and run iftop at the same time, I see much better throughput compared to the webapp setup, with the network choking at 1 Gbps. That's how I know that to be the limit.
This is also a matter of concern for me: I've set the baseline performance to be when I hit one of the data nodes directly. But my webapp almost consistently underperforms this baseline.
Any help or hints appreciated.

(Avaneesh) #10

Hi David.
I tried including just the webpage title in the query, and yeah it makes a huge difference in throughput! Indeed the CPU cycles were being used to handle the large network IO.
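
A back-of-the-envelope calculation shows why result size matters so much here. The sketch below is plain Java arithmetic; the 50 KB average hit size is an assumed figure for illustration, since the thread does not state the real document size, and the class and method names are made up.

```java
// Back-of-the-envelope payload arithmetic; all input figures are assumptions.
final class PayloadEstimate {

    // Bandwidth in megabits per second needed to ship the given responses.
    static double megabitsPerSecond(long bytesPerHit, int hitsPerResponse,
                                    double responsesPerSecond) {
        double bytesPerSecond = (double) bytesPerHit * hitsPerResponse * responsesPerSecond;
        return bytesPerSecond * 8 / 1_000_000.0;
    }

    // Highest response rate a link of linkMbps can carry for that response size.
    static double maxResponsesPerSecond(double linkMbps, long bytesPerHit,
                                        int hitsPerResponse) {
        double bytesPerResponse = (double) bytesPerHit * hitsPerResponse;
        return linkMbps * 1_000_000.0 / 8 / bytesPerResponse;
    }
}
```

With the assumed 50 KB per hit, `megabitsPerSecond(50_000, 10, 125)` comes to 500 Mbps, and `maxResponsesPerSecond(500, 50_000, 20)` comes to 62.5: at a fixed 500 Mbps, doubling the hits per response halves the number of responses per second. Returning only a short title shrinks the per-hit size by orders of magnitude, which fits the improvement reported above.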

Thanks a lot for the help.

(Jörg Prante) #11

Help is not possible without being able to compare your two setups:

  • what do you mean "when I hit one of the data nodes directly" - is it curl? HTTP client? Or node client?

  • what are the queries being executed on both scenarios?

  • what do you compare? Query structure? Query latency (execution time)? Payload size and transmission speed of response?

  • the default settings for the transport client give a nearly equivalent performance pattern to the node client, so it may be worth rechecking the node connections to ensure they work properly

With iftop you see the whole traffic on the NIC, which may or may not be related to queries. For example, shard recovery generates large network traffic but does not contribute to query performance.
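
One way to get comparable numbers for the two setups, rather than reading them off iftop, is to measure latency and payload size per request inside the client. The harness below is a minimal plain-Java sketch; the class and method names are hypothetical, and the `Supplier<byte[]>` stands in for whatever issues one search (HTTP call or transport client call) and returns the raw response bytes.

```java
import java.util.function.Supplier;

// Minimal measurement harness; names are hypothetical, not from any library.
final class CompareSetups {

    static final class Stats {
        final int requests;
        final long totalBytes;
        final long totalNanos;

        Stats(int requests, long totalBytes, long totalNanos) {
            this.requests = requests;
            this.totalBytes = totalBytes;
            this.totalNanos = totalNanos;
        }

        double avgLatencyMillis() {
            return totalNanos / 1_000_000.0 / requests;
        }

        double avgPayloadBytes() {
            return (double) totalBytes / requests;
        }
    }

    // Runs the same request `iterations` times, sequentially, and records
    // the total response bytes and the total elapsed wall-clock time.
    static Stats measure(Supplier<byte[]> request, int iterations) {
        if (iterations <= 0) {
            throw new IllegalArgumentException("iterations must be positive");
        }
        long bytes = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            bytes += request.get().length;
        }
        long elapsed = System.nanoTime() - start;
        return new Stats(iterations, bytes, elapsed);
    }
}
```

Running `measure` once with a supplier that hits a data node over REST and once with a supplier that goes through the webapp's client, using the same query and result size, gives per-request latency and payload figures that only reflect query traffic.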
