TransportClient vs. HTTP - TransportClient advantages?


(Otis Gospodnetić) #1

Hello,

What exactly is the advantage of using TransportClient to talk to ES versus
talking via HTTP?

Is there anything that cannot be done via HTTP that can be done using TC?
Is there a known and significant performance difference (even if keepalives
are used, and chunking is not used)?

Thanks,
Otis

Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html

--


(Jörg Prante) #2

Hi Otis,

the disadvantages of HTTP are well known, see Simon Spero
http://www.ibiblio.org/mdma-release/http-prob.html (18 years old) and the
recent activity in developing HTTP 2.0 (SPDY).

The REST client works over HTTP by using Netty, which means it scales well
(asynchronous I/O where possible). A REST client has to marshal/unmarshal
over HTTP, which is common to all Java applications (see
XContentRestResponse). The REST client talks to one server only.

The TransportClient can select the network interface for node discovery
(which is different from a node client). It also uses Netty, but without the
overhead of HTTP headers. ES uses binary stream marshalling/unmarshalling,
which is more compact and faster than HTTP. The TransportClient also pings
the ES cluster automatically for new nodes or nodes that have gone away, so
it can fail over automatically.
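
To make the header-overhead point concrete, here is a minimal sketch that compares the framing bytes around the same JSON body in a plain HTTP/1.1 request and in a length-prefixed binary frame. The binary layout (4-byte length, 8-byte request id, 2-byte action length) and the action name "search" are assumptions made up for illustration; this is not the actual ES transport wire format, and real ES also serializes the body itself in binary, which this sketch ignores.

```python
import json
import struct


def encode_body(body: dict) -> bytes:
    """Serialize the body as compact JSON (same payload for both framings)."""
    return json.dumps(body, separators=(",", ":")).encode("utf-8")


def http_message(host: str, path: str, body: dict) -> bytes:
    """Build a minimal HTTP/1.1 POST request carrying the JSON body."""
    payload = encode_body(body)
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/json\r\n"
        f"Content-Length: {len(payload)}\r\n"
        "\r\n"
    ).encode("ascii")
    return head + payload


def binary_message(request_id: int, action: str, body: dict) -> bytes:
    """Build a HYPOTHETICAL length-prefixed frame (not the ES wire format)."""
    payload = encode_body(body)
    name = action.encode("utf-8")
    # Big-endian: 4-byte length of what follows, 8-byte id, 2-byte name length.
    header = struct.pack(">IQH", 8 + 2 + len(name) + len(payload),
                         request_id, len(name))
    return header + name + payload


def framing_overhead(message: bytes, body: dict) -> int:
    """Bytes in the message that are not part of the JSON payload."""
    return len(message) - len(encode_body(body))
```

Calling `framing_overhead` on both messages for the same query shows how many bytes each framing adds per request. For small requests that fixed cost is a large share of the message; for bulk requests it matters much less.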

Each client can use compression (Netty can handle it), I don't know about
the difference. ES offers additional compression algorithms which may help
in certain situations.
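
As a rough way to see what compression buys on a repetitive body, the sketch below builds a bulk-style payload (action line followed by source line, newline-delimited) and reports the gzip ratio. The index name, type name, and document fields are made up for illustration, and gzip here only stands in for whatever compression the client and server actually negotiate.

```python
import gzip
import json


def bulk_body(doc_count: int) -> bytes:
    """Build a newline-delimited bulk-style body with made-up documents."""
    lines = []
    for i in range(doc_count):
        # Action line, then source line; names below are illustrative only.
        lines.append(json.dumps(
            {"index": {"_index": "logs", "_type": "event", "_id": str(i)}}))
        lines.append(json.dumps(
            {"message": "user logged in", "status": 200, "seq": i}))
    return ("\n".join(lines) + "\n").encode("utf-8")


def gzip_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower means smaller)."""
    return len(gzip.compress(data)) / len(data)
```

Running `gzip_ratio(bulk_body(1000))` gives a feel for how compressible such a body is. Real documents are usually less uniform, so the ratio on production data will differ.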

It is important to consider both bulk indexing and large query results in
concurrency scenarios.

Best regards,

Jörg

--


(Otis Gospodnetić) #3

Hi Jörg,

On Saturday, September 22, 2012 2:12:23 PM UTC-4, Jörg Prante wrote:

Hi Otis,

the disadvantages of HTTP are well known, see Simon Spero
http://www.ibiblio.org/mdma-release/http-prob.html (18 years old) and the
recent activity in developing HTTP 2.0 (SPDY).

The REST client works over HTTP by using Netty, which means it scales well
(asynchronous I/O where possible). A REST client has to marshal/unmarshal
over HTTP, which is common to all Java applications (see
XContentRestResponse). The REST client talks to one server only.

But the client could:
A) be given more than 1 ES node address
B) be smart enough to ask ES about other nodes via (a new?) REST API

Right?

The TransportClient can select the network interface for node discovery
(which is different from a node client). It also uses Netty, but without the
overhead of HTTP headers. ES uses binary stream marshalling/unmarshalling,
which is more compact and faster than HTTP. The TransportClient also pings
the ES cluster automatically for new nodes or nodes that have gone away, so
it can fail over automatically.

Right, so we have HTTP plain text JSON, maybe compressed, but with HTTP
headers vs. the ES binary protocol (which is just some Netty binary
communication protocol?)
So the difference here is the compactness of the request/response and the
work needed for unpacking/reading them?

Re pinging so the client can know about multiple nodes for failover purposes,
would what I wrote as A and B above work?
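
For what it's worth, options A and B can be sketched on the HTTP side roughly as below. This is only an illustration under stated assumptions: `FailoverClient`, `send`, and `discover` are hypothetical names, the transport is injected as a plain callable so no particular HTTP library is implied, and nothing here reflects how the TransportClient actually does it.

```python
from typing import Callable, Iterable, List


class FailoverClient:
    """Round-robin over several node addresses, skipping unreachable ones."""

    def __init__(self, nodes: Iterable[str], send: Callable[[str, str], str]):
        # Option A: the client is given more than one node address up front.
        self.nodes: List[str] = list(nodes)
        if not self.nodes:
            raise ValueError("at least one node address is required")
        self.send = send  # hypothetical transport: send(node, path) -> body
        self._next = 0

    def request(self, path: str) -> str:
        """Try each known node once, starting after the last one used."""
        last_error = None
        for _ in range(len(self.nodes)):
            node = self.nodes[self._next % len(self.nodes)]
            self._next += 1
            try:
                return self.send(node, path)
            except ConnectionError as exc:
                last_error = exc  # node is down, fall through to the next one
        raise ConnectionError("all known nodes failed") from last_error

    def refresh(self, discover: Callable[[str], List[str]]) -> None:
        """Option B: ask the first reachable node for the current node list."""
        for node in list(self.nodes):
            try:
                found = discover(node)
            except ConnectionError:
                continue
            if found:
                self.nodes = list(found)
                self._next = 0
                return
```

Calling `refresh` periodically would approximate the pinging behaviour described above, at the cost of one extra request per interval.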

Each client can use compression (Netty can handle it), I don't know about
the difference. ES offers additional compression algorithms which may help
in certain situations.

Do these additional compression algorithms (have a pointer by any chance?)
apply to TransportClient only? If not, then this would not be a
difference/advantage of TC-based communication.

It is important to consider both bulk indexing and large query results in
concurrency scenarios.

Right.

Thanks,
Otis

--


(Lukáš Vlček) #4

Hi Otis,

I think the main purpose of the HTTP interface/API is to allow easy adoption
outside of the Java world, and that's about it.

That being said, I would assume that 1/ there should be no difference
between TC and HTTP in terms of functionality and 2/ the goal is to make
HTTP-based clients as performant as the particular HTTP client
implementation used for communication.

Regards,
Lukas

--


(Otis Gospodnetić) #5

Hi Lukáš,

Re performance piece - yeah, that is my main concern/question. I guess we
may need to benchmark it... maybe add support for TC vs. HTTP to
https://github.com/sematext/ActionGenerator ... Herr Kuć will be delighted.
;)

Otis Gospodnetić (just to add another diacritic to this thread)

--


(Lukáš Vlček) #6

Hey Otis,

yeah, that would be interesting, but I think you need to pay attention to
how much the performance is affected by the HTTP library itself (so this
is more a question of how exactly the HTTP ES client is implemented). If I
recall correctly, Clinton Gormley had an interesting experience when he
implemented the Perl client: he switched to a different Perl HTTP library
and gained an interesting performance boost.

But yea, it depends on what exactly you are going to measure and how you
interpret the results from the ES client perspective.
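
A minimal timing harness along these lines might look like the sketch below. It only times an arbitrary callable, so the same harness could wrap a TC call or a call through any HTTP library; the function name and the returned field names are made up for illustration, and a serious benchmark would also need concurrency, realistic payloads, and a warmed-up cluster.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(op: Callable[[], object], runs: int = 100,
              warmup: int = 10) -> Dict[str, float]:
    """Time `op` repeatedly and summarize the per-call latency in seconds."""
    for _ in range(warmup):
        op()  # warm-up calls are executed but not measured
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        op()
        samples.append(time.perf_counter() - start)
    return {
        "runs": runs,
        "min_s": min(samples),
        "median_s": statistics.median(samples),
        "max_s": max(samples),
    }
```

Comparing the median for the same operation through two different client implementations keeps the client-library cost visible separately from the server-side cost.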

Regards,
Lukas

--

