Elasticsearch TransportClient

Hi everybody,

I'm experimenting with elasticsearch and I like it a lot so far.
I have a question concerning the packaging though: I'd like to use
elasticsearch from a webapp, and I think the Java API would serve me well
there, but it feels a bit wasteful to pull in all of elasticsearch's
classes plus all of the dependencies just to use
org.elasticsearch.client.transport.TransportClient. Would it be possible
to have a client artifact with just the minimum dependencies? Or would you
advise just creating my own client using the REST API? If I understand
correctly, that would mean I lose the autodiscovery of the nodes in the cluster.

On a related note: Is it safe to reuse instances of the TransportClient?

Thanks for the great project

Jörg

P.S.: It would also be useful to have gradle generate and install a sources
artifact when running 'gradle elasticsearch:install' - but it seems there is
a bug in gradle 0.8 preventing that? (I'm not much of a gradle expert yet)

Hi,

I am happy that you like elasticsearch so far :). Regarding the client,
there is a difference between the TransportClient and Server#client(). I
think you would want to use Server#client() if you want auto
discovery and "one hop" operations (for example, when indexing, going
directly to the node that should index the document, and not to an arbitrary
node which will redirect the request to the correct node). When using
Server#client(), make sure you set the node.data setting to "false" if you
don't want that server to participate in the allocation of shards (data).
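
To make the "one hop" point concrete, here is a toy model of the difference, not how elasticsearch actually routes requests. The class `HopDemo`, its hash-based `ownerOf` rule, and the fixed entry node are all assumptions made up for illustration. A client that knows which node owns a document contacts it directly, while a client that always contacts one fixed node pays for a forward whenever that node is not the owner.

```java
import java.util.Arrays;
import java.util.List;

// Toy model only: names and the routing rule are hypothetical, not elasticsearch internals.
final class HopDemo {
    // Assumed routing rule: the hash of the document id picks the owning node.
    static int ownerOf(String docId, int nodeCount) {
        return Math.floorMod(docId.hashCode(), nodeCount);
    }

    // One hop if the first contacted node owns the document, two if it must forward.
    static int hops(String docId, int nodeCount, int firstContact) {
        return ownerOf(docId, nodeCount) == firstContact ? 1 : 2;
    }

    // A cluster-aware client picks the owner as its first contact.
    static int hopsClusterAware(String docId, int nodeCount) {
        return hops(docId, nodeCount, ownerOf(docId, nodeCount));
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("doc-1", "doc-2", "doc-3", "doc-4");
        int nodes = 3;
        int aware = 0;
        int fixed = 0;
        for (String id : ids) {
            aware += hopsClusterAware(id, nodes);
            fixed += hops(id, nodes, 0); // always talk to node 0 first
        }
        System.out.println("cluster-aware hops: " + aware + ", fixed-entry hops: " + fixed);
    }
}
```

In this model the cluster-aware total can never exceed the fixed-entry total, which is the performance argument made above.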

Regarding the source files, yeah, it's kind of a pain to do with
gradle currently. I am currently leaning toward simply including the
sources in the jar file, though. It's simple, clean, and there is no extra
place to look for sources. The downside is a bigger jar file.

Dependencies: If you are using Server#client(), then most of the
dependencies are required. This is because that server can potentially hold
data (so the lucene jars are required, jgroups for discovery, and so on). In
theory, the TransportClient should only need the netty/joda/jackson jar
files, but I have not tested it. Is there a reason that you are
concerned about the jar files? The benefits of running in
Server#client() mode far outweigh the extra jar files, imo.

Any Client (TransportClient or Server#client()) is built to be reused from
several threads. In fact, they would get pretty upset if not used
from several threads, as you would probably not be fully utilizing
elasticsearch (elasticsearch is highly concurrent). Note also the full async
API that you get with them.
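
As a rough sketch of that usage pattern: create one client for the whole application and share it between worker threads, rather than creating one per request. `SearchClient` and `CountingClient` below are hypothetical stand-ins so the example runs without a cluster. They are not the elasticsearch API, and the real client's method names and signatures may differ.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a thread-safe client; NOT the elasticsearch API.
interface SearchClient extends AutoCloseable {
    void index(String id, String json);

    @Override
    void close();
}

// Toy implementation that only counts calls.
final class CountingClient implements SearchClient {
    final AtomicInteger indexed = new AtomicInteger();
    volatile boolean closed;

    @Override
    public void index(String id, String json) {
        if (closed) {
            throw new IllegalStateException("client already closed");
        }
        indexed.incrementAndGet();
    }

    @Override
    public void close() {
        closed = true;
    }
}

final class SharedClientDemo {
    // Runs `tasks` index calls on `threads` workers that all share one client instance.
    static int indexConcurrently(CountingClient client, int threads, int tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < tasks; i++) {
            final int n = i;
            pool.execute(() -> client.index("doc-" + n, "{\"n\":" + n + "}"));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return client.indexed.get();
    }

    public static void main(String[] args) {
        CountingClient client = new CountingClient(); // created once per application
        try {
            System.out.println(indexConcurrently(client, 4, 100) + " index calls completed");
        } finally {
            client.close(); // closed once, when the application shuts down
        }
    }
}
```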

-shay.banon

Hi Shay,

thanks for that!

2010/3/22 Shay Banon shay.banon@elasticsearch.com

Hi,

I am happy that you like elasticsearch so far :). Regarding the client,
there is a difference between the TransportClient and Server#client(). I
think you would want to use Server#client() if you want auto
discovery and "one hop" operations (for example, when indexing, going
directly to the node that should index the document, and not to an arbitrary
node which will redirect the request to the correct node). When using
Server#client(), make sure you set the node.data setting to "false" if you
don't want that server to participate in the allocation of shards (data).

Ah, ok. So from a 'performance' point of view that is better then?

Regarding the source files, yeah, it's kind of a pain to do with
gradle currently. I am currently leaning toward simply including the
sources in the jar file, though. It's simple, clean, and there is no extra
place to look for sources. The downside is a bigger jar file.

I wouldn't mind that.

Dependencies: If you are using Server#client(), then most of the
dependencies are required. This is because that server can potentially hold
data (so the lucene jars are required, jgroups for discovery, and so on). In
theory, the TransportClient should only need the netty/joda/jackson jar
files, but I have not tested it. Is there a reason that you are
concerned about the jar files? The benefits of running in
Server#client() mode far outweigh the extra jar files, imo.

Hmm, I'm not that concerned if they are all needed, but the way I
understood it, I wouldn't need many of them, and I'd like to
avoid carrying around lots and lots of jars that are never needed.
Also, my Tomcat (when running it with Eclipse WTP) started doing funny
things when I added all these dependencies (the Spring context kept restarting,
I got strange log setup errors, and Tomcat sent warnings about ThreadLocals not
being cleaned up). It doesn't do that when running outside of WTP (I guess
there must be some funny classloading business, but I need to investigate
that further).

Any Client (TransportClient or Server#client()) is built to be reused from
several threads. In fact, they would get pretty upset if not used
from several threads, as you would probably not be fully utilizing
elasticsearch (elasticsearch is highly concurrent). Note also the full async
API that you get with them.

Ok.

Thanks

Jörg

The warnings about thread locals not being cleaned up might be related to
elasticsearch: there are some static thread locals that I use in elasticsearch
that are not released (though they are weakly referenced...).

-shay.banon

By the way, aside from the static thread locals, do you call client#close()
and then (if started) server#close() when you undeploy the app? Can you post
what Tomcat logs during shutdown (assuming close is called)?
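
For reference, the close order asked about here (client first, then the server if one was started) could be sketched as below. `NamedResource` is a hypothetical stand-in that only records when it is closed. It is not an elasticsearch class. In a webapp, a helper like `closeInOrder` would typically be called from a `ServletContextListener#contextDestroyed` implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a client or server handle; records when it is closed.
final class NamedResource implements AutoCloseable {
    private final String name;
    private final List<String> log;

    NamedResource(String name, List<String> log) {
        this.name = name;
        this.log = log;
    }

    @Override
    public void close() {
        log.add(name);
    }
}

final class ShutdownDemo {
    // Closes resources in list order and keeps going after a failure,
    // so a failing client close does not prevent the server from being closed.
    static List<String> closeInOrder(List<? extends AutoCloseable> resources) {
        List<String> failures = new ArrayList<>();
        for (AutoCloseable resource : resources) {
            try {
                resource.close();
            } catch (Exception e) {
                failures.add(e.toString());
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        NamedResource client = new NamedResource("client", log);
        NamedResource server = new NamedResource("server", log);
        List<String> failures = closeInOrder(Arrays.asList(client, server));
        System.out.println("closed in order: " + log + ", failures: " + failures);
    }
}
```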

-shay.banon

Fixed the thread local leak (even the static ones :) ). They are cleaned up
when you call Server#close or TransportClient#close.

-shay.banon

Thanks for the ongoing work.
But I think my problems actually had nothing to do with bugs in elasticsearch.
I still don't quite understand where the problem was, but I fixed it
by excluding some elasticsearch logging dependencies that I already have in my
project anyway. In particular, the log4j dependency was pulling in a jmxri
dependency from a dysfunctional java.net repository, which seemed to cause
issues. As I say, I still don't quite understand what was going on, but it
works now.

Jörg

P.S.: I was calling client#close on app shutdown