Best practice for Java TransportClient

I'm rather new to Elasticsearch, and have just begun integrating it with
an existing application.

The existing process runs in the background waiting for messages from a
message queue. In some cases, those messages are processed as either
indexing or percolation requests into Elasticsearch. So far, everything is
working fine in terms of functionality. We are using the Java
TransportClient to connect to ES. The first time we get a message to go to
Elasticsearch, we create that Client, and then re-use that same Client for
all other subsequent messages. The background process essentially runs
forever (although it does get periodically restarted for various reasons),
so the Client sticks around for as long as the process.
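
The create-once, reuse-afterwards pattern described above could be sketched roughly as follows. This is only an illustration built on the standard library: LazyClientHolder and the type parameter C are hypothetical names, C stands in for whatever client type the application uses (assumed here to be AutoCloseable, possibly via a small wrapper), and the factory passed in is assumed to do the actual client construction.

```java
import java.util.function.Supplier;

/**
 * Builds a client on first use and returns the same instance afterwards.
 * Hypothetical helper; nothing in here is Elasticsearch API.
 */
final class LazyClientHolder<C extends AutoCloseable> implements AutoCloseable {
    private final Supplier<C> factory; // assumed to create the real client
    private C client;                  // guarded by "this"

    LazyClientHolder(Supplier<C> factory) {
        this.factory = factory;
    }

    /** Returns the shared client, creating it on the first call only. */
    synchronized C get() {
        if (client == null) {
            client = factory.get();
        }
        return client;
    }

    /** Closes the client if one was ever created, e.g. at process shutdown. */
    @Override
    public synchronized void close() {
        if (client == null) {
            return;
        }
        try {
            client.close();
        } catch (Exception e) {
            throw new IllegalStateException("failed to close client", e);
        } finally {
            client = null;
        }
    }
}
```

A process would keep one holder for its whole lifetime, call get() whenever a message needs ES, and call close() on shutdown. Messages that never touch ES never trigger client creation.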

A couple of questions:

We noticed that once we initialize the client, we get a HUGE number of
TRACE messages in the application's log (these 2 messages repeat every 5
secs):

2014-12-22 18:01:18 TRACE ChildMemoryCircuitBreaker.addWithoutBreaking() -
[Dr. Lemuel Dorcas] [REQUEST] Adjusted breaker by [16440] bytes, now [16440]
2014-12-22 18:01:18 TRACE ChildMemoryCircuitBreaker.addWithoutBreaking() -
[Dr. Lemuel Dorcas] [REQUEST] Adjusted breaker by [-16440] bytes, now [0]

They just repeat forever. If I restart the entire process, they go away
until a message triggers the creation of a new Client. I'm presuming that:

  1. there is some way to turn off these messages through log4j, and
  2. I can make these messages less frequent by increasing
    client.transport.nodes_sampler_interval.

Can anyone confirm that this will indeed work? For the logging part, any
hints on whether I need to do this in log4j or elsewhere would be
appreciated, as it's painful setting up the environment for each experiment
(and I spend way too much time hunting and pecking in log4j configs trying
to turn off verbose logging of 3rd-party apps; just google "apache
httpclient logging issues" to see what I mean!).

But that also leads me to my next question, as I have some concerns
that, once created, the Client isn't quietly sitting in the background
waiting to be called upon by the main process, but is either sending or
receiving ping messages (to itself? to the remote ES cluster?).

So now I wonder whether keeping the Client alive over a long period of time
is the best practice, since it seems to be generating quite a bit of extra
traffic or using CPU cycles for not much benefit. Or should I just close the
client when we're done processing a message and open a new Client when
needed for a subsequent message (not all messages will need ES)?

Thanks!

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/598245d1-47e9-4ea6-98ab-0c2790e4c4f1%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

  1. The messages are harmless. You should not set the log level to TRACE.
    Check config/logging.yml; there you can set the global log level to INFO.

  2. client.transport.nodes_sampler_interval sets the interval between the
    pings with which the client checks whether nodes are still alive. It has
    nothing to do with the circuit breaker, which generates the diagnostic
    messages.

Once it is up, you should keep the client instance running and active all
the time. The client automatically pings the connected nodes every 5 seconds
to get the newest information about the cluster, including nodes that went
down or came up. This is very beneficial in case of node failures or cluster
maintenance because the client can drop broken connections and switch over
to new nodes automatically.
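
To make the idea of periodic sampling more concrete, here is a toy model of it, written with the standard library only. It is not Elasticsearch's implementation: NodeSampler, the ping predicate, and the address strings are all hypothetical, and the real client does considerably more. The sketch just shows the general shape of "ping every node at a fixed interval, keep the ones that answer".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

/** Toy model of periodic node sampling; NOT how Elasticsearch implements it. */
final class NodeSampler implements AutoCloseable {
    private final List<String> configured;  // addresses the client was given
    private final Predicate<String> ping;   // hypothetical liveness check
    private volatile List<String> live = Collections.emptyList();
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "node-sampler");
                t.setDaemon(true); // background pings should not keep the JVM alive
                return t;
            });

    NodeSampler(List<String> configured, Predicate<String> ping) {
        this.configured = new ArrayList<>(configured);
        this.ping = ping;
    }

    /** One sampling round: keep the nodes that answer, drop the rest. */
    void sampleOnce() {
        List<String> reachable = new ArrayList<>();
        for (String node : configured) {
            if (ping.test(node)) {
                reachable.add(node);
            }
        }
        live = Collections.unmodifiableList(reachable);
    }

    /** Runs sampleOnce() repeatedly, starting immediately. */
    void start(long interval, TimeUnit unit) {
        timer.scheduleWithFixedDelay(this::sampleOnce, 0, interval, unit);
    }

    /** Nodes that answered during the most recent round. */
    List<String> liveNodes() {
        return live;
    }

    @Override
    public void close() {
        timer.shutdownNow();
    }
}
```

Calling start(5, TimeUnit.SECONDS) would mirror the 5-second interval mentioned above. The point of the model is that requests only go to nodes in liveNodes(), so a node that stops answering is dropped after the next round and picked up again once it is back.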

Jörg

In reply to Elaine Cario etcario@gmail.com, Tue, Dec 23, 2014 at 5:34 PM.


I answered a couple of questions myself after reproducing it on a local
system: setting the log4j level for org.elasticsearch to ERROR turns the
messages off, and setting client.transport.nodes_sampler_interval to
something larger than 5s did indeed reduce the chatter. I'd still like to
understand what is happening in the background, though (i.e. is there also
unlogged communication going on during idle moments?).
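
The log4j change described above might look like the following fragment. It assumes a log4j 1.x properties-style configuration in the application; the file name and the rest of the config are not shown in this thread. Elasticsearch's Java classes live under the org.elasticsearch package, so that is the logger name used here. INFO is already enough to hide TRACE output, and ERROR is stricter still.

```properties
# Assumed log4j 1.x properties syntax.
# Caps everything under org.elasticsearch at INFO, which hides the
# TRACE lines coming from ChildMemoryCircuitBreaker.
log4j.logger.org.elasticsearch=INFO
```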

In reply to Elaine Cario, Tuesday, December 23, 2014 11:34:47 AM UTC-5.

As I said, this is not chatter; it is definitely a feature, for fault
tolerance.

What do you mean by "unlogged communication"? ES does not log node
communication. For this, you will need tools like query profiling (Pull
Request #6699, "Add ability to profile queries", by polyfractal in
elastic/elasticsearch on GitHub) or the upcoming Shield product, which
offers audit trail capability.

Maybe you have noticed that all nodes (except TransportClients) are always
receiving updates to the cluster state from the master node; otherwise,
they would not be able to continue their work.

Jörg

In reply to Elaine Cario etcario@gmail.com, Tue, Dec 23, 2014 at 6:33 PM.