TransportClient Throws 'java.lang.OutOfMemoryError: GC overhead limit exceeded' when all nodes in cluster are down (1.1.1)


(Santiago Ferrer Deheza) #1

Hi there!

I'm getting this exception ('java.lang.OutOfMemoryError: GC overhead limit
exceeded') in the client when my ES 1.1.1 cluster goes down. I'm having
problems with the cluster (work in progress), but it doesn't seem right that
the client server throws OutOfMemoryError.

Client specs:

  • Java 6u32
  • Ubuntu 12.04 LTS
  • Elasticsearch 1.1.1 Jar

The client is only used for searching. Any clue? If more information is
needed, just let me know.

Thanks,
Santi!

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/b34378cf-8f43-4f65-b8a6-e6f649150e67%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Jörg Prante) #2

Most likely you have memory leaks in your app and your client memory was
exhausted.

If you can show the client code that submits queries and processes
responses, along with the stack traces you receive, we may be able to offer
more help.

A general hint is to switch to Java 7.

Jörg



(Santiago Ferrer Deheza) #3

The odd thing is that it happens when the cluster status is red (that's why
I think the client is the problem).

This is my code:

    ElasticResponse response;
    try {
        if (category != null && !category.isEmpty()) {
            SearchRequestBuilder searchQuery = client
                    .prepareSearch(ConfigFactory.load().getString("elasticsearch.updater.index"))
                    .setSearchType(SearchType.QUERY_THEN_FETCH)
                    .setSize(numberOfAds);
            BoolQueryBuilder qb = QueryBuilders.boolQuery();
            FunctionScoreQueryBuilder functionQueryBuilder = createFunctionScore(external, qb);
            List<Map<String, Object>> ads = new ArrayList<Map<String, Object>>();

            // Split the category path once; addFirst reverses the order of the IDs.
            String categPath = Category.getCategoryIdPath(category);
            String[] categoryIds = categPath.split("_");
            Deque<ElasticSearch.SearchCategory> categories =
                    new LinkedList<ElasticSearch.SearchCategory>();
            // Loop variable renamed so it no longer shadows the outer `category`.
            for (String categoryId : categoryIds) {
                categories.addFirst(new SearchCategory(categoryId, false, categPath));
            }

            functionQueryBuilder = functionQueryBuilder.boostMode(CombineFunction.MULT);
            functionQueryBuilder = functionQueryBuilder.scoreMode("max");
            fillWithFunctions(functionQueryBuilder, categories, INITIAL_BOOST);

            // Wait at most 50 ms for the search response.
            SearchResponse searchResponse = searchQuery
                    .setTypes(categoryIds[0])
                    .setQuery(functionQueryBuilder)
                    .execute()
                    .actionGet(50, TimeUnit.MILLISECONDS);

            // Collect up to numberOfAds hits as source maps.
            SearchHits hits = searchResponse.getHits();
            Iterator<SearchHit> it = hits.iterator();
            int count = 0;
            while (it.hasNext() && count < numberOfAds) {
                Map<String, Object> sourceAsMap = it.next().sourceAsMap();
                ads.add(sourceAsMap);
                count++;
            }

            if (ads.isEmpty()) {
                Logger.info("ES - No ads found. Category: " + category);
                response = new ElasticResponse(NO_CONTENT);
            } else {
                response = new ElasticResponse(ads, OK, position);
            }
        } else {
            Logger.info("ES - No category sent.");
            response = new ElasticResponse(BAD_REQUEST);
        }
    } catch (ElasticsearchTimeoutException e) {
        Logger.info("ES - Timeout.", e);
        MetricsManager.getElasticsearchMetrics().incrementTimeoutCounter();
        response = new ElasticResponse(REQUEST_TIMEOUT);
    } catch (Exception e) {
        Logger.error("Searching in elasticsearch", e);
        response = new ElasticResponse(INTERNAL_SERVER_ERROR);
    }

As you can see in the code, we set a timeout when calling actionGet.
This works fine when the cluster is OK (we have a tight SLA), but when the
ES cluster goes down the timeout does not seem to be taken into account, and
our response times go beyond the SLA.

Thanks!



(Jörg Prante) #4

Maybe it is not an OOM at all but the client running out of file
descriptors; only the stack trace can show that.

TransportClient, by default, tries to reconnect quite aggressively, so it
would help the analysis if you could monitor the number of open network
ports while you get the OOM. Maybe you have sniff mode on (the
client.transport.sniff setting; note that it is off unless you enable it)
and TransportClient retries are consuming all ports in despair... You can
switch the log level to debug to find out what TransportClient is doing.
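
For the monitoring suggested above, a minimal sketch of reading the file descriptor count from inside the client JVM could look like the following. The class name FdMonitor and its method are made up for illustration, and the approach assumes a HotSpot-style JVM on a Unix-like OS (such as the Ubuntu box mentioned earlier), where the platform MXBean exposes these counters. It counts all descriptors, not only sockets, so treat it as a rough indicator.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical helper, not part of Elasticsearch.
public class FdMonitor {

    /** Returns {open, max} descriptor counts, or null if this JVM does not expose them. */
    static long[] fileDescriptorUsage() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        // The Unix-specific bean is only present on Unix-like platforms.
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(),
                unix.getMaxFileDescriptorCount()
            };
        }
        return null;
    }

    public static void main(String[] args) {
        long[] usage = fileDescriptorUsage();
        if (usage == null) {
            System.out.println("File descriptor counts are not available on this JVM");
        } else {
            System.out.println("open fds: " + usage[0] + " / max: " + usage[1]);
        }
    }
}
```

Logging this value every few seconds while the cluster is red should show whether the open count climbs toward the limit before the error appears.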

To ease the red-cluster situation a little, you can increase the timeout
from 5s to, say, 30s (of course this will not solve the cause of the
problem; it only delays the OOM). You should also pay attention to the
number of responding shards in search responses, so you can bail out with a
fatal error when that number is too low. Of course, you could also check the
cluster health color once a minute or so.
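
The shard check described above might be sketched like this. ShardCheck, enoughShardsResponded and the 0.8 threshold are illustrative assumptions, not anything from the Elasticsearch API; in the search code the two counts would come from SearchResponse.getTotalShards() and SearchResponse.getSuccessfulShards().

```java
// Hypothetical helper, not part of Elasticsearch.
public class ShardCheck {

    /**
     * Returns true when at least minRatio of the shards answered the search.
     * total and successful are the shard counts reported by the search response.
     */
    static boolean enoughShardsResponded(int total, int successful, double minRatio) {
        if (total <= 0) {
            return false; // no shards reported at all: treat as a failure
        }
        return (double) successful / total >= minRatio;
    }

    public static void main(String[] args) {
        // All 10 shards answered: acceptable.
        System.out.println(enoughShardsResponded(10, 10, 0.8));
        // Only 5 of 10 answered: the caller should bail out.
        System.out.println(enoughShardsResponded(10, 5, 0.8));
    }
}
```

When the check fails, the calling code can return an error response right away instead of serving partial results.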

Jörg
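
The periodic health check mentioned above could be sketched roughly as below. HealthGate and HealthProbe are hypothetical names; the probe is left abstract so the sketch runs without a cluster. A real probe would presumably wrap something like client.admin().cluster().prepareHealth().execute().actionGet(...) and read the status from the response, and treating yellow as still searchable is an assumption.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical helper, not part of Elasticsearch.
public class HealthGate {

    /** Hypothetical abstraction over "ask the cluster for its health color". */
    interface HealthProbe {
        String color() throws Exception; // "green", "yellow" or "red"
    }

    // Starts as "unknown", so searches are refused until the first probe succeeds.
    private final AtomicReference<String> lastColor = new AtomicReference<String>("unknown");
    private final HealthProbe probe;

    HealthGate(HealthProbe probe) {
        this.probe = probe;
    }

    /** Runs one probe and records the result; any failure counts as red. */
    void refresh() {
        try {
            lastColor.set(probe.color());
        } catch (Exception e) {
            lastColor.set("red");
        }
    }

    /** Search code can call this and fail fast instead of sending a request. */
    boolean searchAllowed() {
        String color = lastColor.get();
        return "green".equals(color) || "yellow".equals(color);
    }

    /** Schedules refresh() with a fixed delay, for example once a minute. */
    ScheduledExecutorService start(long period, TimeUnit unit) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(new Runnable() {
            public void run() {
                refresh();
            }
        }, 0, period, unit);
        return scheduler;
    }

    public static void main(String[] args) {
        // Stub probe that simulates a cluster that cannot be reached.
        HealthGate gate = new HealthGate(new HealthProbe() {
            public String color() throws Exception {
                throw new Exception("no node available");
            }
        });
        gate.refresh();
        System.out.println("search allowed: " + gate.searchAllowed());
    }
}
```

In an application, gate.start(1, TimeUnit.MINUTES) would keep the color fresh, and the returned scheduler should be shut down together with the client.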



(Santiago Ferrer Deheza) #5

Thanks, Jörg, for the answer! I will turn on logging and come back with the
stack trace!


