Loop on startup doing initial recovery

rrva · October 2, 2011, 1:07pm

Hello list,

I am using ElasticSearch embedded in a spring webapp with couchdb-
river. Recently, ElasticSearch has started looping on startup when
doing recovery and finally getting out of memory. Only occurs in a
dev. environment (haven't been able to isolate why, the only obvious
difference is Linux vs. Mac OSX and the couchdb contents.)
Elasticsearch works in the same app with identical config on another
machine and does not loop. I tried to point elasticsearch to a clean
path.home directory but it did not change the behavior.

Here is a log excerpt. What could cause this looping? Haven't been
debugging ElasticSearch itself yet.

14:28:05.988 INFO s.f.search.ElasticSearchServer - Starting the
Elastic Search server node
14:28:06.087 INFO org.elasticsearch.node - [Astrid Bloom]
{elasticsearch/0.17.7}[473]: initializing ...
14:28:06.116 INFO org.elasticsearch.plugins - [Astrid Bloom] loaded
[river-couchdb], sites []
14:28:07.761 DEBUG org.elasticsearch.cache.memory - [Astrid Bloom]
using bytebuffer cache with small_buffer_size [1kb], large_buffer_size
[1mb], small_cache_size [10mb], large_cache_size [500mb], direct
[true]
14:28:07.782 DEBUG org.elasticsearch.threadpool - [Astrid Bloom]
creating thread_pool [cached], type [cached], keep_alive [30s]
14:28:07.782 DEBUG org.elasticsearch.threadpool - [Astrid Bloom]
creating thread_pool [index], type [cached], keep_alive [5m]
14:28:07.783 DEBUG org.elasticsearch.threadpool - [Astrid Bloom]
creating thread_pool [search], type [cached], keep_alive [5m]
14:28:07.783 DEBUG org.elasticsearch.threadpool - [Astrid Bloom]
creating thread_pool [percolate], type [cached], keep_alive [5m]
14:28:07.783 DEBUG org.elasticsearch.threadpool - [Astrid Bloom]
creating thread_pool [management], type [scaling], min [1], size [20],
keep_alive [5m]
14:28:07.787 DEBUG org.elasticsearch.threadpool - [Astrid Bloom]
creating thread_pool [merge], type [scaling], min [1], size [20],
keep_alive [5m]
14:28:07.787 DEBUG org.elasticsearch.threadpool - [Astrid Bloom]
creating thread_pool [snapshot], type [scaling], min [1], size [10],
keep_alive [5m]
14:28:07.801 DEBUG org.elasticsearch.transport.netty - [Astrid Bloom]
using worker_count[4], port[9300-9400], bind_host[null],
publish_host[null], compress[false], connect_timeout[30s],
connections_per_node[2/4/1]
14:28:07.853 DEBUG o.e.discovery.zen.ping.multicast - [Astrid Bloom]
using group [224.2.2.4], with port [54328], ttl [3], and address
[null]
14:28:07.865 DEBUG o.e.discovery.zen.ping.unicast - [Astrid Bloom]
using initial hosts [], with concurrent_connects [10]
14:28:07.866 DEBUG org.elasticsearch.discovery.zen - [Astrid Bloom]
using ping.timeout [3s]
14:28:07.868 DEBUG org.elasticsearch.discovery.zen.fd - [Astrid Bloom]
[master] uses ping_interval [1s], ping_timeout [30s], ping_retries [3]
14:28:07.888 DEBUG org.elasticsearch.discovery.zen.fd - [Astrid Bloom]
[node ] uses ping_interval [1s], ping_timeout [30s], ping_retries [3]
14:28:08.315 DEBUG o.e.cluster.routing.allocation - [Astrid Bloom]
using node_concurrent_recoveries [2],
node_initial_primaries_recoveries [4]
14:28:08.315 DEBUG o.e.cluster.routing.allocation - [Astrid Bloom]
using [allow_rebalance] with [indices_all_active]
14:28:08.316 DEBUG o.e.cluster.routing.allocation - [Astrid Bloom]
using [cluster_concurrent_rebalance] with [2]
14:28:08.346 DEBUG o.e.cluster.routing.allocation - [Astrid Bloom]
using node_concurrent_recoveries [2],
node_initial_primaries_recoveries [4]
14:28:08.347 DEBUG o.e.cluster.routing.allocation - [Astrid Bloom]
using [allow_rebalance] with [indices_all_active]
14:28:08.347 DEBUG o.e.cluster.routing.allocation - [Astrid Bloom]
using [cluster_concurrent_rebalance] with [2]
14:28:08.372 DEBUG o.e.cluster.routing.allocation - [Astrid Bloom]
using node_concurrent_recoveries [2],
node_initial_primaries_recoveries [4]
14:28:08.372 DEBUG o.e.cluster.routing.allocation - [Astrid Bloom]
using [allow_rebalance] with [indices_all_active]
14:28:08.372 DEBUG o.e.cluster.routing.allocation - [Astrid Bloom]
using [cluster_concurrent_rebalance] with [2]
14:28:08.395 DEBUG o.e.cluster.routing.allocation - [Astrid Bloom]
using node_concurrent_recoveries [2],
node_initial_primaries_recoveries [4]

....

Repeats until OOM.

kimchy · October 3, 2011, 9:39am

Do you have a special type of config set? Maybe there is a wrong
configuration somewhere? The library used for dependency injection (guice)
gets quirky when it comes to failed to configuration and tries to create
objects several times..., been meaning to fix that... . Can you try with no
special config?

On Sun, Oct 2, 2011 at 3:07 PM, Ragnar Rova ragnar.rova@gmail.com wrote:

Hello list,

I am using Elasticsearch embedded in a spring webapp with couchdb-
river. Recently, Elasticsearch has started looping on startup when
doing recovery and finally getting out of memory. Only occurs in a
dev. environment (haven't been able to isolate why, the only obvious
difference is Linux vs. Mac OSX and the couchdb contents.)
Elasticsearch works in the same app with identical config on another
machine and does not loop. I tried to point elasticsearch to a clean
path.home directory but it did not change the behavior.

Here is a log excerpt. What could cause this looping? Haven't been
debugging Elasticsearch itself yet.

14:28:05.988 INFO s.f.search.ElasticSearchServer - Starting the
Elastic Search server node
14:28:06.087 INFO org.elasticsearch.node - [Astrid Bloom]
{elasticsearch/0.17.7}[473]: initializing ...
14:28:06.116 INFO org.elasticsearch.plugins - [Astrid Bloom] loaded
[river-couchdb], sites
14:28:07.761 DEBUG org.elasticsearch.cache.memory - [Astrid Bloom]
using bytebuffer cache with small_buffer_size [1kb], large_buffer_size
[1mb], small_cache_size [10mb], large_cache_size [500mb], direct
[true]
14:28:07.782 DEBUG org.elasticsearch.threadpool - [Astrid Bloom]
creating thread_pool [cached], type [cached], keep_alive [30s]
14:28:07.782 DEBUG org.elasticsearch.threadpool - [Astrid Bloom]
creating thread_pool [index], type [cached], keep_alive [5m]
14:28:07.783 DEBUG org.elasticsearch.threadpool - [Astrid Bloom]
creating thread_pool [search], type [cached], keep_alive [5m]
14:28:07.783 DEBUG org.elasticsearch.threadpool - [Astrid Bloom]
creating thread_pool [percolate], type [cached], keep_alive [5m]
14:28:07.783 DEBUG org.elasticsearch.threadpool - [Astrid Bloom]
creating thread_pool [management], type [scaling], min [1], size [20],
keep_alive [5m]
14:28:07.787 DEBUG org.elasticsearch.threadpool - [Astrid Bloom]
creating thread_pool [merge], type [scaling], min [1], size [20],
keep_alive [5m]
14:28:07.787 DEBUG org.elasticsearch.threadpool - [Astrid Bloom]
creating thread_pool [snapshot], type [scaling], min [1], size [10],
keep_alive [5m]
14:28:07.801 DEBUG org.elasticsearch.transport.netty - [Astrid Bloom]
using worker_count[4], port[9300-9400], bind_host[null],
publish_host[null], compress[false], connect_timeout[30s],
connections_per_node[2/4/1]
14:28:07.853 DEBUG o.e.discovery.zen.ping.multicast - [Astrid Bloom]
using group [224.2.2.4], with port [54328], ttl [3], and address
[null]
14:28:07.865 DEBUG o.e.discovery.zen.ping.unicast - [Astrid Bloom]
using initial hosts , with concurrent_connects [10]
14:28:07.866 DEBUG org.elasticsearch.discovery.zen - [Astrid Bloom]
using ping.timeout [3s]
14:28:07.868 DEBUG org.elasticsearch.discovery.zen.fd - [Astrid Bloom]
[master] uses ping_interval [1s], ping_timeout [30s], ping_retries [3]
14:28:07.888 DEBUG org.elasticsearch.discovery.zen.fd - [Astrid Bloom]
[node ] uses ping_interval [1s], ping_timeout [30s], ping_retries [3]
14:28:08.315 DEBUG o.e.cluster.routing.allocation - [Astrid Bloom]
using node_concurrent_recoveries [2],
node_initial_primaries_recoveries [4]
14:28:08.315 DEBUG o.e.cluster.routing.allocation - [Astrid Bloom]
using [allow_rebalance] with [indices_all_active]
14:28:08.316 DEBUG o.e.cluster.routing.allocation - [Astrid Bloom]
using [cluster_concurrent_rebalance] with [2]
14:28:08.346 DEBUG o.e.cluster.routing.allocation - [Astrid Bloom]
using node_concurrent_recoveries [2],
node_initial_primaries_recoveries [4]
14:28:08.347 DEBUG o.e.cluster.routing.allocation - [Astrid Bloom]
using [allow_rebalance] with [indices_all_active]
14:28:08.347 DEBUG o.e.cluster.routing.allocation - [Astrid Bloom]
using [cluster_concurrent_rebalance] with [2]
14:28:08.372 DEBUG o.e.cluster.routing.allocation - [Astrid Bloom]
using node_concurrent_recoveries [2],
node_initial_primaries_recoveries [4]
14:28:08.372 DEBUG o.e.cluster.routing.allocation - [Astrid Bloom]
using [allow_rebalance] with [indices_all_active]
14:28:08.372 DEBUG o.e.cluster.routing.allocation - [Astrid Bloom]
using [cluster_concurrent_rebalance] with [2]
14:28:08.395 DEBUG o.e.cluster.routing.allocation - [Astrid Bloom]
using node_concurrent_recoveries [2],
node_initial_primaries_recoveries [4]

....

Repeats until OOM.

rrva · October 6, 2011, 8:48pm

I had identical config set in both working and non-working envs.

cluster.name=myname
path.home=../../elasticsearch

The search server was embedded using this:

gist.github.com

https://gist.github.com/oravecz/977580

ElasticSearchServer.java

package my.elasticsearch;

import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

This file has been truncated. show original

On Oct 3, 11:39 am, Shay Banon kim...@gmail.com wrote:

Do you have a special type of config set? Maybe there is a wrong
configuration somewhere? The library used for dependency injection (guice)
gets quirky when it comes to failed to configuration and tries to create
objects several times..., been meaning to fix that... . Can you try with no
special config?

On Sun, Oct 2, 2011 at 3:07 PM, Ragnar Rova ragnar.r...@gmail.com wrote:

Hello list,

I am using Elasticsearch embedded in a spring webapp with couchdb-
river. Recently, Elasticsearch has started looping on startup when
doing recovery and finally getting out of memory. Only occurs in a
dev. environment (haven't been able to isolate why, the only obvious
difference is Linux vs. Mac OSX and the couchdb contents.)
Elasticsearch works in the same app with identical config on another
machine and does not loop. I tried to point elasticsearch to a clean
path.home directory but it did not change the behavior.

Here is a log excerpt. What could cause this looping? Haven't been
debugging Elasticsearch itself yet.

14:28:05.988 INFO s.f.search.ElasticSearchServer - Starting the
Elastic Search server node
14:28:06.087 INFO org.elasticsearch.node - [Astrid Bloom]
{elasticsearch/0.17.7}[473]: initializing ...
14:28:06.116 INFO org.elasticsearch.plugins - [Astrid Bloom] loaded
[river-couchdb], sites
14:28:07.761 DEBUG org.elasticsearch.cache.memory - [Astrid Bloom]
using bytebuffer cache with small_buffer_size [1kb], large_buffer_size
[1mb], small_cache_size [10mb], large_cache_size [500mb], direct
[true]
14:28:07.782 DEBUG org.elasticsearch.threadpool - [Astrid Bloom]
creating thread_pool [cached], type [cached], keep_alive [30s]
14:28:07.782 DEBUG org.elasticsearch.threadpool - [Astrid Bloom]
creating thread_pool [index], type [cached], keep_alive [5m]
14:28:07.783 DEBUG org.elasticsearch.threadpool - [Astrid Bloom]
creating thread_pool [search], type [cached], keep_alive [5m]
14:28:07.783 DEBUG org.elasticsearch.threadpool - [Astrid Bloom]
creating thread_pool [percolate], type [cached], keep_alive [5m]
14:28:07.783 DEBUG org.elasticsearch.threadpool - [Astrid Bloom]
creating thread_pool [management], type [scaling], min [1], size [20],
keep_alive [5m]
14:28:07.787 DEBUG org.elasticsearch.threadpool - [Astrid Bloom]
creating thread_pool [merge], type [scaling], min [1], size [20],
keep_alive [5m]
14:28:07.787 DEBUG org.elasticsearch.threadpool - [Astrid Bloom]
creating thread_pool [snapshot], type [scaling], min [1], size [10],
keep_alive [5m]
14:28:07.801 DEBUG org.elasticsearch.transport.netty - [Astrid Bloom]
using worker_count[4], port[9300-9400], bind_host[null],
publish_host[null], compress[false], connect_timeout[30s],
connections_per_node[2/4/1]
14:28:07.853 DEBUG o.e.discovery.zen.ping.multicast - [Astrid Bloom]
using group [224.2.2.4], with port [54328], ttl [3], and address
[null]
14:28:07.865 DEBUG o.e.discovery.zen.ping.unicast - [Astrid Bloom]
using initial hosts , with concurrent_connects [10]
14:28:07.866 DEBUG org.elasticsearch.discovery.zen - [Astrid Bloom]
using ping.timeout [3s]
14:28:07.868 DEBUG org.elasticsearch.discovery.zen.fd - [Astrid Bloom]
[master] uses ping_interval [1s], ping_timeout [30s], ping_retries [3]
14:28:07.888 DEBUG org.elasticsearch.discovery.zen.fd - [Astrid Bloom]
[node ] uses ping_interval [1s], ping_timeout [30s], ping_retries [3]
14:28:08.315 DEBUG o.e.cluster.routing.allocation - [Astrid Bloom]
using node_concurrent_recoveries [2],
node_initial_primaries_recoveries [4]
14:28:08.315 DEBUG o.e.cluster.routing.allocation - [Astrid Bloom]
using [allow_rebalance] with [indices_all_active]
14:28:08.316 DEBUG o.e.cluster.routing.allocation - [Astrid Bloom]
using [cluster_concurrent_rebalance] with [2]
14:28:08.346 DEBUG o.e.cluster.routing.allocation - [Astrid Bloom]
using node_concurrent_recoveries [2],
node_initial_primaries_recoveries [4]
14:28:08.347 DEBUG o.e.cluster.routing.allocation - [Astrid Bloom]
using [allow_rebalance] with [indices_all_active]
14:28:08.347 DEBUG o.e.cluster.routing.allocation - [Astrid Bloom]
using [cluster_concurrent_rebalance] with [2]
14:28:08.372 DEBUG o.e.cluster.routing.allocation - [Astrid Bloom]
using node_concurrent_recoveries [2],
node_initial_primaries_recoveries [4]
14:28:08.372 DEBUG o.e.cluster.routing.allocation - [Astrid Bloom]
using [allow_rebalance] with [indices_all_active]
14:28:08.372 DEBUG o.e.cluster.routing.allocation - [Astrid Bloom]
using [cluster_concurrent_rebalance] with [2]
14:28:08.395 DEBUG o.e.cluster.routing.allocation - [Astrid Bloom]
using node_concurrent_recoveries [2],
node_initial_primaries_recoveries [4]

....

Repeats until OOM.