Yep, I can connect to the other hosts from logger4 :
[14:41]root@logger4:/home/ndenev# telnet 10.100.0.20 9300
Trying 10.100.0.20...
Connected to 10.100.0.20.
Escape character is '^]'.
^]
telnet> quit
Connection closed.
[14:41]root@logger4:/home/ndenev# telnet 10.100.0.10 9300
Trying 10.100.0.10...
Connected to 10.100.0.10.
Escape character is '^]'.
^]
telnet> quit
Connection closed.
With logging level upped to DEBUG I got this on logger2 (10.100.0.20) :
[2012-10-16 15:09:21,101][DEBUG][transport.netty ] [logger2 ] connected to node [[logger4 ][jh7DLxuaQ72l_gFK_uKhUQ][inet[/10.100.0.40:9300]]]
And this on logger4 :
[2012-10-16 15:09:18,995][INFO ][node ] [logger4 ] {0.19.10}[7828]: initializing ...
[2012-10-16 15:09:19,001][INFO ][plugins ] [logger4 ] loaded , sites [bigdesk, head, paramedic]
[2012-10-16 15:09:19,009][DEBUG][common.compress.lzf ] using [UnsafeChunkDecoder] decoder
[2012-10-16 15:09:19,018][DEBUG][common.compress ] failed to load xerial snappy-java
java.lang.ExceptionInInitializerError
at org.elasticsearch.common.compress.snappy.xerial.XerialSnappy.(XerialSnappy.java:41)
at org.elasticsearch.common.compress.CompressorFactory.(CompressorFactory.java:58)
at org.elasticsearch.node.internal.InternalNode.(InternalNode.java:123)
at org.elasticsearch.node.NodeBuilder.build(NodeBuilder.java:159)
at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:67)
at org.elasticsearch.bootstrap.Bootstrap.main(Bootstrap.java:200)
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:32)
Caused by: java.lang.NullPointerException
at java.lang.Throwable.printStackTrace(Throwable.java:477)
at java.lang.Throwable.printStackTrace(Throwable.java:468)
at org.xerial.snappy.Snappy.(Snappy.java:47)
... 7 more
[2012-10-16 15:09:19,030][DEBUG][env ] [logger4 ] using node location [[/tank/es-data/data/logger/nodes/0]], local_node_id [0]
[2012-10-16 15:09:19,625][DEBUG][threadpool ] [logger4 ] creating thread_pool [generic], type [cached], keep_alive [30s]
[2012-10-16 15:09:19,629][DEBUG][threadpool ] [logger4 ] creating thread_pool [index], type [cached], keep_alive [5m]
[2012-10-16 15:09:19,630][DEBUG][threadpool ] [logger4 ] creating thread_pool [bulk], type [cached], keep_alive [5m]
[2012-10-16 15:09:19,630][DEBUG][threadpool ] [logger4 ] creating thread_pool [get], type [cached], keep_alive [5m]
[2012-10-16 15:09:19,630][DEBUG][threadpool ] [logger4 ] creating thread_pool [search], type [cached], keep_alive [5m]
[2012-10-16 15:09:19,630][DEBUG][threadpool ] [logger4 ] creating thread_pool [percolate], type [cached], keep_alive [5m]
[2012-10-16 15:09:19,630][DEBUG][threadpool ] [logger4 ] creating thread_pool [management], type [scaling], min [1], size [5], keep_alive [5m]
[2012-10-16 15:09:19,632][DEBUG][threadpool ] [logger4 ] creating thread_pool [flush], type [scaling], min [1], size [10], keep_alive [5m]
[2012-10-16 15:09:19,632][DEBUG][threadpool ] [logger4 ] creating thread_pool [merge], type [scaling], min [1], size [20], keep_alive [5m]
[2012-10-16 15:09:19,633][DEBUG][threadpool ] [logger4 ] creating thread_pool [refresh], type [scaling], min [1], size [10], keep_alive [5m]
[2012-10-16 15:09:19,633][DEBUG][threadpool ] [logger4 ] creating thread_pool [cache], type [scaling], min [1], size [4], keep_alive [5m]
[2012-10-16 15:09:19,633][DEBUG][threadpool ] [logger4 ] creating thread_pool [snapshot], type [scaling], min [1], size [5], keep_alive [5m]
[2012-10-16 15:09:19,646][DEBUG][transport.netty ] [logger4 ] using worker_count[64], port[9300-9400], bind_host[null], publish_host[null], compress[false], connect_timeout[30s], connections_per_node[2/6/1], receive_predictor[512kb->512kb]
[2012-10-16 15:09:19,652][DEBUG][discovery.zen.ping.multicast] [logger4 ] using group [224.2.2.4], with port [54328], ttl [3], and address [null]
[2012-10-16 15:09:19,654][DEBUG][discovery.zen.ping.unicast] [logger4 ] using initial hosts , with concurrent_connects [10]
[2012-10-16 15:09:19,655][DEBUG][discovery.zen ] [logger4 ] using ping.timeout [3s], master_election.filter_client [true], master_election.filter_data [false]
[2012-10-16 15:09:19,658][DEBUG][discovery.zen.elect ] [logger4 ] using minimum_master_nodes [-1]
[2012-10-16 15:09:19,659][DEBUG][discovery.zen.fd ] [logger4 ] [master] uses ping_interval [1s], ping_timeout [30s], ping_retries [3]
[2012-10-16 15:09:19,661][DEBUG][discovery.zen.fd ] [logger4 ] [node ] uses ping_interval [1s], ping_timeout [30s], ping_retries [3]
[2012-10-16 15:09:19,674][DEBUG][monitor.jvm ] [logger4 ] enabled [true], last_gc_enabled [false], interval [1s], gc_threshold [{default=GcThreshold{name='default', warnThreshold=10000, infoThreshold=5000, debugThreshold=2000}, ParNew=GcThreshold{name='ParNew', warnThreshold=1000, infoThreshold=700, debugThreshold=400}, ConcurrentMarkSweep=GcThreshold{name='ConcurrentMarkSweep', warnThreshold=10000, infoThreshold=5000, debugThreshold=2000}}]
[2012-10-16 15:09:19,681][DEBUG][monitor.os ] [logger4 ] Using probe [org.elasticsearch.monitor.os.SigarOsProbe@111edceb] with refresh_interval [1s]
[2012-10-16 15:09:19,683][DEBUG][monitor.process ] [logger4 ] Using probe [org.elasticsearch.monitor.process.SigarProcessProbe@2e39d325] with refresh_interval [1s]
[2012-10-16 15:09:19,685][DEBUG][monitor.jvm ] [logger4 ] Using refresh_interval [1s]
[2012-10-16 15:09:19,685][DEBUG][monitor.network ] [logger4 ] Using probe [org.elasticsearch.monitor.network.SigarNetworkProbe@4e9e75f6] with refresh_interval [5s]
[2012-10-16 15:09:19,689][DEBUG][monitor.network ] [logger4 ] net_info
host [logger4]
bridge0 display_name [bridge0]
address [/10.100.0.40]
mtu [9000] multicast [true] ptp [false] loopback [false] up [true] virtual [false]
carp0 display_name [carp0]
address [/10.22.10.13]
mtu [1500] multicast [false] ptp [false] loopback [true] up [true] virtual [false]
lo0 display_name [lo0]
address [/127.0.0.1]
mtu [16384] multicast [true] ptp [false] loopback [true] up [true] virtual [false]
igb0 display_name [igb0]
address [/10.22.10.15]
mtu [1500] multicast [true] ptp [false] loopback [false] up [true] virtual [false]
[2012-10-16 15:09:19,692][DEBUG][monitor.fs ] [logger4 ] Using probe [org.elasticsearch.monitor.fs.SigarFsProbe@22343bb6] with refresh_interval [1s]
[2012-10-16 15:09:19,849][DEBUG][indices.store ] [logger4 ] using indices.store.throttle.type [none], with index.store.throttle.max_bytes_per_sec [0b]
[2012-10-16 15:09:19,853][DEBUG][cache.memory ] [logger4 ] using bytebuffer cache with small_buffer_size [1kb], large_buffer_size [1mb], small_cache_size [10mb], large_cache_size [500mb], direct [true]
[2012-10-16 15:09:19,875][DEBUG][cluster.routing.allocation.decider] [logger4 ] using node_concurrent_recoveries [2], node_initial_primaries_recoveries [4]
[2012-10-16 15:09:19,876][DEBUG][cluster.routing.allocation.decider] [logger4 ] using [cluster.routing.allocation.allow_rebalance] with [indices_all_active]
[2012-10-16 15:09:19,876][DEBUG][cluster.routing.allocation.decider] [logger4 ] using [cluster_concurrent_rebalance] with [2]
[2012-10-16 15:09:19,878][DEBUG][gateway.local ] [logger4 ] using initial_shards [quorum], list_timeout [30s]
[2012-10-16 15:09:19,928][DEBUG][indices.recovery ] [logger4 ] using max_size_per_sec[0b], concurrent_streams [3], file_chunk_size [512kb], translog_size [512kb], translog_ops [1000], and compress [true]
[2012-10-16 15:09:19,950][DEBUG][http.netty ] [logger4 ] using max_chunk_size[8kb], max_header_size[8kb], max_initial_line_length[4kb], max_content_length[100mb], receive_predictor[512kb->512kb]
[2012-10-16 15:09:19,954][DEBUG][indices.memory ] [logger4 ] using index_buffer_size [5.9gb], with min_shard_index_buffer_size [4mb], max_shard_index_buffer_size [512mb], shard_inactive_time [30m]
[2012-10-16 15:09:19,956][DEBUG][indices.cache.filter ] [logger4 ] using [node] weighted filter cache with size [20%], actual_size [11.9gb], expire [null], clean_interval [1m]
[2012-10-16 15:09:18,996][DEBUG][gateway.local.state.meta ] [logger4 ] using gateway.local.auto_import_dangled [YES], with gateway.local.dangling_timeout [2h]
[2012-10-16 15:09:18,996][DEBUG][gateway.local.state.meta ] [logger4 ] took 0s to load state
[2012-10-16 15:09:18,996][DEBUG][gateway.local.state.shards] [logger4 ] took 0s to load started shards state
[2012-10-16 15:09:18,998][DEBUG][bulk.udp ] [logger4 ] using enabled [false], host [null], port [9700-9800], bulk_actions [1000], bulk_size [5mb], flush_interval [5s], concurrent_requests [4]
[2012-10-16 15:09:18,999][INFO ][node ] [logger4 ] {0.19.10}[7828]: initialized
[2012-10-16 15:09:18,999][INFO ][node ] [logger4 ] {0.19.10}[7828]: starting ...
[2012-10-16 15:09:19,011][DEBUG][netty.channel.socket.nio.NioProviderMetadata] Couldn't determine the NIO constraint level from the system properties; using the safest level (2)
[2012-10-16 15:09:19,016][DEBUG][netty.channel.socket.nio.SelectorUtil] Using select timeout of 500
[2012-10-16 15:09:19,016][DEBUG][netty.channel.socket.nio.SelectorUtil] Epoll-bug workaround enabled = false
[2012-10-16 15:09:19,072][DEBUG][transport.netty ] [logger4 ] Bound to address [/10.100.0.40:9300]
[2012-10-16 15:09:19,073][INFO ][transport ] [logger4 ] bound_address {inet[/10.100.0.40:9300]}, publish_address {inet[/10.100.0.40:9300]}
The only thing that's diffrent is that logger2 is running with :
openjdk version "1.6.0_30"
OpenJDK Runtime Environment (build 1.6.0_30-b24)
OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)
and logger4 :
openjdk version "1.6.0_32"
OpenJDK Runtime Environment (build 1.6.0_32-b26)
OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)
The "xerial snappy-java" exception occurs on the other two hosts, without causing problems.
On Oct 16, 2012, at 3:26 PM, Radu Gheorghe radu.gheorghe@sematext.com wrote:
Hello Nikolay,
You can try turning on more verbose logging from /etc/elasticsearch/logging.yml
From your netstat output I can see that another machine connects to
logger4's 9300. But can you connect to 10.100.0.20 from logger4?
Best regards,
Radu
http://sematext.com/ -- Elasticsearch -- Solr -- Lucene
On Tue, Oct 16, 2012 at 12:55 PM, Nikolay Denev ndenev@gmail.com wrote:
Hello,
I have ES cluster of 2 machines and I'm having problems adding a third one.
I've tried unicast and multicast discovery with the same results.
The first two machines discover each other, but when I start the third one,
I can see that it creates TCP connections to port 9300 on the first two
machines,
but nothing more happens. The last thing on the console is this :
[2012-10-16 11:39:07,151][INFO ][node ]
[logger4 ] {0.19.10}[6829]: initializing ...
[2012-10-16 11:39:07,157][INFO ][plugins ]
[logger4 ] loaded , sites [bigdesk, head, paramedic]
[2012-10-16 11:39:08,123][INFO ][node ]
[logger4 ] {0.19.10}[6829]: initialized
[2012-10-16 11:39:08,123][INFO ][node ]
[logger4 ] {0.19.10}[6829]: starting ...
[2012-10-16 11:39:07,196][INFO ][transport ]
[logger4 ] bound_address {inet[/10.100.0.40:9300]}, publish_address
{inet[/10.100.0.40:9300]}
Here is netstat output :
[11:54]root@logger4:/home/ndenev# netstat -anptcp
Active Internet connections (including servers)
Proto Recv-Q Send-Q Local Address Foreign Address (state)
tcp4 0 0 10.100.0.40.9300 10.100.0.20.23613
ESTABLISHED
tcp4 0 0 10.100.0.40.9300 10.100.0.20.12586
ESTABLISHED
tcp4 0 0 10.100.0.40.9300 10.100.0.20.34997
ESTABLISHED
tcp4 0 0 10.100.0.40.9300 10.100.0.20.60125
ESTABLISHED
tcp4 0 0 10.100.0.40.9300 10.100.0.20.36215
ESTABLISHED
tcp4 0 0 10.100.0.40.9300 10.100.0.20.47311
ESTABLISHED
tcp4 0 0 10.100.0.40.9300 10.100.0.20.12279
ESTABLISHED
tcp4 0 0 10.100.0.40.9300 10.100.0.20.32206
ESTABLISHED
tcp4 0 0 10.100.0.40.9300 10.100.0.20.32259
ESTABLISHED
tcp4 0 0 10.100.0.40.9300 10.100.0.10.21844
ESTABLISHED
tcp4 0 0 10.100.0.40.9300 10.100.0.10.19881
ESTABLISHED
tcp4 0 0 10.100.0.40.9300 10.100.0.10.58187
ESTABLISHED
tcp4 0 0 10.100.0.40.9300 10.100.0.10.28351
ESTABLISHED
tcp4 0 0 10.100.0.40.9300 10.100.0.10.11270
ESTABLISHED
tcp4 0 0 10.100.0.40.9300 10.100.0.10.60018
ESTABLISHED
tcp4 0 0 10.100.0.40.9300 10.100.0.10.61148
ESTABLISHED
tcp4 0 0 10.100.0.40.9300 10.100.0.10.24928
ESTABLISHED
tcp4 0 0 10.100.0.40.9300 10.100.0.10.31942
ESTABLISHED
tcp4 0 0 10.100.0.40.9300 . LISTEN
Nothing shows in the log files of the first two machines.
Any ideas how to debug this and what may be causing it?
--
--
--