Hi,
Yes, I missed it. I am not sure how, though, because the cluster health API
was reporting "green" and showing the total number of nodes as "2".
I have now switched to zen discovery and deleted all the existing indexes.
Node discovery works properly, but after creating an index (this time
following the twitter example) and putting some data into it, a subsequent
reboot gives the same issue.
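For reference, an index can be removed with a plain HTTP DELETE on its URL,
e.g. for the old books index from the earlier attempt (host as in my config):

curl -XDELETE 'http://192.168.2.10:9200/books/'

These are the steps I have followed: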
- Installed Java
java -version
java version "1.6.0_29"
Java(TM) SE Runtime Environment (build 1.6.0_29-b11)
Java HotSpot(TM) 64-Bit Server VM (build 20.4-b02, mixed mode)
- This is an ubuntu 64 bit machine (11.04)
uname -a
Linux ubu-ser 2.6.38-8-generic #42-Ubuntu SMP Mon Apr 11 03:31:24 UTC 2011
x86_64 x86_64 x86_64 GNU/Linux
- Installed ElasticSearch 0.17.9 and the Service Wrapper.
- Configured a two-node cluster:
Node01 Configuration
cat /root/elasticsearch-0.17.8/config/elasticsearch.yml
cluster:
    name: gnulinux
node.name: "Ubu2"
node.master: true
node.data: true
node.rack: rack01
network:
    bindHost: 192.168.2.10
    publishHost: 192.168.2.10
index.engine.robin.refreshInterval: -1
index.gateway.snapshot_interval: -1
index.gateway.type: local
index.number_of_shards: 5
index.number_of_replicas: 1
gateway.recover_after_nodes: 2
gateway.recover_after_time: 5m
gateway.expected_nodes: 2
cluster.routing.allocation.node_initial_primaries_recoveries: 4
cluster.routing.allocation.node_concurrent_recoveries: 2
indices.recovery.concurrent_streams: 5
index:
    store:
        fs:
            memory:
                enabled: true
discovery:
    zen:
        ping_timeout: 30s
        ping:
            multicast:
                enabled: false
            unicast:
                enabled: true
                hosts: 192.168.2.10, 192.168.2.11
        fd:
            ping_retries: 10
            ping_interval: 5s
            ping_timeout: 30s
Node02 Configuration
#cat /root/elasticsearch-0.17.8/config/elasticsearch.yml
cluster:
    name: gnulinux
node.name: "Ubu1"
node.master: false
node.data: true
node.rack: rack01
network:
    bindHost: 192.168.2.11
    publishHost: 192.168.2.11
index.engine.robin.refreshInterval: -1
index.gateway.snapshot_interval: -1
index.gateway.type: local
index.number_of_shards: 5
index.number_of_replicas: 1
gateway.recover_after_nodes: 2
gateway.recover_after_time: 5m
gateway.expected_nodes: 2
cluster.routing.allocation.node_initial_primaries_recoveries: 4
cluster.routing.allocation.node_concurrent_recoveries: 2
indices.recovery.concurrent_streams: 5
index:
    store:
        fs:
            memory:
                enabled: true
discovery:
    zen:
        ping_timeout: 30s
        ping:
            multicast:
                enabled: false
            unicast:
                enabled: true
                hosts: 192.168.2.10, 192.168.2.11
        fd:
            ping_retries: 10
            ping_interval: 5s
            ping_timeout: 30s
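Once both nodes are up (next step), I double-check that they really discover
each other by asking each node for the cluster state; both should report the
same master:

curl -XGET 'http://192.168.2.10:9200/_cluster/state?pretty=true'
curl -XGET 'http://192.168.2.11:9200/_cluster/state?pretty=true'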
- Started both nodes and checked the status; both were up. Verified the
log files as well. Everything was fine up to this step.
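For the record, this is how the nodes are started and the status checked (the
service wrapper path is just my install location, adjust as needed):

/root/elasticsearch-0.17.8/bin/service/elasticsearch start
curl -XGET 'http://192.168.2.10:9200/_cluster/health?pretty=true'
curl -XGET 'http://192.168.2.11:9200/_cluster/health?pretty=true'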
{
  "cluster_name" : "gnulinux",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 2,
  "number_of_data_nodes" : 2,
  "active_primary_shards" : 0,
  "active_shards" : 0,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0
}
{
  "cluster_name" : "gnulinux",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 2,
  "number_of_data_nodes" : 2,
  "active_primary_shards" : 0,
  "active_shards" : 0,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0
}
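For reference, the test data in the next step was put in following the
twitter example, along these lines (the document body is just the sample
one from the tutorial):

curl -XPUT 'http://192.168.2.10:9200/twitter/tweet/1' -d '
{
    "user": "kimchy",
    "post_date": "2009-11-15T14:12:12",
    "message": "trying out Elastic Search"
}'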
- Added some data and restarted the nodes; the cluster status is now red
and the log file shows the same error.
{
  "cluster_name" : "gnulinux",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 2,
  "number_of_data_nodes" : 2,
  "active_primary_shards" : 0,
  "active_shards" : 0,
  "relocating_shards" : 0,
  "initializing_shards" : 5,
  "unassigned_shards" : 5
}
{
  "cluster_name" : "gnulinux",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 2,
  "number_of_data_nodes" : 2,
  "active_primary_shards" : 0,
  "active_shards" : 0,
  "relocating_shards" : 0,
  "initializing_shards" : 5,
  "unassigned_shards" : 5
}
[2011-10-22 01:18:46,703][WARN ][cluster.action.shard ] [Ubu1] sending
failed shard for [twitter][4], node[fgELpN11R6m2XbIKdLHYgg], [P],
s[INITIALIZING], reason [Failed to start shard, message
[IndexShardGatewayRecoveryException[[twitter][4] shard allocated for local
recovery (post api), should exists, but doesn't]]]
[2011-10-22 01:18:46,704][WARN ][cluster.action.shard ] [Ubu1] sending
failed shard for [twitter][1], node[fgELpN11R6m2XbIKdLHYgg], [P],
s[INITIALIZING], reason [Failed to start shard, message
[IndexShardGatewayRecoveryException[[twitter][1] shard allocated for local
recovery (post api), should exists, but doesn't]]]
[2011-10-22 01:18:46,704][WARN ][cluster.action.shard ] [Ubu1] sending
failed shard for [twitter][0], node[fgELpN11R6m2XbIKdLHYgg], [P],
s[INITIALIZING], reason [Failed to start shard, message
[IndexShardGatewayRecoveryException[[twitter][0] shard allocated for local
recovery (post api), should exists, but doesn't]]]
[2011-10-22 01:18:46,725][WARN ][indices.cluster ] [Ubu1]
[twitter][1] failed to start shard
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException:
[twitter][1] shard allocated for local recovery (post api), should exists,
but doesn't
    at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:99)
    at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:179)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
[2011-10-22 01:18:46,736][WARN ][indices.cluster ] [Ubu1]
[twitter][4] failed to start shard
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException:
[twitter][4] shard allocated for local recovery (post api), should exists,
but doesn't
    at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:99)
    at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:179)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
[2011-10-22 01:18:46,737][WARN ][cluster.action.shard ] [Ubu1] sending
failed shard for [twitter][4], node[fgELpN11R6m2XbIKdLHYgg], [P],
s[INITIALIZING], reason [Failed to start shard, message
[IndexShardGatewayRecoveryException[[twitter][4] shard allocated for local
recovery (post api), should exists, but doesn't]]]
[2011-10-22 01:18:46,737][WARN ][cluster.action.shard ] [Ubu1] sending
failed shard for [twitter][1], node[fgELpN11R6m2XbIKdLHYgg], [P],
s[INITIALIZING], reason [Failed to start shard, message
[IndexShardGatewayRecoveryException[[twitter][1] shard allocated for local
recovery (post api), should exists, but doesn't]]]
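I can also check what is actually left on disk for these shards after the
reboot; assuming the default local gateway layout under the install
directory, that would be something like:

ls -lR /root/elasticsearch-0.17.8/data/gnulinux/nodes/0/indices/twitter/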
Thanks
Viji
On Fri, Oct 21, 2011 at 11:57 PM, Shay Banon kimchy@gmail.com wrote:
Are you sure the two nodes find each other? The configuration you have
configures jgroups for discovery, which was removed in version 0.6 ....
On Fri, Oct 21, 2011 at 11:53 AM, gnulinux vijivijayakumar@gmail.com wrote:
Hi
I am evaluating Elasticsearch (0.17.8) for a spatial search platform.
I was able to set up a two-node cluster and everything was working
fine. But after rebooting both nodes, I am getting the following
error on both.
[2011-10-19 06:02:45,243][WARN ][indices.cluster ] [linux Ubu2] [books][1] failed to start shard
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [books][1] shard allocated for local recovery (post api), should exists, but doesn't
    at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:99)
    at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:179)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:679)
Config Files:
Node01 (Master)
cluster:
    name: gnulinux
node.name: "linux Ubu2"
node.master: true
node.data: true
node.rack: rack01
network:
    bindHost: 192.168.2.10
    publishHost: 192.168.2.10
index.engine.robin.refreshInterval: -1
index.gateway.snapshot_interval: -1
index.gateway.type: local
index.number_of_shards: 5
index.number_of_replicas: 1
gateway.recover_after_nodes: 2
gateway.recover_after_time: 5m
gateway.expected_nodes: 2
cluster.routing.allocation.node_initial_primaries_recoveries: 4
cluster.routing.allocation.node_concurrent_recoveries: 2
indices.recovery.concurrent_streams: 5
index:
    store:
        fs:
            memory:
                enabled: true
discovery:
    jgroups:
        config: tcp
        bind_port: 9700
        bind_address: 192.168.2.10
        tcpping:
            initial_hosts: 192.168.2.10[9700], 192.168.2.11[9700]
Node02:
cluster:
    name: gnulinux
node.name: "linux Ubu1"
node.master: false
node.data: true
node.rack: rack01
network:
    bindHost: 192.168.2.11
    publishHost: 192.168.2.11
index.engine.robin.refreshInterval: -1
index.gateway.snapshot_interval: -1
index.gateway.type: local
index.number_of_shards: 5
index.number_of_replicas: 1
gateway.recover_after_nodes: 2
gateway.recover_after_time: 5m
gateway.expected_nodes: 2
cluster.routing.allocation.node_initial_primaries_recoveries: 4
cluster.routing.allocation.node_concurrent_recoveries: 2
indices.recovery.concurrent_streams: 5
index:
    store:
        fs:
            memory:
                enabled: true
discovery:
    jgroups:
        config: tcp
        bind_port: 9700
        bind_address: 192.168.2.11
        tcpping:
            initial_hosts: 192.168.2.10[9700], 192.168.2.11[9700]