Shard routing state stays INITIALIZING


(Sicker) #1

Hi, I found that some shards never finish starting. The log shows
3 shards still INITIALIZING, but elasticsearch-head shows only one
shard in the INITIALIZING state.

https://lh4.googleusercontent.com/-XIokx15Ei2c/UE76fTxkbtI/AAAAAAAAADI/hdtMsDk6RTY/s1600/ShardRoutingStateProblem.jpg

This is the log file that I found:

[2012-09-08 00:43:38,994][WARN ][cluster.service ] [Straw Man] failed to execute cluster state update, state:
version [368], source [routing-table-updater]
nodes:
[Straw Man][qM1wpq9eT_ec1PlMhmIuLg][inet[/10.90.116.175:9300]], local, master
routing_table:
-- index [_river]
----shard_id [_river][0]
--------[_river][0], node[qM1wpq9eT_ec1PlMhmIuLg], [R], s[STARTED]
--------[_river][0], node[JaSEADR1S5uXpWswBhqCYQ], [P], s[STARTED]

-- index [directory]
----shard_id [directory][0]
--------[directory][0], node[qM1wpq9eT_ec1PlMhmIuLg], [R], s[STARTED]
--------[directory][0], node[JaSEADR1S5uXpWswBhqCYQ], [P], s[STARTED]
----shard_id [directory][1]
--------[directory][1], node[qM1wpq9eT_ec1PlMhmIuLg], [R], s[INITIALIZING]
--------[directory][1], node[JaSEADR1S5uXpWswBhqCYQ], [P], s[STARTED]
----shard_id [directory][2]
--------[directory][2], node[qM1wpq9eT_ec1PlMhmIuLg], [R], s[STARTED]
--------[directory][2], node[JaSEADR1S5uXpWswBhqCYQ], [P], s[STARTED]
----shard_id [directory][3]
--------[directory][3], node[qM1wpq9eT_ec1PlMhmIuLg], [R], s[STARTED]
--------[directory][3], node[JaSEADR1S5uXpWswBhqCYQ], [P], s[STARTED]
----shard_id [directory][4]
--------[directory][4], node[qM1wpq9eT_ec1PlMhmIuLg], [R], s[STARTED]
--------[directory][4], node[JaSEADR1S5uXpWswBhqCYQ], [P], s[STARTED]

-- index [messagingevents]
----shard_id [messagingevents][0]
--------[messagingevents][0], node[qM1wpq9eT_ec1PlMhmIuLg], [R], s[STARTED]
--------[messagingevents][0], node[JaSEADR1S5uXpWswBhqCYQ], [P], s[STARTED]
--------[messagingevents][0], node[null], [R], s[UNASSIGNED]
----shard_id [messagingevents][1]
--------[messagingevents][1], node[qM1wpq9eT_ec1PlMhmIuLg], [R], s[STARTED]
--------[messagingevents][1], node[JaSEADR1S5uXpWswBhqCYQ], [P], s[STARTED]
--------[messagingevents][1], node[null], [R], s[UNASSIGNED]
----shard_id [messagingevents][2]
--------[messagingevents][2], node[qM1wpq9eT_ec1PlMhmIuLg], [R], s[STARTED]
--------[messagingevents][2], node[JaSEADR1S5uXpWswBhqCYQ], [P], s[STARTED]
--------[messagingevents][2], node[null], [R], s[UNASSIGNED]
----shard_id [messagingevents][3]
--------[messagingevents][3], node[JaSEADR1S5uXpWswBhqCYQ], [P], s[STARTED]
--------[messagingevents][3], node[null], [R], s[UNASSIGNED]
--------[messagingevents][3], node[null], [R], s[UNASSIGNED]
----shard_id [messagingevents][4]
--------[messagingevents][4], node[qM1wpq9eT_ec1PlMhmIuLg], [R], s[STARTED]
--------[messagingevents][4], node[JaSEADR1S5uXpWswBhqCYQ], [P], s[STARTED]
--------[messagingevents][4], node[null], [R], s[UNASSIGNED]

-- index [messages]
----shard_id [messages][0]
--------[messages][0], node[qM1wpq9eT_ec1PlMhmIuLg], [R], s[STARTED]
--------[messages][0], node[JaSEADR1S5uXpWswBhqCYQ], [P], s[STARTED]
--------[messages][0], node[null], [R], s[UNASSIGNED]
----shard_id [messages][1]
--------[messages][1], node[qM1wpq9eT_ec1PlMhmIuLg], [R], s[STARTED]
--------[messages][1], node[JaSEADR1S5uXpWswBhqCYQ], [P], s[STARTED]
--------[messages][1], node[null], [R], s[UNASSIGNED]
----shard_id [messages][2]
--------[messages][2], node[qM1wpq9eT_ec1PlMhmIuLg], [R], s[STARTED]
--------[messages][2], node[JaSEADR1S5uXpWswBhqCYQ], [P], s[STARTED]
--------[messages][2], node[null], [R], s[UNASSIGNED]
----shard_id [messages][3]
--------[messages][3], node[qM1wpq9eT_ec1PlMhmIuLg], [R], s[INITIALIZING]
--------[messages][3], node[JaSEADR1S5uXpWswBhqCYQ], [P], s[STARTED]
--------[messages][3], node[null], [R], s[UNASSIGNED]
----shard_id [messages][4]
--------[messages][4], node[qM1wpq9eT_ec1PlMhmIuLg], [R], s[STARTED]
--------[messages][4], node[JaSEADR1S5uXpWswBhqCYQ], [P], s[STARTED]
--------[messages][4], node[null], [R], s[UNASSIGNED]

routing_nodes:
-----node_id[qM1wpq9eT_ec1PlMhmIuLg][V]
--------[_river][0], node[qM1wpq9eT_ec1PlMhmIuLg], [R], s[STARTED]
--------[directory][0], node[qM1wpq9eT_ec1PlMhmIuLg], [R], s[STARTED]
--------[directory][1], node[qM1wpq9eT_ec1PlMhmIuLg], [R], s[INITIALIZING]
--------[directory][2], node[qM1wpq9eT_ec1PlMhmIuLg], [R], s[STARTED]
--------[directory][3], node[qM1wpq9eT_ec1PlMhmIuLg], [R], s[STARTED]
--------[directory][4], node[qM1wpq9eT_ec1PlMhmIuLg], [R], s[STARTED]
--------[messagingevents][0], node[qM1wpq9eT_ec1PlMhmIuLg], [R], s[STARTED]
--------[messagingevents][1], node[qM1wpq9eT_ec1PlMhmIuLg], [R], s[STARTED]
--------[messagingevents][2], node[qM1wpq9eT_ec1PlMhmIuLg], [R], s[STARTED]
--------[messagingevents][4], node[qM1wpq9eT_ec1PlMhmIuLg], [R], s[STARTED]
--------[messages][0], node[qM1wpq9eT_ec1PlMhmIuLg], [R], s[STARTED]
--------[messages][1], node[qM1wpq9eT_ec1PlMhmIuLg], [R], s[STARTED]
--------[messages][2], node[qM1wpq9eT_ec1PlMhmIuLg], [R], s[STARTED]
--------[messages][3], node[qM1wpq9eT_ec1PlMhmIuLg], [R], s[INITIALIZING]
--------[messages][4], node[qM1wpq9eT_ec1PlMhmIuLg], [R], s[STARTED]
-----node_id[JaSEADR1S5uXpWswBhqCYQ][X]
--------[_river][0], node[JaSEADR1S5uXpWswBhqCYQ], [P], s[STARTED]
--------[directory][0], node[JaSEADR1S5uXpWswBhqCYQ], [P], s[STARTED]
--------[directory][1], node[JaSEADR1S5uXpWswBhqCYQ], [P], s[STARTED]
--------[directory][2], node[JaSEADR1S5uXpWswBhqCYQ], [P], s[STARTED]
--------[directory][3], node[JaSEADR1S5uXpWswBhqCYQ], [P], s[STARTED]
--------[directory][4], node[JaSEADR1S5uXpWswBhqCYQ], [P], s[STARTED]
--------[messagingevents][0], node[JaSEADR1S5uXpWswBhqCYQ], [P], s[STARTED]
--------[messagingevents][1], node[JaSEADR1S5uXpWswBhqCYQ], [P], s[STARTED]
--------[messagingevents][2], node[JaSEADR1S5uXpWswBhqCYQ], [P], s[STARTED]
--------[messagingevents][3], node[JaSEADR1S5uXpWswBhqCYQ], [P], s[STARTED]
--------[messagingevents][4], node[JaSEADR1S5uXpWswBhqCYQ], [P], s[STARTED]
--------[messages][0], node[JaSEADR1S5uXpWswBhqCYQ], [P], s[STARTED]
--------[messages][1], node[JaSEADR1S5uXpWswBhqCYQ], [P], s[STARTED]
--------[messages][2], node[JaSEADR1S5uXpWswBhqCYQ], [P], s[STARTED]
--------[messages][3], node[JaSEADR1S5uXpWswBhqCYQ], [P], s[STARTED]
--------[messages][4], node[JaSEADR1S5uXpWswBhqCYQ], [P], s[STARTED]
---- unassigned
--------[messagingevents][0], node[null], [R], s[UNASSIGNED]
--------[messagingevents][1], node[null], [R], s[UNASSIGNED]
--------[messagingevents][2], node[null], [R], s[UNASSIGNED]
--------[messagingevents][3], node[null], [R], s[UNASSIGNED]
--------[messagingevents][3], node[null], [R], s[UNASSIGNED]
--------[messagingevents][4], node[null], [R], s[UNASSIGNED]
--------[messages][0], node[null], [R], s[UNASSIGNED]
--------[messages][1], node[null], [R], s[UNASSIGNED]
--------[messages][2], node[null], [R], s[UNASSIGNED]
--------[messages][3], node[null], [R], s[UNASSIGNED]
--------[messages][4], node[null], [R], s[UNASSIGNED]

org.elasticsearch.cluster.routing.RoutingValidationException: [Index [directory]: Shard [1] routing table has wrong number of replicas, expected [1], got [0]]
at org.elasticsearch.cluster.routing.RoutingTable.validateRaiseException(RoutingTable.java:87)
at org.elasticsearch.cluster.routing.allocation.AllocationService.reroute(AllocationService.java:117)
at org.elasticsearch.cluster.routing.RoutingService$1.execute(RoutingService.java:135)
at org.elasticsearch.cluster.service.InternalClusterService$2.run(InternalClusterService.java:211)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
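
When comparing the log with what elasticsearch-head shows, it may help to count shard copies per state mechanically. Each assigned copy is printed twice in the dump, once under routing_table and once under routing_nodes, so counting every INITIALIZING line over-counts. In the dump above, the routing_table section lists two INITIALIZING replicas, [directory][1] and [messages][3]. The following is only a rough sketch: the function name shards_by_state and the regular expression are my own, written against the line format shown in this log, and may need adjusting for other Elasticsearch versions.

```python
import re
from collections import Counter

# Matches shard-copy lines in the format seen in this log, e.g.
# --------[messages][3], node[qM1wpq9eT_ec1PlMhmIuLg], [R], s[INITIALIZING]
SHARD_LINE = re.compile(
    r"^-+\[(?P<index>[^\]]+)\]\[(?P<shard>\d+)\], "
    r"node\[(?P<node>[^\]]+)\], \[(?P<kind>[PR])\], s\[(?P<state>[A-Z]+)\]$"
)


def shards_by_state(dump: str) -> Counter:
    """Count shard copies per state, reading the routing_table section only."""
    counts = Counter()
    in_routing_table = False
    for raw in dump.splitlines():
        line = raw.strip()
        if line.startswith("routing_table:"):
            in_routing_table = True
            continue
        if line.startswith("routing_nodes:"):
            break  # the same copies are listed again per node; stop here
        if in_routing_table:
            match = SHARD_LINE.match(line)
            if match:
                counts[match.group("state")] += 1
    return counts


# A shortened excerpt of the dump from this post.
sample = """routing_table:
-- index [directory]
----shard_id [directory][1]
--------[directory][1], node[qM1wpq9eT_ec1PlMhmIuLg], [R], s[INITIALIZING]
--------[directory][1], node[JaSEADR1S5uXpWswBhqCYQ], [P], s[STARTED]
-- index [messages]
----shard_id [messages][3]
--------[messages][3], node[qM1wpq9eT_ec1PlMhmIuLg], [R], s[INITIALIZING]
--------[messages][3], node[JaSEADR1S5uXpWswBhqCYQ], [P], s[STARTED]
--------[messages][3], node[null], [R], s[UNASSIGNED]
routing_nodes:
-----node_id[qM1wpq9eT_ec1PlMhmIuLg][V]
--------[directory][1], node[qM1wpq9eT_ec1PlMhmIuLg], [R], s[INITIALIZING]
--------[messages][3], node[qM1wpq9eT_ec1PlMhmIuLg], [R], s[INITIALIZING]
"""

print(shards_by_state(sample))
```

Pasting the full cluster state dump into sample instead of the excerpt gives the per-state totals for the whole cluster.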

--


(Sicker) #2

I found a log entry that names the shard that went inactive, but I have
no idea what happened to this shard.

[2012-09-08 01:13:55,633][WARN ][index.engine.robin ] [Straw Man] [messages][3] failed to flush after setting shard to inactive
org.elasticsearch.index.engine.EngineClosedException: [messages][3] CurrentState[CLOSED]
at org.elasticsearch.index.engine.robin.RobinEngine.flush(RobinEngine.java:779)
at org.elasticsearch.index.engine.robin.RobinEngine.updateIndexingBufferSize(RobinEngine.java:218)
at org.elasticsearch.indices.memory.IndexingMemoryController$ShardsIndicesStatusChecker.run(IndexingMemoryController.java:178)
at org.elasticsearch.threadpool.ThreadPool$LoggingRunnable.run(ThreadPool.java:279)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
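
To cross-check which shards the cluster itself reports as not fully started, one option is the cluster health API with level=shards (for example GET /_cluster/health?level=shards on one of the nodes). The sketch below only post-processes such a response. The field names (indices, shards, initializing_shards, unassigned_shards) are what I would expect from that API, but the exact shape is an assumption and may differ on this Elasticsearch version; the helper name not_started and the sample values are made up for illustration and are not taken from this cluster.

```python
import json


def not_started(health: dict) -> list:
    """Return (index, shard, initializing, unassigned) for shards with pending copies.

    Assumes the response shape of the cluster health API at level=shards.
    """
    pending = []
    for index, index_info in sorted(health.get("indices", {}).items()):
        shards = index_info.get("shards", {})
        for shard, info in sorted(shards.items(), key=lambda kv: int(kv[0])):
            initializing = info.get("initializing_shards", 0)
            unassigned = info.get("unassigned_shards", 0)
            if initializing or unassigned:
                pending.append((index, int(shard), initializing, unassigned))
    return pending


# Hypothetical response body, for illustration only.
sample_response = json.loads("""
{
  "status": "yellow",
  "indices": {
    "directory": {
      "shards": {
        "0": {"active_shards": 2, "initializing_shards": 0, "unassigned_shards": 0},
        "1": {"active_shards": 1, "initializing_shards": 1, "unassigned_shards": 0}
      }
    },
    "messages": {
      "shards": {
        "3": {"active_shards": 1, "initializing_shards": 1, "unassigned_shards": 1}
      }
    }
  }
}
""")

for entry in not_started(sample_response):
    print(entry)
```

If the list this produces disagrees with what the master logs in its cluster state dump, that difference is worth including in a follow-up post.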

--


(system) #3