Unable to index new documents after cluster restart

Hari_Shankar · June 24, 2011, 4:34pm

Hi,

Everything was working fine before, I tried to index a large document, ran
into java heap:out of memory exception, so I decided to increase ES_MAX_MEM
and restart the cluster. Once done, I am now able to query/search my old
data which I had indexed before, but I am unable to index anything new. eg,
here is the error I get for PUT:

curl -XPUT 'http://localhost:9200/keywords/kw/11917_97095_822656_21256827'
-d
'{"KeywordID":"21256827","AdGroupID":"822656","NwAccountID":"11917","NwXRefCode":"757789966","NwXRefStatus":"Active","NetworkID":"1"}}'
{"error":"UnavailableShardsException[[keywords][4] [4] shardIt, [1] active :
Timeout waiting for [1m], request: index
{[keywords][kw][11917_97095_822656_21256827],
source[{"KeywordID":"21256827","AdGroupID":"822656","NwAccountID":"11917","NwXRefCode":"757789966","NwXRefStatus":"Active","NetworkID":"1"}}]}]","status":500}

I get the same error for bulk operations as well.

What does this mean?

Hari

Hari_Shankar · June 24, 2011, 4:46pm

I see this exception in the logs of some nodes:

[2011-06-24 11:28:38,359][WARN ][index.engine.robin ] [Ozymandias]
[keywords][0] failed to flush after setting shard to inactive
org.elasticsearch.index.engine.EngineClosedException: [keywords][0]
CurrentState[CLOSED]
at
org.elasticsearch.index.engine.robin.RobinEngine.flush(RobinEngine.java:653)
at
org.elasticsearch.index.engine.robin.RobinEngine.updateIndexingBufferSize(RobinEngine.java:203)
at
org.elasticsearch.indices.memory.IndexingMemoryBufferController$ShardsIndicesStatusChecker.run(IndexingMemoryBufferController.java:147)
at
org.elasticsearch.threadpool.ThreadPool$LoggingRunnable.run(ThreadPool.java:201)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at
java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
[2011-06-24 11:28:38,371][WARN ][index.engine.robin ] [Ozymandias]
[keywords][3] failed to flush after setting shard to inactive
org.elasticsearch.index.engine.EngineClosedException: [keywords][3]
CurrentState[CLOSED]
at
org.elasticsearch.index.engine.robin.RobinEngine.flush(RobinEngine.java:653)
at
org.elasticsearch.index.engine.robin.RobinEngine.updateIndexingBufferSize(RobinEngine.java:203)
at
org.elasticsearch.indices.memory.IndexingMemoryBufferController$ShardsIndicesStatusChecker.run(IndexingMemoryBufferController.java:147)
at
org.elasticsearch.threadpool.ThreadPool$LoggingRunnable.run(ThreadPool.java:201)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at
java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

On another node, I saw this slightly diffferent exception:

[2011-06-24 11:28:44,936][WARN ][index.engine.robin ] [Puishannt]
[keywords][0] failed to flush after setting shard to inactive
org.elasticsearch.index.engine.FlushNotAllowedEngineException: [keywords][0]
Recovery is in progress, flush is not allowed
at
org.elasticsearch.index.engine.robin.RobinEngine.flush(RobinEngine.java:657)
at
org.elasticsearch.index.engine.robin.RobinEngine.updateIndexingBufferSize(RobinEngine.java:203)
at
org.elasticsearch.indices.memory.IndexingMemoryBufferController$ShardsIndicesStatusChecker.run(IndexingMemoryBufferController.java:147)
at
org.elasticsearch.threadpool.ThreadPool$LoggingRunnable.run(ThreadPool.java:201)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at
java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

How long does recovery take if I restart the cluster? My total index size is
about 800 GB on 7 nodes. I have only one index (settings: 7 shards/3
replicas). Tried to force-open the index, but still same error)

Hari

On Fri, Jun 24, 2011 at 10:04 PM, Hari Shankar shaan.hari@gmail.com wrote:

Hi,

Everything was working fine before, I tried to index a large document, ran
into java heap:out of memory exception, so I decided to increase ES_MAX_MEM
and restart the cluster. Once done, I am now able to query/search my old
data which I had indexed before, but I am unable to index anything new. eg,
here is the error I get for PUT:

curl -XPUT 'http://localhost:9200/keywords/kw/11917_97095_822656_21256827'
-d
'{"KeywordID":"21256827","AdGroupID":"822656","NwAccountID":"11917","NwXRefCode":"757789966","NwXRefStatus":"Active","NetworkID":"1"}}'
{"error":"UnavailableShardsException[[keywords][4] [4] shardIt, [1] active
: Timeout waiting for [1m], request: index
{[keywords][kw][11917_97095_822656_21256827],
source[{"KeywordID":"21256827","AdGroupID":"822656","NwAccountID":"11917","NwXRefCode":"757789966","NwXRefStatus":"Active","NetworkID":"1"}}]}]","status":500}

I get the same error for bulk operations as well.

What does this mean?

Hari

Hari_Shankar · June 24, 2011, 6:30pm

Seems this got fixed now. It has been about 3 hours since I had restarted
the cluster now. Last time I tried (unsuccessfully) was about 30 mins after
restart. Is there a way to estimate approximately how long it would take to
recover an index after cluster restart/index close? Is it possible to
specify an order in which indices will be recovered after cluster restart?
Say if some of the smaller indices need to recover first. Or is it a
parallel process?

Hari

On Fri, Jun 24, 2011 at 10:16 PM, Hari Shankar shaan.hari@gmail.com wrote:

I see this exception in the logs of some nodes:

[2011-06-24 11:28:38,359][WARN ][index.engine.robin ] [Ozymandias]
[keywords][0] failed to flush after setting shard to inactive
org.elasticsearch.index.engine.EngineClosedException: [keywords][0]
CurrentState[CLOSED]
at
org.elasticsearch.index.engine.robin.RobinEngine.flush(RobinEngine.java:653)
at
org.elasticsearch.index.engine.robin.RobinEngine.updateIndexingBufferSize(RobinEngine.java:203)
at
org.elasticsearch.indices.memory.IndexingMemoryBufferController$ShardsIndicesStatusChecker.run(IndexingMemoryBufferController.java:147)
at
org.elasticsearch.threadpool.ThreadPool$LoggingRunnable.run(ThreadPool.java:201)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at
java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
[2011-06-24 11:28:38,371][WARN ][index.engine.robin ] [Ozymandias]
[keywords][3] failed to flush after setting shard to inactive
org.elasticsearch.index.engine.EngineClosedException: [keywords][3]
CurrentState[CLOSED]
at
org.elasticsearch.index.engine.robin.RobinEngine.flush(RobinEngine.java:653)
at
org.elasticsearch.index.engine.robin.RobinEngine.updateIndexingBufferSize(RobinEngine.java:203)
at
org.elasticsearch.indices.memory.IndexingMemoryBufferController$ShardsIndicesStatusChecker.run(IndexingMemoryBufferController.java:147)
at
org.elasticsearch.threadpool.ThreadPool$LoggingRunnable.run(ThreadPool.java:201)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at
java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

On another node, I saw this slightly diffferent exception:

[2011-06-24 11:28:44,936][WARN ][index.engine.robin ] [Puishannt]
[keywords][0] failed to flush after setting shard to inactive
org.elasticsearch.index.engine.FlushNotAllowedEngineException:
[keywords][0] Recovery is in progress, flush is not allowed
at
org.elasticsearch.index.engine.robin.RobinEngine.flush(RobinEngine.java:657)
at
org.elasticsearch.index.engine.robin.RobinEngine.updateIndexingBufferSize(RobinEngine.java:203)
at
org.elasticsearch.indices.memory.IndexingMemoryBufferController$ShardsIndicesStatusChecker.run(IndexingMemoryBufferController.java:147)
at
org.elasticsearch.threadpool.ThreadPool$LoggingRunnable.run(ThreadPool.java:201)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at
java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

How long does recovery take if I restart the cluster? My total index size
is about 800 GB on 7 nodes. I have only one index (settings: 7 shards/3
replicas). Tried to force-open the index, but still same error)

Hari

On Fri, Jun 24, 2011 at 10:04 PM, Hari Shankar shaan.hari@gmail.comwrote:

Hi,

Everything was working fine before, I tried to index a large document, ran
into java heap:out of memory exception, so I decided to increase ES_MAX_MEM
and restart the cluster. Once done, I am now able to query/search my old
data which I had indexed before, but I am unable to index anything new. eg,
here is the error I get for PUT:

curl -XPUT 'http://localhost:9200/keywords/kw/11917_97095_822656_21256827'
-d
'{"KeywordID":"21256827","AdGroupID":"822656","NwAccountID":"11917","NwXRefCode":"757789966","NwXRefStatus":"Active","NetworkID":"1"}}'
{"error":"UnavailableShardsException[[keywords][4] [4] shardIt, [1] active
: Timeout waiting for [1m], request: index
{[keywords][kw][11917_97095_822656_21256827],
source[{"KeywordID":"21256827","AdGroupID":"822656","NwAccountID":"11917","NwXRefCode":"757789966","NwXRefStatus":"Active","NetworkID":"1"}}]}]","status":500}

I get the same error for bulk operations as well.

What does this mean?

Hari

Topic		Replies	Views
Correct way to restart cluster / rejoin failed nodes Elasticsearch	5	1263	July 6, 2017
Lost data due to out of memory error Elasticsearch	6	1429	July 6, 2017
Index corruption when upload large number of documents (4billion+) Elasticsearch	5	1063	July 6, 2017
Search failures on server restarts Elasticsearch	14	484	July 6, 2017
Closed Index missing after restarting the elasticsearch cluster Elasticsearch	7	429	July 6, 2017

Unable to index new documents after cluster restart

Related topics