Cluster_block_exception blocks all ingestion

Hello,

We have an ELK 7.3.2 cluster with 3 nodes and almost 3 TB of free disk. We also have an index lifecycle policy that force-merges all indices older than 7 days.

Free Disk:

ILP:
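
Roughly, the policy does something like the following (a sketch only; the policy name and max_num_segments are placeholders, not the real values):

PUT _ilm/policy/logstash-policy
{
  "policy": {
    "phases": {
      "warm": {
        "min_age": "7d",
        "actions": {
          "forcemerge": {
            "max_num_segments": 1
          }
        }
      }
    }
  }
}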

We deployed Filebeat to ingest some old logs (September logs), and suddenly all log ingestion (current and old) stopped due to the error below in Logstash:

[2020-12-10T21:07:47,524][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 403 ({"type"=>"cluster_block_exception", "reason"=>"index [logstash-2020.09.09] blocked by: [FORBIDDEN/8/index write (api)];"})
[2020-12-10T21:07:47,524][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 403 ({"type"=>"cluster_block_exception", "reason"=>"index [logstash-2020.09.09] blocked by: [FORBIDDEN/8/index write (api)];"})
[2020-12-10T21:07:47,524][INFO ][logstash.outputs.elasticsearch] Retrying individual bulk actions that failed or were rejected by the previous bulk request. {:count=>125}

When I checked the settings for these newly created indices, I found "index.blocks.write": "true".
As a workaround I changed index.blocks.write to false, after which all logs started flowing into Elasticsearch again.
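
For reference, removing the block is just an update of a dynamic index setting, roughly like this (shown for the index name from the log above):

PUT logstash-2020.09.09/_settings
{
  "index.blocks.write": false
}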

But couple of questions

  1. Why were writes blocked for this newly created index?
  2. And why was log ingestion for all other indices blocked? For example, I was expecting the logs to be ingested into today's index.

I would really appreciate it if someone could explain why this happened. I'm more concerned about question #2 above, as it impacts the current logs being ingested into Elasticsearch. Is there a way to avoid this situation?

TIA.

Are you sure it was a new index? It'd make sense if this index already existed and had moved to the warm phase. Check your logs; the creation of a new index is logged by default.

A write block on one index doesn't affect traffic into the other indices within Elasticsearch, but it might within Filebeat. Were you using the same Filebeat instance for everything? Might be best to use a different one for your historical data.
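
As an illustration only, a dedicated Filebeat for the historical files could look roughly like this (the path and port are placeholders for whatever you'd actually use):

filebeat.inputs:
  - type: log
    paths:
      - /var/log/archive/*.log          # historical files only (placeholder path)

output.logstash:
  hosts: ["<logstash-ip>:5045"]         # a separate Logstash input, not the one used for live logs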

@DavidTurner, Thanks a lot for your response.

> Are you sure it was a new index? It'd make sense if this index already existed and had moved to the warm phase. Check your logs; the creation of a new index is logged by default.

Yes. These indices would have existed before and been deleted after backup. It looks like they were recreated because of the older logs.

> Were you using the same Filebeat instance for everything? Might be best to use a different one for your historical data.

No. There are multiple Filebeats pushing logs into ELK. Only one of them started pushing older logs, but this blocked everything from getting into ELK. We do share the same Logstash instance for all of them, though. Is that causing the issue?

Thanks,

Perhaps. You'd get a more helpful answer on the Logstash forums than here; I don't know Logstash very well.

Unless you have configured multiple independent pipelines for old and new data in Logstash, the batches being processed will contain both. If some of the documents fail and need to be retried, I would expect this to hold up the whole batch, which will eventually block all processing threads.
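
Purely as an illustration (pipeline IDs and paths are placeholders), separating them in pipelines.yml might look something like this, with each config file defining its own beats input on its own port:

- pipeline.id: current-logs
  path.config: "/etc/logstash/conf.d/current.conf"
- pipeline.id: historical-logs
  path.config: "/etc/logstash/conf.d/historical.conf"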


@Christian_Dahlqvist, Thanks for your reply.

Indeed, we have multiple independent pipelines, and one of them is the default pipeline through which all logs are pushed into the default index, which is a daily index. We don't have separate pipelines for old and new logs, though; the pipelines are distinguished by their grok parsers. The older logs we were trying to ingest ended up in the default daily index because they didn't match any of the grok patterns in the other pipelines.

Default pipeline:

input {
  beats {
    port => 5044    # beats input that the Filebeat instances connect to
  }
}

output {
  elasticsearch {
    hosts => "http://<elasticsearch-ip>:9200"
    index => "logstash-%{+YYYY.MM.dd}"    # daily index named from each event's @timestamp
  }
  stdout {}
}

What would be the solution here to avoid this in the future? I will also start a new thread on the Logstash forum.

Thank you so much for your help.

@DavidTurner, Thanks. I will open a new thread in the Logstash forums.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.