Hi,
We have set up a trial of the ELK stack (6.6.2) with 2 Logstash nodes, a 3-node Elasticsearch cluster, and 1 Kibana node, configured so that messages flow from our firewalls => Logstash => Elasticsearch => Kibana.
Logstash runs fine for a few hours but then starts failing with the error below, and this happens frequently. It does not appear to be a disk space problem; all nodes have sufficient disk space available.
Whenever we encounter this issue, running the following command on each ES cluster node and then restarting all Elasticsearch and Logstash nodes resolves it:
curl -XPUT -H "Content-Type: application/json" http://<>:9200/_all/_settings -d '{"index.blocks.read_only_allow_delete": null}'
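For reference, per-node disk usage and the effective disk watermark settings can be checked like this (just a sketch, using the same host placeholder as above):

# per-node disk used/free and shard counts
curl -s 'http://<>:9200/_cat/allocation?v'
# effective disk watermark thresholds, defaults included
curl -s 'http://<>:9200/_cluster/settings?include_defaults=true&flat_settings=true' | grep disk.watermark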
We came to know this is because the indices are getting locked (made read-only).
So I wanted to know more about the following:
- Why are the indices getting locked, and how can we avoid this?
- Would you suggest a better approach, or specific Elasticsearch settings, to avoid the indices getting locked?
- When all nodes come back up, Logstash tries to send all the messages it holds in memory and logs entries like those in the "Buffered Entries" section below. Will Logstash deliver every message it received and buffered, or is there a chance of losing data?
We are not using persistent queues on the Logstash side.
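If persistent queues are the right fix here, this is roughly what we would enable in logstash.yml (a sketch only; the 4gb cap is an assumed example, and the path is the default package location):

# sketch: enable Logstash's on-disk queue so buffered events survive restarts
cat >> /etc/logstash/logstash.yml <<'EOF'
queue.type: persisted
queue.max_bytes: 4gb
EOF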
Error Details:
[2019-03-28T00:22:54,815][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 403 ({"type"=>"cluster_block_exception", "reason"=>"blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"})
[2019-03-28T00:22:54,815][INFO ][logstash.outputs.elasticsearch] Retrying individual bulk actions that failed or were rejected by the previous bulk request. {:count=>125}
[2019-03-28T23:12:09,316][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 403 ({"type"=>"cluster_block_exception", "reason"=>"blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"})
[... the same 403 retry line repeats several more times ...]
[2019-03-28T23:12:09,317][INFO ][logstash.outputs.elasticsearch] Retrying individual bulk actions that failed or were rejected by the previous bulk request. {:count=>71}
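Before resetting the block, the indices currently carrying it can be listed like this (a sketch, same host placeholder):

curl -s 'http://<>:9200/_all/_settings/index.blocks.read_only_allow_delete?pretty'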
Missing Data
Here is a snapshot of the last 7 days of data. You can see gaps in between where data is missing. Restarting sometimes brought the flow back.
Disk Space Availability
- The daily volume from a single firewall can vary from a few MB to more than a GB (e.g., 1.6 or 1.7 GB). Initially we also logged to a file, but we have since turned that off. Per-index sizes can be listed as shown below.
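A sketch of how the per-day index sizes can be listed, largest first (same host placeholder):

curl -s 'http://<>:9200/_cat/indices?v&s=store.size:desc'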
Buffered Entries
[2019-03-28T18:28:36,854][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 429 ({"type"=>"es_rejected_execution_exception", "reason"=>"rejected execution of processing of [101889343][indices:data/write/bulk[s][p]]: request: BulkShardRequest [[logstash-pfw01-vsys1-2019.03.28][2]] containing [9] requests, target allocation id: GX4dk7HHTm2pTBg3xbefbw, primary term: 1 on EsThreadPoolExecutor[name = es-node1/write, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@ea77817[Running, pool size = 4, active threads = 4, queued tasks = 200, completed tasks = 89000938]]"})
[2019-03-28T18:28:36,854][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 429 ({"type"=>"es_rejected_execution_exception", "reason"=>"rejected execution of processing of [101889361][indices:data/write/bulk[s][p]]: request: BulkShardRequest [[logstash-pfw01-vsys1-2019.03.28][3]] containing [9] requests, target allocation id: qSjfpiMkSH6MikM6LjZwcg, primary term: 1 on EsThreadPoolExecutor[name = es-node1/write, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@ea77817[Running, pool size = 4, active threads = 4, queued tasks = 200, completed tasks = 89000944]]"})
[2019-03-28T18:28:36,855][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 429 ({"type"=>"es_rejected_execution_exception", "reason"=>"rejected execution of processing of [101889332][indices:data/write/bulk[s][p]]: request: BulkShardRequest [[logstash-vfw07-vsys1-2019.03.28][0]] containing [7] requests, target allocation id: xBhWfCN8SlugOEeMDVo55A, primary term: 1 on EsThreadPoolExecutor[name = es-node1/write, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@ea77817[Running, pool size = 4, active threads = 4, queued tasks = 200, completed tasks = 89000938]]"})
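These 429s indicate the write thread pool queue (capacity 200 per the log) is full while Logstash replays its backlog. Queue depth and rejection counts can be watched like this (a sketch; the column selection is just a convenient set):

curl -s 'http://<>:9200/_cat/thread_pool/write?v&h=node_name,active,queue,rejected,completed'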
Any help in this regard is highly appreciated.
Thanks,
Karunya