Logstash ends up with cluster_block_exception, and it occurs frequently

Hi,

We set up a trial version of the ELK stack (6.6.2), with 2 Logstash nodes, a 3-node Elasticsearch cluster, and 1 Kibana node, and configured messages to flow from our firewalls => Logstash => Elasticsearch => Kibana.

Logstash runs fine for a few hours, but then it fails with the error mentioned below, and this happens frequently. It does not appear to be a disk space problem; there is sufficient disk space available on all nodes.

Whenever we encounter this issue, running the following command against each ES cluster node and then restarting all ES nodes and Logstash nodes resolves it.

curl -XPUT -H "Content-Type: application/json" http://<>:9200/_all/_settings -d '{"index.blocks.read_only_allow_delete": null}'

We came to understand that this happens because the indexes are getting locked, so I wanted to know more about the following:

  1. Why are the indexes getting locked, and how can we avoid this?
  2. Do you suggest a better approach, or configuration of certain Elasticsearch parameters, to avoid the indexes getting locked?
  3. When all nodes are working again, Logstash tries to send all the messages it holds in memory and we end up with INFO entries such as those shown in the "Buffered Entries" section below. Will Logstash send every message that was received and buffered, or is there a chance of losing data?

We are not using persistent queues on the Logstash side.
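(For reference, if we did want buffered events to survive a Logstash restart, my understanding is that the persistent queue is enabled in logstash.yml roughly as below; the size and path are illustrative, not our actual values.)

# logstash.yml -- minimal sketch of enabling the persistent queue so that
# in-flight events are spooled to disk instead of being held only in memory
queue.type: persisted
queue.max_bytes: 4gb                  # cap on disk used by the queue (illustrative)
path.queue: /var/lib/logstash/queue   # queue data directory (illustrative)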

Error Details:

"[2019-03-28T00:22:54,815][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 403 ({"type"=>"cluster_block_exception", "reason"=>"blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"})

[2019-03-28T00:22:54,815][INFO ][logstash.outputs.elasticsearch] Retrying individual bulk actions that failed or were rejected by the previous bulk request. {:count=>125}

[2019-03-28T23:12:09,316][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 403 ({"type"=>"cluster_block_exception", "reason"=>"blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"})
[2019-03-28T23:12:09,316][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 403 ({"type"=>"cluster_block_exception", "reason"=>"blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"})
[2019-03-28T23:12:09,316][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 403 ({"type"=>"cluster_block_exception", "reason"=>"blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"})
[2019-03-28T23:12:09,316][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 403 ({"type"=>"cluster_block_exception", "reason"=>"blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"})
[2019-03-28T23:12:09,316][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 403 ({"type"=>"cluster_block_exception", "reason"=>"blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"})
[2019-03-28T23:12:09,316][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 403 ({"type"=>"cluster_block_exception", "reason"=>"blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"})
[2019-03-28T23:12:09,316][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 403 ({"type"=>"cluster_block_exception", "reason"=>"blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"})
[2019-03-28T23:12:09,317][INFO ][logstash.outputs.elasticsearch] Retrying individual bulk actions that failed or were rejected by the previous bulk request. {:count=>71}

Missing Data
Here is a snapshot of the last 7 days of data. As you can observe, there is data missing in between. Restarting sometimes brought it back.

Disk Space Availability
(screenshot of disk space on the nodes; not reproduced here)

  1. The data coming from some firewalls can vary from a few MB to a few GB per day (e.g. 1.6 or 1.7 GB). We initially also wrote this data to a file, but have since turned that off.

Buffered Entries
[2019-03-28T18:28:36,854][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 429 ({"type"=>"es_rejected_execution_exception", "reason"=>"rejected execution of processing of [101889343][indices:data/write/bulk[s][p]]: request: BulkShardRequest [[logstash-pfw01-vsys1-2019.03.28][2]] containing [9] requests, target allocation id: GX4dk7HHTm2pTBg3xbefbw, primary term: 1 on EsThreadPoolExecutor[name = es-node1/write, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@ea77817[Running, pool size = 4, active threads = 4, queued tasks = 200, completed tasks = 89000938]]"})
[2019-03-28T18:28:36,854][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 429 ({"type"=>"es_rejected_execution_exception", "reason"=>"rejected execution of processing of [101889361][indices:data/write/bulk[s][p]]: request: BulkShardRequest [[logstash-pfw01-vsys1-2019.03.28][3]] containing [9] requests, target allocation id: qSjfpiMkSH6MikM6LjZwcg, primary term: 1 on EsThreadPoolExecutor[name = es-node1/write, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@ea77817[Running, pool size = 4, active threads = 4, queued tasks = 200, completed tasks = 89000944]]"})
[2019-03-28T18:28:36,855][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 429 ({"type"=>"es_rejected_execution_exception", "reason"=>"rejected execution of processing of [101889332][indices:data/write/bulk[s][p]]: request: BulkShardRequest [[logstash-vfw07-vsys1-2019.03.28][0]] containing [7] requests, target allocation id: xBhWfCN8SlugOEeMDVo55A, primary term: 1 on EsThreadPoolExecutor[name = es-node1/write, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@ea77817[Running, pool size = 4, active threads = 4, queued tasks = 200, completed tasks = 89000938]]"})
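(Side note: these 429s show the write thread-pool queue, capacity 200, filling up on es-node1. Assuming the cluster is reachable on localhost:9200 -- substitute the real host -- the rejections can be watched with the cat thread pool API:)

# Show active, queued and rejected tasks for the write thread pool on each node
curl -s "http://localhost:9200/_cat/thread_pool/write?v&h=node_name,name,active,queue,rejected"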

Any help in this regard is highly appreciated.

Thanks,
Karunya

What errors do you see in the Elasticsearch logs when Logstash starts getting events rejected?

Badger,

I did not find any errors on the Elasticsearch side.

Regards,
Karunya

I would be amazed if Elasticsearch did not log the reason why it flips an index to read-only (which is usually disk space), because if it does not log the reason there is no way for you to know why it did it.

Hello Badger,

  Logstash is again throwing the following errors, and the last few lines from the Elasticsearch log are also included. There are a few exceptions in the Elasticsearch log, but they occurred around 5 hours earlier...

See below:
i) the Logstash entries show the current time in Sydney (2019-03-29T02:26), and ii) the last few Elasticsearch log entries were written around Sydney time 2019-03-28T21:28:23.

Logstash errors (occurring now, at 2019-03-29T02:26)

block_exception", "reason"=>"blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"})
[2019-03-29T02:26:19,794][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 403 ({"type"=>"cluster_block_exception", "reason"=>"blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"})
[2019-03-29T02:26:19,794][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 403 ({"type"=>"cluster_block_exception", "reason"=>"blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"})
[2019-03-29T02:26:19,794][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 403 ({"type"=>"cluster_block_exception", "reason"=>"blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"})
[2019-03-29T02:26:19,794][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 403 ({"type"=>"cluster_block_exception", "reason"=>"blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"})
[2019-03-29T02:26:19,794][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 403 ({"type"=>"cluster_block_exception", "reason"=>"blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"})
[2019-03-29T02:26:19,794][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 403 ({"type"=>"cluster_block_exception", "reason"=>"blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"})
[2019-03-29T02:26:19,794][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 403 ({"type"=>"cluster_block_exception", "reason"=>"blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"})
[2019-03-29T02:26:19,794][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 403 ({"type"=>"cluster_block_exception", "reason"=>"blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"})
[2019-03-29T02:26:19,794][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 403 ({"type"=>"cluster_block_exception", "reason"=>"blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"})
[2019-03-29T02:26:19,794][INFO ][logstash.outputs.elasticsearch] Retrying individual bulk actions that failed or were rejected by the previous bulk request. {:count=>71}


On the Elasticsearch master node, the log file has the following entries:

[root@HIT-DCE-LOGCENTER08 ~]# tail -f -n 500 /var/log/elasticsearch/bgus-log-central.log
[2019-03-28T21:28:23,971][INFO ][o.e.i.s.IndexShard ] [ms-node-1] [logstash-apnicfw03-frankston-bb-2019.03.27][3] primary-replica resync completed with 0 operations
[2019-03-28T21:28:23,976][INFO ][o.e.i.s.IndexShard ] [ms-node-1] [logstash-infin8it-vsys1-2019.03.22][4] primary-replica resync completed with 0 operations
[2019-03-28T21:28:23,991][INFO ][o.e.i.s.IndexShard ] [ms-node-1] [logstash-vfw07-vsys3-2019.03.23][4] primary-replica resync completed with 0 operations
[2019-03-28T21:28:24,004][INFO ][o.e.i.s.IndexShard ] [ms-node-1] [logstash-customer-unknown-2019.03.23][4] primary-replica resync completed with 0 operations
[2019-03-28T21:28:24,006][INFO ][o.e.i.s.IndexShard ] [ms-node-1] [logstash-vfw07-vsys3-2019.03.28][4] primary-replica resync completed with 0 operations
[2019-03-28T21:28:24,011][INFO ][o.e.i.s.IndexShard ] [ms-node-1] [logstash-vfw07-vsys3-2019.03.23][3] primary-replica resync completed with 0 operations
[2019-03-28T21:28:24,013][INFO ][o.e.i.s.IndexShard ] [ms-node-1] [logstash-vfw07-vsys3-2019.03.28][3] primary-replica resync completed with 0 operations
[2019-03-28T21:28:24,028][INFO ][o.e.i.s.IndexShard ] [ms-node-1] [logstash-customer-unknown-2019.03.22][4] primary-replica resync completed with 0 operations
[2019-03-28T21:28:24,033][INFO ][o.e.i.s.IndexShard ] [ms-node-1] [logstash-customer-unknown-2019.03.22][2] primary-replica resync completed with 0 operations
[2019-03-28T21:28:24,044][INFO ][o.e.i.s.IndexShard ] [ms-node-1] [logstash-vfw07-vsys2-2019.03.28][4] primary-replica resync completed with 0 operations
[2019-03-28T21:28:24,047][INFO ][o.e.i.s.IndexShard ] [ms-node-1] [logstash-pfw01-vsys1-2019.03.27][4] primary-replica resync completed with 0 operations
[2019-03-28T21:28:24,088][INFO ][o.e.i.s.IndexShard ] [ms-node-1] [logstash-customer-unknown-2019.03.28][4] primary-replica resync completed with 0 operations
[2019-03-28T21:28:24,343][INFO ][o.e.i.s.IndexShard ] [ms-node-1] [logstash-customer-unknown-2019.03.26][4] primary-replica resync completed with 0 operations
[2019-03-28T21:28:24,355][DEBUG][o.e.a.a.c.n.i.TransportNodesInfoAction] [ms-node-1] failed to execute on node [QHAldAM0S1SPnfjZsEN1zA]
org.elasticsearch.transport.NodeNotConnectedException: [data-node-1][10.249.3.97:9300] Node not connected
at org.elasticsearch.transport.ConnectionManager.getConnection(ConnectionManager.java:156) ~[elasticsearch-6.6.2.jar:6.6.2]
at org.elasticsearch.transport.TransportService.getConnection(TransportService.java:557) ~[elasticsearch-6.6.2.jar:6.6.2]
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:529) ~[elasticsearch-6.6.2.jar:6.6.2]
at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.start(Tra---------------------------------------------

I could not include the entire log file because of the 7,000-line limit here.
See if you can access the complete log file at https://drive.google.com/drive/folders/1Cg7mfbVePijosz0nsJSj8seBmiMLCZat?usp=sharing

Thanks in advance
Karunya

This is correct, but I can be more definitive. There are only two ways that this block can arise: either manually via the API, or automatically if disk space is too low on one or more nodes. If put in place automatically, Elasticsearch does log warning messages. I looked at the logs shared above and I think they do not go far enough back in time. You can find the relevant log messages in past log files by looking for lines containing the string DiskThresholdMonitor.
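(For example, assuming the cluster is reachable on localhost:9200 and using the log path shown earlier in this thread, something along these lines shows the current disk picture and any past flood-stage warnings:)

# Disk usage per node as Elasticsearch sees it
curl -s "http://localhost:9200/_cat/allocation?v&h=node,disk.percent,disk.used,disk.avail,disk.total"

# Look for the watermark / flood-stage warnings in the current and rotated logs
# (rotated files may be gzipped; use zgrep for those)
grep DiskThresholdMonitor /var/log/elasticsearch/bgus-log-central*.log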

Note that the block is not released when disk space is freed. You must release it manually once you are sure you have stabilised the situation. Thus the disk space might be ok now, but the block still remains because of a past shortage. From the reference manual:

The index block must be released manually once there is enough disk space available to allow indexing operations to continue.

It also contains an example of how to do this.
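(As a sketch of the two knobs involved -- the host is a placeholder and the 97% value is only illustrative, not a recommendation:)

# Release the block on all indices once enough disk space has been freed
curl -XPUT -H "Content-Type: application/json" http://localhost:9200/_all/_settings -d '{"index.blocks.read_only_allow_delete": null}'

# Optionally raise the flood-stage watermark via the cluster settings API
curl -XPUT -H "Content-Type: application/json" http://localhost:9200/_cluster/settings -d '{"transient": {"cluster.routing.allocation.disk.watermark.flood_stage": "97%"}}'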

