Elasticsearch crashes with - org.elasticsearch.cluster.block.ClusterBlockException

Hi,

Elasticsearch crashes while I am indexing files from a few repositories using FSCrawler.
The exception I get from my program is:

ConnectionError(<urllib3.connection.HTTPConnection object at 0x7feaf2a65e10>: Failed to establish a new connection:
 [Errno 111] Connection refused) caused by:
 NewConnectionError(<urllib3.connection.HTTPConnection object at 0x7feaf2a65e10>:
 Failed to establish a new connection: [Errno 111] Connection refused)

The status of Elasticsearch is:

 elasticsearch.service - Elasticsearch
   Loaded: loaded (/usr/lib/systemd/system/elasticsearch.service; enabled; vendor preset: disabled)
   Active: failed (Result: signal) since Wed 2020-05-27 21:48:11 UTC; 59s ago
     Docs: http://www.elastic.co
  Process: 3499 ExecStart=/usr/share/elasticsearch/bin/elasticsearch -p ${PID_DIR}/elasticsearch.pid --quiet (code=killed, signal=KILL)
 Main PID: 3499 (code=killed, signal=KILL)
   CGroup: /system.slice/elasticsearch.service

May 27 21:25:26 li393-89.members.linode.com systemd[1]: Starting Elasticsearch...
May 27 21:25:27 li393-89.members.linode.com elasticsearch[3499]: OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be re...e release.
May 27 21:25:51 li393-89.members.linode.com systemd[1]: Started Elasticsearch.
May 27 21:48:11 li393-89.members.linode.com systemd[1]: elasticsearch.service: main process exited, code=killed, status=9/KILL
May 27 21:48:11 li393-89.members.linode.com systemd[1]: Unit elasticsearch.service entered failed state.
May 27 21:48:11 li393-89.members.linode.com systemd[1]: elasticsearch.service failed.
Hint: Some lines were ellipsized, use -l to show in full.
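
For readers following the status output above: the journal line ending in status=9/KILL records that the main process was ended by signal 9 (SIGKILL) rather than exiting on its own. Below is a minimal Python sketch of pulling those fields out of such a line. The function name parse_main_exit and the regular expression are illustrative assumptions, not part of systemd or Elasticsearch.

```python
import re

# Matches the systemd journal message shown in the status output above.
EXIT_RE = re.compile(
    r"main process exited, code=(?P<code>\w+), status=(?P<status>\d+)/(?P<name>\w+)"
)


def parse_main_exit(journal_line):
    """Return the exit code, numeric status and signal name, or None if absent."""
    match = EXIT_RE.search(journal_line)
    return match.groupdict() if match else None


line = (
    "May 27 21:48:11 li393-89.members.linode.com systemd[1]: "
    "elasticsearch.service: main process exited, code=killed, status=9/KILL"
)
print(parse_main_exit(line))
```

A line that does not contain the "main process exited" message yields None, so the helper can be run over a whole journal dump.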

In /var/log/elasticsearch/elasticsearch.log, I see the WARNING logs below:

[2020-05-27T22:30:31,247][WARN ][r.suppressed             ] [li393-89.members.linode.com] path: /.kibana_task_manager/_update_by_query, params: {ignore_unavailable=true, refresh=true, conflicts=proceed, index=.kibana_task_manager, max_docs=10}
org.elasticsearch.action.search.SearchPhaseExecutionException: all shards failed
        at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:534) [elasticsearch-7.5.1.jar:7.5.1]
        at org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:305) [elasticsearch-7.5.1.jar:7.5.1]
        at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:563) [elasticsearch-7.5.1.jar:7.5.1]
 
[2020-05-27T22:30:32,085][WARN ][r.suppressed             ] [li393-89.members.linode.com] path: /.kibana/_doc/space%3Adefault, params: {index=.kibana, id=space:default}
org.elasticsearch.action.NoShardAvailableActionException: No shard available for [get [.kibana][_doc][space:default]: routing [null]]
        at org.elasticsearch.action.support.single.shard.TransportSingleShardAction$AsyncSingleAction.perform(TransportSingleShardAction.java:224) [elasticsearch-7.5.1.jar:7.5.1]
        at org.elasticsearch.action.support.single.shard.TransportSingleShardAction$AsyncSingleAction.start(TransportSingleShardAction.java:201) [elasticsearch-7.5.1.jar:7.5.1]
 

Below is the WARNING log I saw when Elasticsearch crashed for the first time:

[2020-05-27T14:22:38,029][WARN ][r.suppressed             ] [li393-89.members.linode.com] path: /.kibana_task_manager/_update_by_query, params: {ignore_unavailable=true, refresh=true, conflicts=proceed, index=.kibana_task_manager, max_docs=10}
org.elasticsearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];
        at org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedException(ClusterBlocks.java:189) ~[elasticsearch-7.5.1.jar:7.5.1]
        at org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedRaiseException(ClusterBlocks.java:175) ~[elasticsearch-7.5.1.jar:7.5.1]
        at org.elasticsearch.action.search.TransportSearchAction.executeSearch(TransportSearchAction.java:467) ~[elasticsearch-7.5.1.jar:7.5.1]
        at org.elasticsearch.action.search.TransportSearchAction.executeLocalSearch(TransportSearchAction.java:400) ~[elasticsearch-7.5.1.jar:7.5.1]
        at org.elasticsearch.action.search.TransportSearchAction.lambda$doExecute$3(TransportSearchAction.java:212) ~[elasticsearch-7.5.1.jar:7.5.1]
        at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:63) [elasticsearch-7.5.1.jar:7.5.1]

Below is my elasticsearch.yml file

# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
path.data: /var/lib/elasticsearch
#
# Path to log files:
#
path.logs: /var/log/elasticsearch
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
discovery.seed_hosts: ["50.116.48.89:9200"]

If I delete all my indexes and restart Elasticsearch, the crash does not happen. If I skip those two steps, Elasticsearch fails in the middle of my indexing.

Could you please tell me why the crash is happening?

-Lisa

Can you post your entire Elasticsearch log? Use gist/pastebin/etc if you need to :slight_smile:

Please let me know if you are not able to access the above link.

-Lisa

It looks like Elasticsearch was restarted at around 2020-05-27T14:22:09,928.

What's the output of _cat/nodes?v and _cat/allocation?v?

I just had another crash, so I restarted elasticsearch.service.
This is what I got for
GET _cat/nodes?v

ip        heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
127.0.0.1            9          89  45    2.48    0.88     0.73 dilm      *      li393-89.members.linode.com

and for
GET _cat/allocation?v

shards disk.indices disk.used disk.avail disk.total disk.percent host      ip        node
    51      457.7mb    69.3gb     87.6gb      157gb           44 127.0.0.1 127.0.0.1 li393-89.members.linode.com
    48                                                                               UNASSIGNED
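
As a side note, the _cat/allocation output above can be tallied programmatically. The following is only a sketch, assuming the plain-text format shown here, where the first column is the shard count and the last column is either a node name or UNASSIGNED. The function name summarize_allocation is hypothetical.

```python
def summarize_allocation(cat_output):
    """Tally assigned and unassigned shards from `_cat/allocation?v` text output."""
    lines = [line for line in cat_output.splitlines() if line.strip()]
    summary = {"assigned": 0, "unassigned": 0}
    for line in lines[1:]:  # skip the header row
        fields = line.split()
        shard_count, node = int(fields[0]), fields[-1]
        key = "unassigned" if node == "UNASSIGNED" else "assigned"
        summary[key] += shard_count
    return summary


sample = """\
shards disk.indices disk.used disk.avail disk.total disk.percent host      ip        node
    51      457.7mb    69.3gb     87.6gb      157gb           44 127.0.0.1 127.0.0.1 li393-89.members.linode.com
    48                                                                               UNASSIGNED
"""
print(summarize_allocation(sample))  # {'assigned': 51, 'unassigned': 48}
```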

Can you please post the logs for that recent activity?

Thanks for sticking with this. Does that include the logs prior to the crash? They might explain what happened.

These are the recent logs. I think elasticsearch.log for May 28th has just started. The May 27th logs are in a .gz file. Let me create another gist for those and share it with you.

Is the 2020-05-28T00:51:13,492 timestamp when you restarted things?

OK, here are my console logs from checking the status after the crash and after the restart.

checking the status here:

● elasticsearch.service - Elasticsearch
   Loaded: loaded (/usr/lib/systemd/system/elasticsearch.service; enabled; vendor preset: disabled)
   Active: failed (Result: signal) since Thu 2020-05-28 00:25:08 UTC; 25min ago
     Docs: http://www.elastic.co
  Process: 11629 ExecStart=/usr/share/elasticsearch/bin/elasticsearch -p ${PID_DIR}/elasticsearch.pid --quiet (code=killed, signal=KILL)
 Main PID: 11629 (code=killed, signal=KILL)
   CGroup: /system.slice/elasticsearch.service

May 27 23:13:12 li393-89.members.linode.com systemd[1]: Starting Elasticsearc...
May 27 23:13:13 li393-89.members.linode.com elasticsearch[11629]: OpenJDK 64-...
May 27 23:13:41 li393-89.members.linode.com systemd[1]: Started Elasticsearch.
May 28 00:25:08 li393-89.members.linode.com systemd[1]: elasticsearch.service...
May 28 00:25:08 li393-89.members.linode.com systemd[1]: Unit elasticsearch.se...
May 28 00:25:08 li393-89.members.linode.com systemd[1]: elasticsearch.service...
Hint: Some lines were ellipsized, use -l to show in full.

restarted here:

● elasticsearch.service - Elasticsearch
   Loaded: loaded (/usr/lib/systemd/system/elasticsearch.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2020-05-28 00:51:32 UTC; 18s ago
     Docs: http://www.elastic.co
 Main PID: 18209 (java)
   CGroup: /system.slice/elasticsearch.service
           ├─18209 /usr/share/elasticsearch/jdk/bin/java -Des.networkaddress....
           └─18312 /usr/share/elasticsearch/modules/x-pack-ml/platform/linux-...

May 28 00:51:04 li393-89.members.linode.com systemd[1]: Starting Elasticsearc...
May 28 00:51:05 li393-89.members.linode.com elasticsearch[18209]: OpenJDK 64-...
May 28 00:51:32 li393-89.members.linode.com systemd[1]: Started Elasticsearch.
Hint: Some lines were ellipsized, use -l to show in full.


The first link contains all the logs from before the crash, except those for the most recent crash.
If you want the logs from the day before, I will share them with you.

It's weird, I can't seem to see why it's crashing. It simply looks like it's being restarted, as per these lines;

Oh no...
How can I reset it?
It was working yesterday, and I did not introduce any new changes.
What about the ClusterBlockException? I thought that was the culprit.

There's nothing in the logs to suggest a crash that I can see. Just that the process is restarted for some reason. The cluster block is a result of the restart, as it's trying to recover the shards and certain ones are not yet available for Kibana.

The question I have is why are you restarting it, what makes you think it's crashed?

While I was creating an index with a custom analyzer, I saw the exception below in my application logs.

ConnectionError(<urllib3.connection.HTTPConnection object at 0x7feaf2a65e10>: Failed to establish a new connection:
 [Errno 111] Connection refused) caused by:
 NewConnectionError(<urllib3.connection.HTTPConnection object at 0x7feaf2a65e10>:
 Failed to establish a new connection: [Errno 111] Connection refused)

It says it failed to establish a new connection, so the first thing that came to my mind was whether Elasticsearch was running.
When I checked the status, it showed a failed state.
That's why I restarted the Elasticsearch service.
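
For context, [Errno 111] Connection refused on Linux typically means that nothing was accepting TCP connections on the target host and port at that moment. A minimal Python sketch of checking this before creating the client is shown below. It assumes Elasticsearch's default HTTP port 9200 on localhost, and the helper name port_accepts_connections is hypothetical. It only tests that the port is open, not that the cluster is healthy.

```python
import socket


def port_accepts_connections(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers connection refused, timeouts and unreachable hosts
        return False


# Assumed defaults: Elasticsearch HTTP on localhost:9200.
print(port_accepts_connections("localhost", 9200))
```

A False result here matches the failed service state reported by systemctl. A True result with requests that still fail would point at the cluster itself rather than at the process being down.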

Is that using the service command? You should really be checking Elasticsearch logs and the APIs.
Do you have Monitoring enabled?

Yes, sudo systemctl status elasticsearch.service
I didn't enable any monitoring. :frowning_face:

I suspect that 4gb of heap might be a bit too small for FSCrawler bulk requests.
You can try to reduce the size of the bulk requests by modifying the bulk settings.

Hi,

I have been trying with bulksize = 5. Do you want me to go lower than that?

One more wild guess: this is what I am doing while creating the index with the custom analyzer.

es = connections.create_connection(hosts=['localhost'])

I have seen some posts here about problems caused by creating the connection without port=9200. Is this a possible culprit?
I am running my scripts to create the indexes with

from elasticsearch_dsl import (
    Document, Index, InnerDoc, Keyword, Mapping, Nested, Search, Text,
    analyzer, connections, tokenizer,
)

es = connections.create_connection(hosts=['localhost'], port=9200)

Just let me know whether this is the correct way to give the port number.

-Lisa