Elasticsearch crashes with - org.elasticsearch.cluster.block.ClusterBlockException

Lisahtwy · May 27, 2020, 11:01pm

Hi,

When I index files from a few repositories using FSCrawler, Elasticsearch crashes.
The exception I get from my program is:

ConnectionError(<urllib3.connection.HTTPConnection object at 0x7feaf2a65e10>: Failed to establish a new connection:
 [Errno 111] Connection refused) caused by:
 NewConnectionError(<urllib3.connection.HTTPConnection object at 0x7feaf2a65e10>:
 Failed to establish a new connection: [Errno 111] Connection refused)

The status of Elasticsearch is:

 elasticsearch.service - Elasticsearch
   Loaded: loaded (/usr/lib/systemd/system/elasticsearch.service; enabled; vendor preset: disabled)
   Active: failed (Result: signal) since Wed 2020-05-27 21:48:11 UTC; 59s ago
     Docs: http://www.elastic.co
  Process: 3499 ExecStart=/usr/share/elasticsearch/bin/elasticsearch -p ${PID_DIR}/elasticsearch.pid --quiet (code=killed, signal=KILL)
 Main PID: 3499 (code=killed, signal=KILL)
   CGroup: /system.slice/elasticsearch.service

May 27 21:25:26 li393-89.members.linode.com systemd[1]: Starting Elasticsearch...
May 27 21:25:27 li393-89.members.linode.com elasticsearch[3499]: OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be re...e release.
May 27 21:25:51 li393-89.members.linode.com systemd[1]: Started Elasticsearch.
May 27 21:48:11 li393-89.members.linode.com systemd[1]: elasticsearch.service: main process exited, code=killed, status=9/KILL
May 27 21:48:11 li393-89.members.linode.com systemd[1]: Unit elasticsearch.service entered failed state.
May 27 21:48:11 li393-89.members.linode.com systemd[1]: elasticsearch.service failed.
Hint: Some lines were ellipsized, use -l to show in full.
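In the status output above, `code=killed, signal=KILL` means the main process was terminated by SIGKILL rather than exiting by itself. A process cannot catch SIGKILL, so the JVM gets no chance to log anything before it dies. A minimal sketch, assuming the `systemctl status` output has been captured as text, that extracts the unit state and the terminating signal; `parse_systemctl_status` is a hypothetical helper that only pattern-matches the format shown here.

```python
import re

def parse_systemctl_status(text):
    """Extract the unit state and any terminating signal from `systemctl status` text.

    Hypothetical helper: it only pattern-matches the output format shown in this thread.
    """
    result = {"active": None, "signal": None}
    # "Active: failed (Result: signal) since ..." -> "failed"
    active = re.search(r"^\s*Active:\s+(\S+)", text, re.MULTILINE)
    if active:
        result["active"] = active.group(1)
    # "Main PID: 3499 (code=killed, signal=KILL)" -> "KILL"
    sig = re.search(r"Main PID:\s+\d+\s+\(code=\w+,\s*signal=(\w+)\)", text)
    if sig:
        result["signal"] = sig.group(1)
    return result

status = """   Active: failed (Result: signal) since Wed 2020-05-27 21:48:11 UTC; 59s ago
 Main PID: 3499 (code=killed, signal=KILL)"""
print(parse_systemctl_status(status))  # {'active': 'failed', 'signal': 'KILL'}
```

For a running unit, such as the "Main PID: 18209 (java)" line after the restart, the signal comes back as None.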

In /var/log/elasticsearch/elasticsearch.log, I see the following WARN entries:

[2020-05-27T22:30:31,247][WARN ][r.suppressed             ] [li393-89.members.linode.com] path: /.kibana_task_manager/_update_by_query, params: {ignore_unavailable=true, refresh=true, conflicts=proceed, index=.kibana_task_manager, max_docs=10}
org.elasticsearch.action.search.SearchPhaseExecutionException: all shards failed
        at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:534) [elasticsearch-7.5.1.jar:7.5.1]
        at org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:305) [elasticsearch-7.5.1.jar:7.5.1]
        at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:563) [elasticsearch-7.5.1.jar:7.5.1]

[2020-05-27T22:30:32,085][WARN ][r.suppressed             ] [li393-89.members.linode.com] path: /.kibana/_doc/space%3Adefault, params: {index=.kibana, id=space:default}
org.elasticsearch.action.NoShardAvailableActionException: No shard available for [get [.kibana][_doc][space:default]: routing [null]]
        at org.elasticsearch.action.support.single.shard.TransportSingleShardAction$AsyncSingleAction.perform(TransportSingleShardAction.java:224) [elasticsearch-7.5.1.jar:7.5.1]
        at org.elasticsearch.action.support.single.shard.TransportSingleShardAction$AsyncSingleAction.start(TransportSingleShardAction.java:201) [elasticsearch-7.5.1.jar:7.5.1]

Below is the WARN entry I saw when Elasticsearch crashed for the first time:

[2020-05-27T14:22:38,029][WARN ][r.suppressed             ] [li393-89.members.linode.com] path: /.kibana_task_manager/_update_by_query, params: {ignore_unavailable=true, refresh=true, conflicts=proceed, index=.kibana_task_manager, max_docs=10}
org.elasticsearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];
        at org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedException(ClusterBlocks.java:189) ~[elasticsearch-7.5.1.jar:7.5.1]
        at org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedRaiseException(ClusterBlocks.java:175) ~[elasticsearch-7.5.1.jar:7.5.1]
        at org.elasticsearch.action.search.TransportSearchAction.executeSearch(TransportSearchAction.java:467) ~[elasticsearch-7.5.1.jar:7.5.1]
        at org.elasticsearch.action.search.TransportSearchAction.executeLocalSearch(TransportSearchAction.java:400) ~[elasticsearch-7.5.1.jar:7.5.1]
        at org.elasticsearch.action.search.TransportSearchAction.lambda$doExecute$3(TransportSearchAction.java:212) ~[elasticsearch-7.5.1.jar:7.5.1]
        at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:63) [elasticsearch-7.5.1.jar:7.5.1]

Below is my elasticsearch.yml file:

# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
path.data: /var/lib/elasticsearch
#
# Path to log files:
#
path.logs: /var/log/elasticsearch
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
discovery.seed_hosts: ["50.116.48.89:9200"]

If I delete all my indices and restart Elasticsearch, the crash does not happen. If I skip these two steps, Elasticsearch fails in the middle of my indexing.

Could you please tell me why the crash is happening?

-Lisa

warkolm · May 28, 2020, 12:20am

Can you post your entire Elasticsearch log? Use gist/pastebin/etc. if you need to.

Lisahtwy · May 28, 2020, 12:30am

https://gist.github.com/Lisahtwy/f7ee7813cf2bf332b80ef9b408402a72

elasticsearch crash logs

[2020-05-27T00:09:59,821][INFO ][o.e.m.j.JvmGcMonitorService] [li393-89.members.linode.com] [gc][973189] overhead, spent [355ms] collecting in the last [1s]
[2020-05-27T00:46:02,142][DEBUG][o.e.a.s.m.TransportMasterNodeAction] [li393-89.members.linode.com] Get stats for datafeed '_all'
[2020-05-27T01:30:00,005][INFO ][o.e.x.s.SnapshotRetentionTask] [li393-89.members.linode.com] starting SLM retention snapshot cleanup task
[2020-05-27T01:30:00,034][INFO ][o.e.x.m.MlDailyMaintenanceService] [li393-89.members.linode.com] triggering scheduled [ML] maintenance tasks
[2020-05-27T01:30:00,034][INFO ][o.e.x.m.a.TransportDeleteExpiredDataAction] [li393-89.members.linode.com] Deleting expired data
[2020-05-27T01:30:00,111][INFO ][o.e.x.m.a.TransportDeleteExpiredDataAction] [li393-89.members.linode.com] Completed deletion of expired ML data
[2020-05-27T01:30:00,112][INFO ][o.e.x.m.MlDailyMaintenanceService] [li393-89.members.linode.com] Successfully completed [ML] maintenance tasks
[2020-05-27T13:59:34,639][WARN ][o.e.m.j.JvmGcMonitorService] [li393-89.members.linode.com] [gc][young][1022545][5068] duration [12.9s], collections [1]/[4.1m], total [12.9s]/[3.8m], memory [2.1gb]->[1.9gb]/[3.9gb], all_pools {[young] [264.4mb]->[19.3mb]/[266.2mb]}{[survivor] [8.6mb]->[32.6mb]/[33.2mb]}{[old] [1.8gb]->[1.8gb]/[3.6gb]}
[2020-05-27T14:22:09,928][INFO ][o.e.e.NodeEnvironment    ] [li393-89.members.linode.com] using [1] data paths, mounts [[/ (/dev/root)]], net usable_space [87.4gb], net total_space [157gb], types [ext4]
[2020-05-27T14:22:09,941][INFO ][o.e.e.NodeEnvironment    ] [li393-89.members.linode.com] heap size [3.9gb], compressed ordinary object pointers [true]

Please let me know if you are not able to access the link above.

-Lisa

warkolm · May 28, 2020, 12:41am

It looks like Elasticsearch was restarted at around 2020-05-27T14:22:09,928.
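That restart is visible in the log itself: a NodeEnvironment entry such as `using [1] data paths` is written while the node starts up, so seeing one in the middle of a log marks a restart. A minimal sketch, assuming the log format shown in the gists, that lists those startup lines together with long pauses reported by JvmGcMonitorService (such as the 12.9s young-generation pause shortly before the restart); the 10-second threshold and the function name are arbitrary illustrative choices.

```python
import re

# Matches the "[timestamp][LEVEL][logger]" prefix used in elasticsearch.log.
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\]\[(?P<level>\w+)\s*\]\[(?P<logger>[^\]]+?)\s*\]")
# A NodeEnvironment "using [N] data paths" line is written when the node starts.
START_RE = re.compile(r"using \[\d+\] data paths")
# JvmGcMonitorService warnings report the pause as "duration [12.9s]".
GC_RE = re.compile(r"duration \[(?P<secs>[\d.]+)s\]")

def scan_log(lines, gc_threshold_secs=10.0):
    """Return (timestamp, event) pairs for node starts and long GC pauses."""
    events = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # stack-trace continuation lines have no prefix
        ts, logger = m.group("ts"), m.group("logger")
        if logger.endswith("NodeEnvironment") and START_RE.search(line):
            events.append((ts, "node start"))
        gc = GC_RE.search(line)
        if logger.endswith("JvmGcMonitorService") and gc:
            secs = float(gc.group("secs"))
            if secs >= gc_threshold_secs:
                events.append((ts, f"GC pause {secs}s"))
    return events

sample = [
    "[2020-05-27T13:59:34,639][WARN ][o.e.m.j.JvmGcMonitorService] [li393-89.members.linode.com] [gc][young][1022545][5068] duration [12.9s], collections [1]/[4.1m]",
    "[2020-05-27T14:22:09,928][INFO ][o.e.e.NodeEnvironment    ] [li393-89.members.linode.com] using [1] data paths, mounts [[/ (/dev/root)]]",
    "        at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:534)",
]
for ts, event in scan_log(sample):
    print(ts, event)
# 2020-05-27T13:59:34,639 GC pause 12.9s
# 2020-05-27T14:22:09,928 node start
```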

What's the output of _cat/nodes?v and _cat/allocation?v?

Lisahtwy · May 28, 2020, 12:54am

I just had another crash, so I restarted elasticsearch.service.
This is what I got for
GET _cat/nodes?v

ip        heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
127.0.0.1            9          89  45    2.48    0.88     0.73 dilm      *      li393-89.members.linode.com

and for
GET _cat/allocation?v

shards disk.indices disk.used disk.avail disk.total disk.percent host      ip        node
    51      457.7mb    69.3gb     87.6gb      157gb           44 127.0.0.1 127.0.0.1 li393-89.members.linode.com
    48                                                                               UNASSIGNED
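In the output above, 48 shards are listed against UNASSIGNED rather than against a node. A minimal sketch for summarizing that table from the plain-text `_cat/allocation?v` output; `parse_cat_allocation` is a hypothetical helper and assumes the whitespace-separated layout shown, where the UNASSIGNED row carries only a shard count.

```python
def parse_cat_allocation(text):
    """Return {node_name or "UNASSIGNED": shard_count} from `_cat/allocation?v` text."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    header, rows = lines[0].split(), lines[1:]
    counts = {}
    for row in rows:
        fields = row.split()
        if fields[-1] == "UNASSIGNED":
            # The UNASSIGNED row has only the shard count and the label.
            counts["UNASSIGNED"] = int(fields[0])
        else:
            record = dict(zip(header, fields))
            counts[record["node"]] = int(record["shards"])
    return counts

allocation = """
shards disk.indices disk.used disk.avail disk.total disk.percent host      ip        node
    51      457.7mb    69.3gb     87.6gb      157gb           44 127.0.0.1 127.0.0.1 li393-89.members.linode.com
    48                                                                               UNASSIGNED
"""
print(parse_cat_allocation(allocation))
# {'li393-89.members.linode.com': 51, 'UNASSIGNED': 48}
```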

warkolm · May 28, 2020, 12:56am

Can you please post the logs for that recent activity?

Lisahtwy · May 28, 2020, 1:05am

https://gist.github.com/Lisahtwy/39b70dbc0c5cf1bf7a19a629150a4072

elasticsearch crash logs - Recent - 2020-05-27 - 9:04PM

[2020-05-28T00:51:13,492][INFO ][o.e.e.NodeEnvironment    ] [li393-89.members.linode.com] using [1] data paths, mounts [[/ (/dev/root)]], net usable_space [87.6gb], net total_space [157gb], types [ext4]
[2020-05-28T00:51:13,524][INFO ][o.e.e.NodeEnvironment    ] [li393-89.members.linode.com] heap size [3.9gb], compressed ordinary object pointers [true]
[2020-05-28T00:51:14,046][INFO ][o.e.n.Node               ] [li393-89.members.linode.com] node name [li393-89.members.linode.com], node ID [LOrZIItxSaChVbGzf11YCQ], cluster name [elasticsearch]
[2020-05-28T00:51:14,048][INFO ][o.e.n.Node               ] [li393-89.members.linode.com] version[7.5.1], pid[18209], build[default/rpm/3ae9ac9a93c95bd0cdc054951cf95d88e1e18d96/2019-12-16T22:57:37.835892Z], OS[Linux/5.4.10-x86_64-linode132/amd64], JVM[AdoptOpenJDK/OpenJDK 64-Bit Server VM/13.0.1/13.0.1+9]
[2020-05-28T00:51:14,048][INFO ][o.e.n.Node               ] [li393-89.members.linode.com] JVM home [/usr/share/elasticsearch/jdk]
[2020-05-28T00:51:14,048][INFO ][o.e.n.Node               ] [li393-89.members.linode.com] JVM arguments [-Des.networkaddress.cache.ttl=60, -Des.networkaddress.cache.negative.ttl=10, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dio.netty.allocator.numDirectArenas=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Djava.locale.providers=COMPAT, -Xms4g, -Xmx4g, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -Djava.io.tmpdir=/tmp/elasticsearch-12852497923521346942, -XX:+HeapDumpOnOutOfMemoryError, -XX:HeapDumpPath=/var/lib/elasticsearch, -XX:ErrorFile=/var/log/elasticsearch/hs_err_pid%p.log, -Xlog:gc*,gc+age=trace,safepoint:file=/var/log/elasticsearch/gc.log:utctime,pid,tags:filecount=32,filesize=64m, -XX:MaxDirectMemorySize=2147483648, -Des.path.home=/usr/share/elasticsearch, -Des.path.conf=/etc/elasticsearch, -Des.distribution.flavor=default, -Des.distribution.type=rpm, -Des.bundled_jdk=true]
[2020-05-28T00:51:18,913][INFO ][o.e.p.PluginsService     ] [li393-89.members.linode.com] loaded module [aggs-matrix-stats]
[2020-05-28T00:51:18,914][INFO ][o.e.p.PluginsService     ] [li393-89.members.linode.com] loaded module [analysis-common]
[2020-05-28T00:51:18,914][INFO ][o.e.p.PluginsService     ] [li393-89.members.linode.com] loaded module [flattened]
[2020-05-28T00:51:18,914][INFO ][o.e.p.PluginsService     ] [li393-89.members.linode.com] loaded module [frozen-indices]

warkolm · May 28, 2020, 1:06am

Thanks for sticking with this. Does that include the logs prior to the crash? They might explain what happened.

Lisahtwy · May 28, 2020, 1:09am

These are recent logs. I think elasticsearch.log for May 28th has started; the May 27th logs are in a .gz file. Let me create another gist for those and share it with you.

warkolm · May 28, 2020, 1:10am

Is the 2020-05-28T00:51:13,492 timestamp when you restarted things?

Lisahtwy · May 28, 2020, 1:12am

OK, here is my console output from checking the status after the crash and after restarting.

Checking the status here:

● elasticsearch.service - Elasticsearch
   Loaded: loaded (/usr/lib/systemd/system/elasticsearch.service; enabled; vendor preset: disabled)
   Active: failed (Result: signal) since Thu 2020-05-28 00:25:08 UTC; 25min ago
     Docs: http://www.elastic.co
  Process: 11629 ExecStart=/usr/share/elasticsearch/bin/elasticsearch -p ${PID_DIR}/elasticsearch.pid --quiet (code=killed, signal=KILL)
 Main PID: 11629 (code=killed, signal=KILL)
   CGroup: /system.slice/elasticsearch.service

May 27 23:13:12 li393-89.members.linode.com systemd[1]: Starting Elasticsearc...
May 27 23:13:13 li393-89.members.linode.com elasticsearch[11629]: OpenJDK 64-...
May 27 23:13:41 li393-89.members.linode.com systemd[1]: Started Elasticsearch.
May 28 00:25:08 li393-89.members.linode.com systemd[1]: elasticsearch.service...
May 28 00:25:08 li393-89.members.linode.com systemd[1]: Unit elasticsearch.se...
May 28 00:25:08 li393-89.members.linode.com systemd[1]: elasticsearch.service...
Hint: Some lines were ellipsized, use -l to show in full.

Restarted here:

● elasticsearch.service - Elasticsearch
   Loaded: loaded (/usr/lib/systemd/system/elasticsearch.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2020-05-28 00:51:32 UTC; 18s ago
     Docs: http://www.elastic.co
 Main PID: 18209 (java)
   CGroup: /system.slice/elasticsearch.service
           ├─18209 /usr/share/elasticsearch/jdk/bin/java -Des.networkaddress....
           └─18312 /usr/share/elasticsearch/modules/x-pack-ml/platform/linux-...

May 28 00:51:04 li393-89.members.linode.com systemd[1]: Starting Elasticsearc...
May 28 00:51:05 li393-89.members.linode.com elasticsearch[18209]: OpenJDK 64-...
May 28 00:51:32 li393-89.members.linode.com systemd[1]: Started Elasticsearch.
Hint: Some lines were ellipsized, use -l to show in full.

Lisahtwy · May 28, 2020, 1:14am

The first link contains all the logs from before the crash, except for the most recent crash.
If you want the logs from the day before, I will share them with you.

warkolm · May 28, 2020, 1:35am

It's weird, I can't see why it's crashing. It simply looks like it's being restarted, as per these lines:

https://gist.github.com/Lisahtwy/f7ee7813cf2bf332b80ef9b408402a72#file-elasticsearch-crash-logs-L5464-L5465

Lisahtwy · May 28, 2020, 2:06am

Oh no...
How can I reset it?
It was working yesterday, and I did not introduce any new changes.
What about the ClusterBlockException? I thought that was the culprit.

warkolm · May 28, 2020, 2:27am

There's nothing in the logs to suggest a crash that I can see. Just that the process is restarted for some reason. The cluster block is a result of the restart, as it's trying to recover the shards and certain ones are not yet available for Kibana.
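The block named in the first-crash log line has the form `[STATUS/id/description]`, and `state not recovered / initialized` fits the explanation above: the node has restarted and has not finished recovering yet. A minimal sketch for pulling blocks out of such a message, for example to tell this transient block apart from others while scanning logs; `parse_cluster_blocks` is hypothetical and its pattern is based only on the message format seen in this thread.

```python
import re

# "[SERVICE_UNAVAILABLE/1/state not recovered / initialized]" -> status, id, description.
# Pattern inferred from the message in this thread; the description may contain "/".
BLOCK_RE = re.compile(r"\[([A-Z_]+)/(\d+)/([^\]]+)\]")

def parse_cluster_blocks(message):
    """Return (status, id, description) tuples from a ClusterBlockException message."""
    return [(status, int(block_id), desc) for status, block_id, desc in BLOCK_RE.findall(message)]

msg = "blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];"
print(parse_cluster_blocks(msg))
# [('SERVICE_UNAVAILABLE', 1, 'state not recovered / initialized')]
```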

The question I have is: why are you restarting it? What makes you think it has crashed?

Lisahtwy · May 28, 2020, 3:49am

While creating an index with a custom analyzer, I saw the exception below in my application logs.

ConnectionError(<urllib3.connection.HTTPConnection object at 0x7feaf2a65e10>: Failed to establish a new connection:
 [Errno 111] Connection refused) caused by:
 NewConnectionError(<urllib3.connection.HTTPConnection object at 0x7feaf2a65e10>:
 Failed to establish a new connection: [Errno 111] Connection refused)

It says it failed to establish a new connection, so the first thing that came to mind was to check whether Elasticsearch was running.
When I checked the status, it showed a failed state.
That's why I restarted the Elasticsearch service.
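A refused connection ([Errno 111]) only says that nothing was accepting connections on that port at that moment; it does not say why. A minimal stdlib sketch of the same check the application implicitly made, which could be run before starting a long indexing job; the host and port are assumptions matching the defaults used in this thread.

```python
import socket

def port_is_open(host="localhost", port=9200, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # ConnectionRefusedError ([Errno 111]) and timeouts are both OSError subclasses.
        return False

print("Elasticsearch port reachable:", port_is_open())
```

This only confirms that the port answers; whether the cluster is actually usable is a separate question for the Elasticsearch APIs.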

warkolm · May 28, 2020, 5:05am

Is that using the service command? You should really be checking Elasticsearch logs and the APIs.
Do you have Monitoring enabled?
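Following that advice, a minimal stdlib sketch that asks the node directly instead of relying on systemctl. `_cluster/health` with `wait_for_status` and `timeout` are documented parameters; the base URL, the yellow target (a single node cannot host replica shards, so green may never be reached), and the helper names are assumptions for one local node without TLS or authentication.

```python
import json
import urllib.parse
import urllib.request

def health_url(base="http://localhost:9200", wait_for_status="yellow", timeout="30s"):
    """Build a `_cluster/health` URL that waits for the given status or the timeout."""
    query = urllib.parse.urlencode({"wait_for_status": wait_for_status, "timeout": timeout})
    return f"{base}/_cluster/health?{query}"

def cluster_health(base="http://localhost:9200", client_timeout=60, **kwargs):
    """Fetch cluster health as a dict (assumes no TLS or authentication).

    If the status is not reached in time, Elasticsearch answers with HTTP 408,
    which urlopen raises as urllib.error.HTTPError.
    """
    # client_timeout should exceed the server-side wait so the socket does not give up first.
    with urllib.request.urlopen(health_url(base, **kwargs), timeout=client_timeout) as resp:
        return json.load(resp)

if __name__ == "__main__":
    print(health_url())
    # http://localhost:9200/_cluster/health?wait_for_status=yellow&timeout=30s
```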

Lisahtwy · May 28, 2020, 5:12am

Yes, I used sudo systemctl status elasticsearch.service.
I didn't enable any monitoring.

dadoonet · May 28, 2020, 9:56am

I suspect that a 4gb heap might be a bit too small for FSCrawler bulk requests.
You can try to reduce the bulk size by modifying the bulk settings.
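FSCrawler batches documents internally according to its own bulk settings, so the following is not its code. It is a minimal Python sketch, under that assumption, of what a bulk size controls: a `_bulk` body is newline-delimited JSON with an action line followed by a source line per document, and capping the number of documents per request caps the size of each body the node has to handle. The index name `repo-files` and the helper name are hypothetical.

```python
import json

def bulk_bodies(docs, index, bulk_size):
    """Yield `_bulk` request bodies containing at most `bulk_size` documents each."""
    if bulk_size < 1:
        raise ValueError("bulk_size must be at least 1")
    for start in range(0, len(docs), bulk_size):
        lines = []
        for doc in docs[start:start + bulk_size]:
            lines.append(json.dumps({"index": {"_index": index}}))  # action line
            lines.append(json.dumps(doc))                           # document source
        # The bulk API requires the body to end with a newline.
        yield "\n".join(lines) + "\n"

# Hypothetical documents standing in for crawled files.
docs = [{"path": f"file{i}.txt", "content": "..."} for i in range(12)]
bodies = list(bulk_bodies(docs, "repo-files", bulk_size=5))
print(len(bodies))  # 3 requests: 5 + 5 + 2 documents
```

Note that a small document count does not bound size by itself: a few very large files can still produce a large body.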

Lisahtwy · May 28, 2020, 11:52am

Hi,

I have been trying with bulksize = 5; do you want me to go lower than that?

One more wild guess: this is what I do when creating the index with a custom analyzer:

es = connections.create_connection(hosts=['localhost'])

I have seen some posts here about problems caused by creating the connection without port=9200. Could this be the culprit?
I am running my scripts to create indices with:

from elasticsearch_dsl import Index,Search,Document, analyzer, tokenizer,\
                           connections,Mapping, Nested, Text, Keyword, InnerDoc

es = connections.create_connection(hosts=['localhost'], port = 9200)

Could you let me know whether this is the correct way to specify the port number?

-Lisa
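On the port question: as far as I know, the 7.x elasticsearch-py client (which `connections.create_connection` in elasticsearch_dsl passes its arguments to) falls back to port 9200 when a host has no port, so `hosts=['localhost']` with or without `port=9200` should reach the same node, and adding the port is unlikely to change anything here. The sketch below is a hypothetical stdlib helper that only mimics that defaulting rule; it is not the client's code, and its precedence order is this sketch's own choice.

```python
from urllib.parse import urlsplit

DEFAULT_PORT = 9200  # default HTTP port for Elasticsearch

def normalize_host(spec, port=None):
    """Hypothetical helper: turn 'host' or 'host:port' into (host, port).

    In this sketch a port in the string wins, then the `port` argument, then 9200.
    """
    parts = urlsplit("//" + spec)
    return parts.hostname, parts.port or port or DEFAULT_PORT

print(normalize_host("localhost"))             # ('localhost', 9200)
print(normalize_host("localhost", port=9200))  # ('localhost', 9200)
print(normalize_host("localhost:9200"))        # ('localhost', 9200)
```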
