Just upgraded to Kibana 6.8.1 and now Kibana fails to launch with "Kibana server is not ready yet"

Hi ELK friends,

I just tried upgrading my ELK installation: Logstash, Kibana, and Elasticsearch are now at version 6.8.1.

Now that I have upgraded, I am getting the page "Kibana server is not ready yet".

I am fairly new to asking for support and going through the forums, so apologies in advance if I left any information out. Is anyone familiar with this condition after upgrading to 6.8.1?

[root@ca-sna-elk02 ~]# systemctl -l status kibana
● kibana.service - Kibana
Loaded: loaded (/etc/systemd/system/kibana.service; enabled; vendor preset: disabled)
Active: active (running) since Fri 2019-07-05 23:19:29 PDT; 7h ago
Main PID: 21423 (node)
CGroup: /system.slice/kibana.service
└─21423 /usr/share/kibana/bin/../node/bin/node --no-warnings --max-http-header-size=65536 /usr/share/kibana/bin/../src/cli -c /etc/kibana/kibana.yml

Jul 06 06:45:44 ca-sna-elk02.mercurytechnology.com kibana[21423]: {"type":"log","@timestamp":"2019-07-06T13:45:44Z","tags":["error","task_manager"],"pid":21423,"message":"Failed to poll for work: [search_phase_execution_exception] all shards failed :: {"path":"/.kibana_task_manager/_doc/_search","query":{"ignore_unavailable":true},"body":"{\"query\":{\"bool\":{\"must\":[{\"term\":{\"type\":\"task\"}},{\"bool\":{\"must\":[{\"terms\":{\"task.taskType\":[\"maps_telemetry\",\"vis_telemetry\"]}},{\"range\":{\"task.attempts\":{\"lte\":3}}},{\"range\":{\"task.runAt\":{\"lte\":\"now\"}}},{\"range\":{\"kibana.apiVersion\":{\"lte\":1}}}]}}]}},\"size\":10,\"sort\":{\"task.runAt\":{\"order\":\"asc\"}},\"seq_no_primary_term\":true}","statusCode":503,"response":"{\"error\":{\"root_cause\":,\"type\":\"search_phase_execution_exception\",\"reason\":\"all shards failed\",\"phase\":\"query\",\"grouped\":true,\"failed_shards\":},\"status\":503}"}"}
[root@ca-sna-elk02 ~]#

Hello @Shawhin

From which version did you upgrade?

Did you follow the upgrade documentation? https://www.elastic.co/guide/en/kibana/6.8/upgrade.html

edit: corrected the link

Elasticsearch is running, right? Can you check if there are errors, stack traces, or anything unusual in the Elasticsearch logs?
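
For example (assuming Elasticsearch is listening on localhost:9200 with no security enabled), the cluster health is a quick first check, since a red cluster would explain the "all shards failed" error in your Kibana log:

curl -s 'localhost:9200/_cluster/health?pretty'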

Elasticsearch is certainly running; here is what I see with systemctl status.

How can I check for a stack trace?

[root@ca-sna-elk02 bin]# systemctl -l status elasticsearch
● elasticsearch.service - Elasticsearch
Loaded: loaded (/usr/lib/systemd/system/elasticsearch.service; enabled; vendor preset: disabled)
Active: active (running) since Fri 2019-07-05 23:11:42 PDT; 7h ago
Docs: http://www.elastic.co
Main PID: 20608 (java)
CGroup: /system.slice/elasticsearch.service
├─20608 /bin/java -Xms4g -Xmx4g -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -XX:-OmitStackTraceInFastThrow -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dio.netty.recycler.maxCapacityPerThread=0 -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -Djava.io.tmpdir=/tmp/elasticsearch-1197803329153771373 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/lib/elasticsearch -XX:ErrorFile=/var/log/elasticsearch/hs_err_pid%p.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -Xloggc:/var/log/elasticsearch/gc.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=32 -XX:GCLogFileSize=64m -Des.path.home=/usr/share/elasticsearch -Des.path.conf=/etc/elasticsearch -Des.distribution.flavor=default -Des.distribution.type=rpm -cp /usr/share/elasticsearch/lib/* org.elasticsearch.bootstrap.Elasticsearch -p /var/run/elasticsearch/elasticsearch.pid --quiet
└─20731 /usr/share/elasticsearch/modules/x-pack-ml/platform/linux-x86_64/bin/controller

Jul 05 23:11:42 ca-sna-elk02.mercurytechnology.com systemd[1]: Started Elasticsearch.
[root@ca-sna-elk02 bin]#

The path changes depending on the install type; here is the documentation with all the default paths: https://www.elastic.co/guide/en/beats/filebeat/6.8/directory-layout.html

I checked the Elasticsearch logs, and I see a lot of repeated entries similar to the ones below, but nothing that stands out in terms of a stack trace.

: 279079K->3529K(306688K), 0.0294922 secs] 2955564K->2680034K(4160256K), 0.0297563 secs] [Times: user=0.10 sys=0.00, real=0.03 secs]
2019-07-06T07:27:49.193-0700: 29765.209: Total time for which application threads were stopped: 0.0305303 seconds, Stopping threads took: 0.0001482 seconds
2019-07-06T07:27:49.196-0700: 29765.213: Total time for which application threads were stopped: 0.0022859 seconds, Stopping threads took: 0.0018357 seconds
2019-07-06T07:27:49.197-0700: 29765.213: Total time for which application threads were stopped: 0.0003249 seconds, Stopping threads took: 0.0000867 seconds
2019-07-06T07:28:06.515-0700: 29782.532: Total time for which application threads were stopped: 0.0006119 seconds, Stopping threads took: 0.0001356 seconds
2019-07-06T07:28:06.516-0700: 29782.532: Total time for which application threads were stopped: 0.0003234 seconds, Stopping threads took: 0.0000784 seconds
2019-07-06T07:28:06.516-0700: 29782.533: Total time for which application threads were stopped: 0.0003150 seconds, Stopping threads took: 0.0000751 seconds
2019-07-06T07:28:16.473-0700: 29792.490: Total time for which application threads were stopped: 0.0006392 seconds, Stopping threads took: 0.0001461 seconds
2019-07-06T07:28:16.474-0700: 29792.490: Total time for which application threads were stopped: 0.0004263 seconds, Stopping threads took: 0.0001250 seconds
2019-07-06T07:28:16.475-0700: 29792.491: Total time for which application threads were stopped: 0.0003651 seconds, Stopping threads took: 0.0000811 seconds
2019-07-06T07:28:28.854-0700: 29804.870: Total time for which application threads were stopped: 0.0005257 seconds, Stopping threads took: 0.0001341 seconds
2019-07-06T07:28:30.301-0700: 29806.318: Total time for which application threads were stopped: 0.0006203 seconds, Stopping threads took: 0.0001552 seconds
2019-07-06T07:28:30.302-0700: 29806.318: Total time for which application threads were stopped: 0.0003215 seconds, Stopping threads took: 0.0000768 seconds
2019-07-06T07:28:30.302-0700: 29806.319: Total time for which application threads were stopped: 0.0003576 seconds, Stopping threads took: 0.0000750 seconds
2019-07-06T07:29:10.909-0700: 29846.925: [GC (Allocation Failure) 2019-07-06T07:29:10.909-0700: 29846.925: [ParNew
Desired survivor size 17432576 bytes, new threshold 6 (max 6)

- age 1: 3828008 bytes, 3828008 total
- age 2: 204456 bytes, 4032464 total
- age 3: 242632 bytes, 4275096 total
- age 4: 10304 bytes, 4285400 total
- age 5: 17568 bytes, 4302968 total
- age 6: 15640 bytes, 4318608 total
: 275856K->5009K(306688K), 0.0228942 secs] 2952361K->2681522K(4160256K), 0.0231742 secs] [Times: user=0.08 sys=0.00, real=0.02 secs]

That seems to be the GC log. Can you show me a listing of the log directory?

Absolutely. Just the current Elasticsearch log dir?

[root@ca-sna-elk02 elasticsearch]# pwd
/var/log/elasticsearch
[root@ca-sna-elk02 elasticsearch]# ls -ltrh
total 949M
-rw-r--r-- 1 elasticsearch elasticsearch 65M May 6 07:10 gc.log.0
-rw-r--r-- 1 elasticsearch elasticsearch 65M May 12 11:10 gc.log.1
-rw-r--r-- 1 elasticsearch elasticsearch 65M May 18 12:22 gc.log.2
-rw-r--r-- 1 elasticsearch elasticsearch 65M May 24 14:53 gc.log.3
-rw-r--r-- 1 elasticsearch elasticsearch 65M May 30 23:20 gc.log.4
-rw-r--r-- 1 elasticsearch elasticsearch 65M Jun 5 19:33 gc.log.5
-rw-r--r-- 1 elasticsearch elasticsearch 65M Jun 9 05:13 gc.log.6
-rw-r--r-- 1 elasticsearch elasticsearch 65M Jun 12 07:27 gc.log.7
-rw-r--r-- 1 elasticsearch elasticsearch 65M Jun 15 08:03 gc.log.8
-rw-r--r-- 1 elasticsearch elasticsearch 65M Jun 18 10:23 gc.log.9
-rw-r--r-- 1 elasticsearch elasticsearch 65M Jun 21 12:10 gc.log.10
-rw-r--r-- 1 elasticsearch elasticsearch 65M Jun 24 13:03 gc.log.11
-rw-r--r-- 1 elasticsearch elasticsearch 65M Jun 27 16:27 gc.log.12
-rw-r--r-- 1 elasticsearch elasticsearch 65M Jun 30 21:35 gc.log.13
-rw-r--r-- 1 elasticsearch elasticsearch 53M Jul 3 11:42 gc.log.14.current
-rw-r--r-- 1 elasticsearch elasticsearch 197K Jul 6 07:42 gc.log.0.current
[root@ca-sna-elk02 elasticsearch]#

Well, it seems that your Elasticsearch logs are missing. Can you check the path.logs configuration in elasticsearch.yml and see if there are other logs somewhere?

If not, try "sudo journalctl -xe -u <elasticsearch service name>".
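
For example, on a default RPM install (paths assumed; adjust if yours differ):

# prints the configured log path, if any; on RPM installs it defaults to /var/log/elasticsearch
grep path.logs /etc/elasticsearch/elasticsearch.yml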

It looks like the log path is in a different location. I do have the main log now and certainly see some errors. It is a 41 MB file, though, and there are so many errors that I am not sure where to even start.

Caused by: java.nio.file.FileSystemException: /u01/elasticsearch/nodes/0/indices/LH43SX3dTSqUN0iszHdBnQ/0/translog/translog-155.ckp: Too many open files
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91) ~[?:?]
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) ~[?:?]
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) ~[?:?]
at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214) ~[?:?]
at java.nio.file.Files.newByteChannel(Files.java:361) ~[?:1.8.0_212]
at java.nio.file.Files.newByteChannel(Files.java:407) ~[?:1.8.0_212]
at org.apache.lucene.store.SimpleFSDirectory.openInput(SimpleFSDirectory.java:77) ~[lucene-core-7.7.0.jar:7.7.0 8c831daf4eb41153c25ddb152501ab5bae3ea3d5 - jimczi - 2019-02-04 23:16:28]
at org.elasticsearch.index.translog.Checkpoint.read(Checkpoint.java:188) ~[elasticsearch-6.8.1.jar:6.8.1]
at org.elasticsearch.index.translog.Translog.recoverFromFiles(Translog.java:243) ~[elasticsearch-6.8.1.jar:6.8.1]
at org.elasticsearch.index.translog.Translog.<init>(Translog.java:189) ~[elasticsearch-6.8.1.jar:6.8.1]
at org.elasticsearch.index.engine.InternalEngine.openTranslog(InternalEngine.java:481) ~[elasticsearch-6.8.1.jar:6.8.1]
at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:196) ~[elasticsearch-6.8.1.jar:6.8.1]
at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:168) ~[elasticsearch-6.8.1.jar:6.8.1]
at org.elasticsearch.index.engine.InternalEngineFactory.newReadWriteEngine(InternalEngineFactory.java:25) ~[elasticsearch-6.8.1.jar:6.8.1]
at org.elasticsearch.index.shard.IndexShard.innerOpenEngineAndTranslog(IndexShard.java:1494) ~[elasticsearch-6.8.1.jar:6.8.1]
at org.elasticsearch.index.shard.IndexShard.openEngineAndRecoverFromTranslog(IndexShard.java:1444) ~[elasticsearch-6.8.1.jar:6.8.1]
at org.elasticsearch.index.shard.StoreRecovery.internalRecoverFromStore(StoreRecovery.java:440) ~[elasticsearch-6.8.1.jar:6.8.1]
at org.elasticsearch.index.shard.StoreRecovery.lambda$recoverFromStore$0(StoreRecovery.java:95) ~[elasticsearch-6.8.1.jar:6.8.1]
at org.elasticsearch.index.shard.StoreRecovery.executeRecovery(StoreRecovery.java:310) ~[elasticsearch-6.8.1.jar:6.8.1]
at org.elasticsearch.index.shard.StoreRecovery.recoverFromStore(StoreRecovery.java:93) ~[elasticsearch-6.8.1.jar:6.8.1]
at org.elasticsearch.index.shard.IndexShard.recoverFromStore(IndexShard.java:1737) ~[elasticsearch-6.8.1.jar:6.8.1]
at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$9(IndexShard.java:2390) ~[elasticsearch-6.8.1.jar:6.8.1]
... 4 more
[2019-07-06T08:06:58,137][WARN ][o.e.i.c.IndicesClusterStateService] [elk02] [[wazuh-alerts-3.x-2018.03.19][0]] marking and sending shard failed due to [failed recovery]
org.elasticsearch.indices.recovery.RecoveryFailedException: [wazuh-alerts-3.x-2018.03.19][0]: Recovery failed on {elk02}{DVhqmG-pStKCiiRaEO0Y_w}{sFw1Y8eyTDuY9AiodRM2_w}{127.0.0.1}{127.0.0.1:9300}{ml.machine_memory=25008992256, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}
at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$9(IndexShard.java:2394) ~[elasticsearch-6.8.1.jar:6.8.1]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:681) [elasticsearch-6.8.1.jar:6.8.1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_212]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_212]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_212]
Caused by: org.elasticsearch.index.shard.IndexShardRecoveryException: failed to recover from gateway
at org.elasticsearch.index.shard.StoreRecovery.internalRecoverFromStore(StoreRecovery.java:445) ~[elasticsearch-6.8.1.jar:6.8.1]
at org.elasticsearch.index.shard.StoreRecovery.lambda$recoverFromStore$0(StoreRecovery.java:95) ~[elasticsearch-6.8.1.jar:6.8.1]
at org.elasticsearch.index.shard.StoreRecovery.executeRecovery(StoreRecovery.java:310) ~[elasticsearch-6.8.1.jar:6.8.1]
at org.elasticsearch.index.shard.StoreRecovery.recoverFromStore(StoreRecovery.java:93) ~[elasticsearch-6.8.1.jar:6.8.1]
at org.elasticsearch.index.shard.IndexShard.recoverFromStore(IndexShard.java:1737) ~[elasticsearch-6.8.1.jar:6.8.1]
at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$9(IndexShard.java:2390) ~[elasticsearch-6.8.1.jar:6.8.1]
... 4 more
Caused by: org.elasticsearch.index.engine.EngineCreationFailureException: failed to create engine
at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:208) ~[elasticsearch-6.8.1.jar:6.8.1]
at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:168) ~[elasticsearch-6.8.1.jar:6.8.1]
at org.elasticsearch.index.engine.InternalEngineFactory.newReadWriteEngine(InternalEngineFactory.java:25) ~[elasticsearch-6.8.1.jar:6.8.1]
at org.elasticsearch.index.shard.IndexShard.innerOpenEngineAndTranslog(IndexShard.java:1494) ~[elasticsearch-6.8.1.jar:6.8.1]
at org.elasticsearch.index.shard.IndexShard.openEngineAndRecoverFromTranslog(IndexShard.java:1444) ~[elasticsearch-6.8.1.jar:6.8.1]
at org.elasticsearch.index.shard.StoreRecovery.internalRecoverFromStore(StoreRecovery.java:440) ~[elasticsearch-6.8.1.jar:6.8.1]
at org.elasticsearch.index.shard.StoreRecovery.lambda$recoverFromStore$0(StoreRecovery.java:95) ~[elasticsearch-6.8.1.jar:6.8.1]
at org.elasticsearch.index.shard.StoreRecovery.executeRecovery(StoreRecovery.java:310) ~[elasticsearch-6.8.1.jar:6.8.1]
at org.elasticsearch.index.shard.StoreRecovery.recoverFromStore(StoreRecovery.java:93) ~[elasticsearch-6.8.1.jar:6.8.1]
at org.elasticsearch.index.shard.IndexShard.recoverFromStore(IndexShard.java:1737) ~[elasticsearch-6.8.1.jar:6.8.1]
at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$9(IndexShard.java:2390) ~[elasticsearch-6.8.1.jar:6.8.1]
... 4 more
Caused by: java.nio.file.FileSystemException: /u01/elasticsearch/nodes/0/indices/vpMoMx1TTf6SI2psS80o8A/0/translog/translog-12.ckp: Too many open files

Good! Seems like we have a target now! :smiley:

Can you please check the max open files? Log in as the user that runs the Elasticsearch service and use the following command:

ulimit -Hn
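
Note that ulimit -Hn only reports the limit of your current shell; the running daemon's limit can differ. To check the actual process, using the Main PID from your systemctl status output (20608 in this case):

# shows the soft and hard "Max open files" limits of the running JVM
grep 'Max open files' /proc/20608/limits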

What OS are you using?

It seems that the user has reached the max open files limit and the kernel won't allow it to open more, so Elasticsearch cannot open the files it needs to recover the indices.

I'm adding the documentation link about this from the Elasticsearch install process:

https://www.elastic.co/guide/en/elasticsearch/reference/6.8/file-descriptors.html
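
You can also ask Elasticsearch directly what it sees (again assuming localhost:9200 with no security):

# reports open_file_descriptors and max_file_descriptors per node
curl -s 'localhost:9200/_nodes/stats/process?pretty'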

This is using Red Hat 7, and it looks like the ulimit is 4096.

Well then, please set it to at least 65535 as in the documentation (https://www.elastic.co/guide/en/elasticsearch/reference/6.8/file-descriptors.html); to make the change permanent, I would add it to the limits file. :slight_smile:
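
For example, a minimal entry in /etc/security/limits.conf (assuming the service runs as the elasticsearch user; note that on systemd distributions this only affects interactive sessions, which is why the unit setting below matters):

elasticsearch - nofile 65535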

You can also add the following line to elasticsearch.service:

LimitNOFILE=65536

It will set the max file descriptor limit to that number when Elasticsearch starts.
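
A sketch of doing that with a drop-in override instead of editing the unit file itself, which survives package upgrades (assuming the service is named elasticsearch):

# create an override that raises the file descriptor limit for the service
sudo mkdir -p /etc/systemd/system/elasticsearch.service.d
printf '[Service]\nLimitNOFILE=65536\n' | sudo tee /etc/systemd/system/elasticsearch.service.d/override.conf
# reload unit definitions and restart so the new limit takes effect
sudo systemctl daemon-reload
sudo systemctl restart elasticsearch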

Thank you. I checked in the service file, and I did see these directives in place already:

# Sets the default values for elasticsearch variables used in this script

ES_HOME="/usr/share/elasticsearch"
MAX_OPEN_FILES=65535
MAX_MAP_COUNT=262144
ES_PATH_CONF="/etc/elasticsearch"

Should I also add the LimitNOFILE=65536 directive?

Yes please :slight_smile: add it and let's see if it works now.

In case it doesn't work, please check your SELinux configuration with the following command:

getenforce

Also remember to restart Elasticsearch.
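
After the restart, you can confirm the new limit actually applied to the running process (substitute the Main PID reported by systemctl status elasticsearch):

grep 'Max open files' /proc/<PID>/limits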

Thank you. I have followed those steps and restarted the box, but unfortunately I am still seeing the same result with the page: "Kibana server is not ready yet".

As for getenforce, it's disabled.

Can you check the Elasticsearch logs again and see if there are more errors?
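
If the cluster is still unhealthy, these two calls (assuming localhost:9200 with no security) usually narrow down which indices are failing:

# list shards that are not STARTED; UNASSIGNED ones point at the failing indices
curl -s 'localhost:9200/_cat/shards?v' | grep -v STARTED
# ask the cluster why a shard is unassigned
curl -s 'localhost:9200/_cluster/allocation/explain?pretty'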

Absolutely. I actually put all the logs into a pastebin, from the Elasticsearch startup point until now:

https://pastebin.com/s7zKXrA6

I am seeing some errors that look like failed shards, but I'm not sure how critical that is or what to make of it.