Hi everyone,
We have an Elasticsearch instance that we feed with data from CI jobs.
It is a single-node instance.
Here is some data from the cluster health:
"cluster_name" : "docker-cluster",
"status" : "yellow",
"timed_out" : false,
"number_of_nodes" : 1,
"number_of_data_nodes" : 1,
"active_primary_shards" : 998,
"active_shards" : 998,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 931,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 51.73665111456713
Sometimes we see the Elasticsearch Docker container restart.
Today I managed to find a possible culprit in the logs, but I can't quite understand what exactly went wrong.
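For anyone curious, something like the following can be used to pull the ERROR/FATAL lines out of the container logs (the container name "elasticsearch" and the 24h window are assumptions in this sketch):

import subprocess

# Dump the last 24h of container logs (container name is an assumption).
proc = subprocess.run(
    ["docker", "logs", "--since", "24h", "elasticsearch"],
    capture_output=True,
    text=True,
)

# Elasticsearch writes JSON log lines; keep only ERROR/FATAL level entries.
for line in (proc.stdout + proc.stderr).splitlines():
    if '"level": "ERROR"' in line or '"level": "FATAL"' in line:
        print(line)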
Here is the stack trace:
2022-09-16T13:51:41.553888557Z {"type": "server", "timestamp": "2022-09-16T13:51:41,553Z", "level": "INFO", "component": "o.e.x.i.IndexLifecycleTransition", "cluster.name": "docker-cluster", "node.name": "elasticsearch", "message": "moving index [test-report-index-000047] from [{\"phase\":\"hot\",\"action\":\"rollover\",\"name\":\"update-rollover-lifecycle-date\"}] to [{\"phase\":\"hot\",\"action\":\"rollover\",\"name\":\"set-indexing-complete\"}] in policy [lifetime]", "cluster.uuid": "go2F4QctRnuC0TnMLmhfSg", "node.id": "gI3QYOf5S4ma7Cz75F79rA" }
2022-09-16T13:51:41.916359627Z {"type": "server", "timestamp": "2022-09-16T13:51:41,915Z", "level": "INFO", "component": "o.e.x.i.IndexLifecycleTransition", "cluster.name": "docker-cluster", "node.name": "elasticsearch", "message": "moving index [test-report-index-000048] from [{\"phase\":\"hot\",\"action\":\"set_priority\",\"name\":\"set_priority\"}] to [{\"phase\":\"hot\",\"action\":\"unfollow\",\"name\":\"wait-for-indexing-complete\"}] in policy [lifetime]", "cluster.uuid": "go2F4QctRnuC0TnMLmhfSg", "node.id": "gI3QYOf5S4ma7Cz75F79rA" }
2022-09-16T13:51:42.014259216Z {"type": "server", "timestamp": "2022-09-16T13:51:42,013Z", "level": "INFO", "component": "o.e.x.i.IndexLifecycleTransition", "cluster.name": "docker-cluster", "node.name": "elasticsearch", "message": "moving index [test-report-index-000047] from [{\"phase\":\"hot\",\"action\":\"rollover\",\"name\":\"set-indexing-complete\"}] to [{\"phase\":\"hot\",\"action\":\"complete\",\"name\":\"complete\"}] in policy [lifetime]", "cluster.uuid": "go2F4QctRnuC0TnMLmhfSg", "node.id": "gI3QYOf5S4ma7Cz75F79rA" }
2022-09-16T13:51:42.360122383Z {"type": "server", "timestamp": "2022-09-16T13:51:42,359Z", "level": "INFO", "component": "o.e.x.i.IndexLifecycleTransition", "cluster.name": "docker-cluster", "node.name": "elasticsearch", "message": "moving index [test-report-index-000048] from [{\"phase\":\"hot\",\"action\":\"unfollow\",\"name\":\"wait-for-indexing-complete\"}] to [{\"phase\":\"hot\",\"action\":\"unfollow\",\"name\":\"wait-for-follow-shard-tasks\"}] in policy [lifetime]", "cluster.uuid": "go2F4QctRnuC0TnMLmhfSg", "node.id": "gI3QYOf5S4ma7Cz75F79rA" }
2022-09-16T13:58:23.057165197Z {"type": "server", "timestamp": "2022-09-16T13:58:23,054Z", "level": "ERROR", "component": "o.e.b.ElasticsearchUncaughtExceptionHandler", "cluster.name": "docker-cluster", "node.name": "elasticsearch", "message": "fatal error in thread [elasticsearch[elasticsearch][search][T#24]], exiting", "cluster.uuid": "go2F4QctRnuC0TnMLmhfSg", "node.id": "gI3QYOf5S4ma7Cz75F79rA" ,
2022-09-16T13:58:23.057217798Z "stacktrace": ["java.lang.AssertionError: unexpected higher total ops [41] compared to expected [40]",
2022-09-16T13:58:23.057221480Z "at org.elasticsearch.action.search.AbstractSearchAsyncAction.onShardFailure(AbstractSearchAsyncAction.java:403) ~[elasticsearch-7.10.1.jar:7.10.1]",
2022-09-16T13:58:23.057224203Z "at org.elasticsearch.action.search.AbstractSearchAsyncAction.access$100(AbstractSearchAsyncAction.java:70) ~[elasticsearch-7.10.1.jar:7.10.1]",
2022-09-16T13:58:23.057226443Z "at org.elasticsearch.action.search.AbstractSearchAsyncAction$1.onFailure(AbstractSearchAsyncAction.java:258) ~[elasticsearch-7.10.1.jar:7.10.1]",
2022-09-16T13:58:23.057228695Z "at org.elasticsearch.action.search.SearchExecutionStatsCollector.onFailure(SearchExecutionStatsCollector.java:73) ~[elasticsearch-7.10.1.jar:7.10.1]",
2022-09-16T13:58:23.057231077Z "at org.elasticsearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:59) ~[elasticsearch-7.10.1.jar:7.10.1]",
2022-09-16T13:58:23.057233348Z "at org.elasticsearch.action.search.SearchTransportService$ConnectionCountingHandler.handleException(SearchTransportService.java:408) ~[elasticsearch-7.10.1.jar:7.10.1]",
2022-09-16T13:58:23.057235517Z "at org.elasticsearch.transport.TransportService$6.handleException(TransportService.java:640) ~[elasticsearch-7.10.1.jar:7.10.1]",
2022-09-16T13:58:23.057237557Z "at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1181) ~[elasticsearch-7.10.1.jar:7.10.1]",
2022-09-16T13:58:23.057239739Z "at org.elasticsearch.transport.TransportService$DirectResponseChannel.processException(TransportService.java:1290) ~[elasticsearch-7.10.1.jar:7.10.1]",
2022-09-16T13:58:23.057241780Z "at org.elasticsearch.transport.TransportService$DirectResponseChannel.processResponse(TransportService.java:1251) ~[elasticsearch-7.10.1.jar:7.10.1]",
2022-09-16T13:58:23.057244677Z "at org.elasticsearch.transport.TransportService$DirectResponseChannel.sendResponse(TransportService.java:1229) ~[elasticsearch-7.10.1.jar:7.10.1]",
2022-09-16T13:58:23.057260466Z "at org.elasticsearch.transport.TaskTransportChannel.sendResponse(TaskTransportChannel.java:52) ~[elasticsearch-7.10.1.jar:7.10.1]",
2022-09-16T13:58:23.057262610Z "at org.elasticsearch.action.support.ChannelActionListener.onResponse(ChannelActionListener.java:43) ~[elasticsearch-7.10.1.jar:7.10.1]",
2022-09-16T13:58:23.057264825Z "at org.elasticsearch.action.support.ChannelActionListener.onResponse(ChannelActionListener.java:27) ~[elasticsearch-7.10.1.jar:7.10.1]",
2022-09-16T13:58:23.057267483Z "at org.elasticsearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:58) ~[elasticsearch-7.10.1.jar:7.10.1]",
2022-09-16T13:58:23.057269903Z "at org.elasticsearch.action.ActionRunnable$2.doRun(ActionRunnable.java:73) ~[elasticsearch-7.10.1.jar:7.10.1]",
2022-09-16T13:58:23.057271869Z "at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.10.1.jar:7.10.1]",
2022-09-16T13:58:23.057274418Z "at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:44) ~[elasticsearch-7.10.1.jar:7.10.1]",
2022-09-16T13:58:23.057277553Z "at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:737) ~[elasticsearch-7.10.1.jar:7.10.1]",
2022-09-16T13:58:23.057279401Z "at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.10.1.jar:7.10.1]",
2022-09-16T13:58:23.057282028Z "at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) ~[?:?]",
2022-09-16T13:58:23.057283931Z "at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) ~[?:?]",
2022-09-16T13:58:23.057285936Z "at java.lang.Thread.run(Thread.java:832) [?:?]",
2022-09-16T13:58:23.057287760Z "Caused by: org.elasticsearch.action.search.SearchPhaseExecutionException: Shard failures",
What jumps out at me is the following line:
java.lang.AssertionError: unexpected higher total ops [41] compared to expected [40]
Is it because Elasticsearch was overwhelmed with operations?
It seems it was doing some housekeeping with index lifecycle management (ILM) at the time.
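The INFO lines right before the crash show ILM moving test-report-index-000047 and test-report-index-000048 through the hot phase. Their ILM state can be inspected with the explain API (GET <index>/_ilm/explain); a minimal Python sketch, again assuming localhost:9200 without authentication:

import json
import urllib.request

# Inspect the ILM state of the indices that were changing phases before the crash.
for index in ("test-report-index-000047", "test-report-index-000048"):
    url = f"http://localhost:9200/{index}/_ilm/explain"
    with urllib.request.urlopen(url) as resp:
        explain = json.load(resp)
    print(json.dumps(explain, indent=2))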
Thanks