Kibana X-pack Machine Learning Jobs list could not be retrieved

thirusama · January 29, 2018, 5:15pm

Versions being used (all installed on a single node):
logstash-6.0.1
elasticsearch-6.1.1
kibana-6.1.1-linux-x86_64

I am getting the following error on the Machine Learning -> Job Management page:
Error: Job list could not be retrieved
[search_phase_execution_exception] all shards failed

I also get an error when I try to create a new job.

When I run the request below,
GET /_xpack/usage
the output is

{
  "error": {
    "root_cause": [],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": []
  },
  "status": 503
}

Please let me know how I can debug this. I tried restarting all services, but that did not help. Thanks.

richcollier · January 29, 2018, 6:14pm

What errors are seen in elasticsearch.log when you try to save a job? I suspect the root cause of what's going on will be seen there...

thirusama · January 29, 2018, 6:34pm

This is what I see in elasticsearch.log when I try to save the job.

[2018-01-29T11:31:31,389][WARN ][r.suppressed             ] path: /_xpack/ml/anomaly_detectors/testing2, params: {job_id=testing2}
org.elasticsearch.action.search.SearchPhaseExecutionException: all shards failed
        at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:274) ~[elasticsearch-6.1.1.jar:6.1.1]
        at org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:132) ~[elasticsearch-6.1.1.jar:6.1.1]
        at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:243) ~[elasticsearch-6.1.1.jar:6.1.1]
        at org.elasticsearch.action.search.InitialSearchPhase.onShardFailure(InitialSearchPhase.java:107) ~[elasticsearch-6.1.1.jar:6.1.1]
        at org.elasticsearch.action.search.InitialSearchPhase.lambda$performPhaseOnShard$4(InitialSearchPhase.java:205) ~[elasticsearch-6.1.1.jar:6.1.1]
        at org.elasticsearch.action.search.InitialSearchPhase$1.doRun(InitialSearchPhase.java:184) ~[elasticsearch-6.1.1.jar:6.1.1]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:637) [elasticsearch-6.1.1.jar:6.1.1]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.1.1.jar:6.1.1]
        at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:41) [elasticsearch-6.1.1.jar:6.1.1]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.1.1.jar:6.1.1]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_151]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_151]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]
[2018-01-29T11:31:31,639][DEBUG][o.e.a.s.TransportSearchAction] [8QF1u_C] All shards failed for phase: [query]
[2018-01-29T11:31:31,641][ERROR][o.e.x.m.c.c.ClusterStatsCollector] [8QF1u_C] collector [cluster_stats] failed to collect data
org.elasticsearch.action.search.SearchPhaseExecutionException: all shards failed
        at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:274) ~[elasticsearch-6.1.1.jar:6.1.1]
        at org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:132) ~[elasticsearch-6.1.1.jar:6.1.1]
        at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:243) ~[elasticsearch-6.1.1.jar:6.1.1]

[2018-01-29T11:31:31,647][DEBUG][o.e.a.s.TransportSearchAction] [8QF1u_C] All shards failed for phase: [query]
[2018-01-29T11:31:31,831][DEBUG][o.e.a.s.TransportSearchAction] [8QF1u_C] All shards failed for phase: [query]
[2018-01-29T11:31:31,832][DEBUG][o.e.a.s.TransportSearchAction] [8QF1u_C] All shards failed for phase: [query]
[2018-01-29T11:31:31,831][ERROR][o.e.x.m.c.m.JobStatsCollector] [8QF1u_C] collector [job_stats] failed to collect data
org.elasticsearch.action.search.SearchPhaseExecutionException: all shards failed
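
Editor's sketch: an excerpt like the one above is easier to read once it is reduced to which loggers are reporting WARN or ERROR entries. The snippet below is a minimal, hypothetical helper (the regex and the function name are assumptions, not part of Elasticsearch) that parses the `[timestamp][LEVEL][logger]` prefix seen in this log and counts entries, ignoring stack-trace lines. The sample lines are copied from the excerpt.

```python
import re
from collections import Counter

# Prefix of an Elasticsearch log line as it appears in the excerpt:
# [timestamp][LEVEL ][logger name      ] message
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\[(?P<level>[A-Z]+)\s*\]\[(?P<logger>[^\]]+?)\s*\]\s*(?P<msg>.*)$"
)


def count_by_level_and_logger(lines, levels=("WARN", "ERROR")):
    """Count log entries per (level, logger); stack-trace lines do not match."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("level") in levels:
            counts[(match.group("level"), match.group("logger"))] += 1
    return counts


# Lines taken from the excerpt in this thread.
sample = [
    "[2018-01-29T11:31:31,389][WARN ][r.suppressed             ] path: /_xpack/ml/anomaly_detectors/testing2, params: {job_id=testing2}",
    "org.elasticsearch.action.search.SearchPhaseExecutionException: all shards failed",
    "        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]",
    "[2018-01-29T11:31:31,639][DEBUG][o.e.a.s.TransportSearchAction] [8QF1u_C] All shards failed for phase: [query]",
    "[2018-01-29T11:31:31,641][ERROR][o.e.x.m.c.c.ClusterStatsCollector] [8QF1u_C] collector [cluster_stats] failed to collect data",
    "[2018-01-29T11:31:31,831][ERROR][o.e.x.m.c.m.JobStatsCollector] [8QF1u_C] collector [job_stats] failed to collect data",
]

summary = count_by_level_and_logger(sample)
for (level, logger), n in sorted(summary.items()):
    print(level, logger, n)
```

Here the failures come from the ML job endpoint and from two monitoring collectors, which is consistent with the later suggestion that the problem is not specific to ML.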

richcollier · January 29, 2018, 6:42pm

Hmm...it seems to me that something fundamental to elasticsearch is not working (not just ML). Are you actively using this cluster for other things, or did you stand it up just to test ML?

What do you see when you run the following in Console (Dev Tools)?

GET _cluster/health

thirusama · January 29, 2018, 6:46pm

We mainly want to explore ML features, so we built this node and installed everything on it. Regular Discover/Visualization and ingestion of data using logstash are working fine.

This is what I get when I run GET _cluster/health

{
  "cluster_name": "elasticsearch",
  "status": "red",
  "timed_out": false,
  "number_of_nodes": 1,
  "number_of_data_nodes": 1,
  "active_primary_shards": 824,
  "active_shards": 824,
  "relocating_shards": 0,
  "initializing_shards": 0,
  "unassigned_shards": 833,
  "delayed_unassigned_shards": 0,
  "number_of_pending_tasks": 0,
  "number_of_in_flight_fetch": 0,
  "task_max_waiting_in_queue_millis": 0,
  "active_shards_percent_as_number": 49.728424864212435
}
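
Editor's sketch: the numbers in this response can be cross-checked with a few lines of Python. The helper below is hypothetical. It assumes that the total shard count is active + initializing + unassigned, which reproduces the `active_shards_percent_as_number` value reported above. The field values are copied from the response.

```python
def shard_summary(health):
    """Derive total and percentage figures from a _cluster/health response."""
    active = health["active_shards"]
    # Assumption: shards that are not active are either initializing or unassigned.
    total = active + health["initializing_shards"] + health["unassigned_shards"]
    return {
        "total_shards": total,
        "active_percent": 100.0 * active / total if total else 100.0,
        "unassigned_percent": (
            100.0 * health["unassigned_shards"] / total if total else 0.0
        ),
    }


# Values copied from the response in this thread.
health = {
    "status": "red",
    "number_of_nodes": 1,
    "active_primary_shards": 824,
    "active_shards": 824,
    "initializing_shards": 0,
    "unassigned_shards": 833,
}

summary = shard_summary(health)
print(summary)
```

Note that `active_shards` equals `active_primary_shards`: on a single node no replica can be active, because a replica is never allocated to the same node as its primary. A red status additionally means that at least one primary shard is unassigned.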

richcollier · January 29, 2018, 6:51pm

Glad to hear you're interested in ML!

Unfortunately, your cluster status is red and that will certainly hinder things...you'll need to figure out why that's the case before I can help. Perhaps restart elasticsearch and watch the logging for reasons why the cluster won't go into at least a yellow state.
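
Editor's sketch: besides watching the log, one way to see why shards are unassigned is to request `GET _cat/shards?format=json&h=index,shard,prirep,state,unassigned.reason` and tally the unassigned rows (`GET _cluster/allocation/explain` gives a more detailed explanation for a single shard). The Python below is a minimal, hypothetical helper that works on the parsed JSON rows. The index names and rows are invented for illustration and are not taken from this cluster.

```python
from collections import Counter


def unassigned_by_reason(rows):
    """Count unassigned shards per (primary/replica, reason)."""
    counts = Counter()
    for row in rows:
        if row.get("state") == "UNASSIGNED":
            kind = "primary" if row.get("prirep") == "p" else "replica"
            counts[(kind, row.get("unassigned.reason") or "unknown")] += 1
    return counts


# Hypothetical rows in the shape returned by _cat/shards with format=json.
rows = [
    {"index": "example-2018.01.28", "shard": "0", "prirep": "p",
     "state": "STARTED", "unassigned.reason": None},
    {"index": "example-2018.01.28", "shard": "0", "prirep": "r",
     "state": "UNASSIGNED", "unassigned.reason": "INDEX_CREATED"},
    {"index": "example-2018.01.29", "shard": "1", "prirep": "p",
     "state": "UNASSIGNED", "unassigned.reason": "CLUSTER_RECOVERED"},
]

counts = unassigned_by_reason(rows)
for (kind, reason), n in sorted(counts.items()):
    print(kind, reason, n)
```

Unassigned replicas alone only make the cluster yellow. Any unassigned primary is what turns it red, so those rows are the ones to investigate first.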

thirusama · January 29, 2018, 6:54pm

OK, thanks for your response. I suspect it could be because of the number of shards, as we have a single node. Any pointers for debugging along those lines? How can I delete all the data (and indices/shards) and start from scratch?

richcollier · January 29, 2018, 6:56pm

Yes, you have a lot of shards, half of which are unassigned.

If you want to delete all the data and start again:

stop elasticsearch
rm -rf /path/to/elasticsearch/data
start elasticsearch

thirusama · January 29, 2018, 8:55pm

Thanks. For now this has fixed my issue.

warkolm · January 30, 2018, 12:22am

You can also use curl -XDELETE HOST:9200/* to remove all indices. It's also a bit safer than removing things from the filesystem.
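
Editor's sketch: the same request can be prepared from Python with the standard library. The function name and the default host and port are assumptions for illustration. The snippet only builds the request and prints it. Sending it deletes every index the pattern matches, so that step is left as a comment.

```python
import urllib.request


def build_delete_all_indices_request(host="localhost", port=9200):
    """Build (but do not send) a DELETE request for the wildcard index pattern."""
    url = f"http://{host}:{port}/*"
    return urllib.request.Request(url, method="DELETE")


request = build_delete_all_indices_request()
print(request.get_method(), request.full_url)

# Destructive: uncomment only if you really want to remove all indices.
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```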

thirusama · January 30, 2018, 4:38pm

Thanks @warkolm Good to know

