Shards Failed | Most of the recent indexes are unassigned

I have a server with ELK and Heartbeat installed (all are v6.4.2).

Heartbeat on the current server is monitoring two other servers' Elasticsearch and Logstash (with its pipeline) in the same domain.
It creates an index like heartbeat-6.4.2-<date> every day. I created a dashboard for those two servers and it was working as expected.

It was working fine until 12th Oct.

Yesterday I tried to view the dashboard for the last 24 hrs and it's giving me the following error:

10 of 62 Shards failed

Then I tried to check the health of my Elasticsearch:

// 20191017122950
// http://<hostname>:9202/_cluster/health

{
  "cluster_name": "elasticsearch",
  "status": "yellow",
  "timed_out": false,
  "number_of_nodes": 1,
  "number_of_data_nodes": 1,
  "active_primary_shards": 195,
  "active_shards": 195,
  "relocating_shards": 0,
  "initializing_shards": 0,
  "unassigned_shards": 194,
  "delayed_unassigned_shards": 0,
  "number_of_pending_tasks": 0,
  "number_of_in_flight_fetch": 0,
  "task_max_waiting_in_queue_millis": 0,
  "active_shards_percent_as_number": 50.128534704370175
}

Unassigned Shards count is 194
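As a quick sanity check, the `active_shards_percent_as_number` in the health output is just the active shards divided by the total (active + unassigned) shards:

```shell
# 195 active shards out of 195 + 194 = 389 total -> ~50.13% active,
# matching the health output above.
awk 'BEGIN { printf "%.2f\n", 195 / (195 + 194) * 100 }'
# prints 50.13
```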

I didn't know the exact reason, so I started digging deeper.

curl -XGET http://hostname:9202/_cluster/allocation/explain?pretty

  "index" : "mdcp-logs-2019.09.28",
  "shard" : 2,
  "primary" : false,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "CLUSTER_RECOVERED",
    "at" : "2019-10-01T11:30:03.909Z",
    "last_allocation_status" : "no_attempt"
  "can_allocate" : "no",
  "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions" : [
      "node_id" : "Bewr4jriQziexcfUXZSfdg",
      "node_name" : "Bewr4jr",
      "transport_address" : "ip:9300",
      "node_attributes" : {
        "ml.machine_memory" : "67368890368",
        "xpack.installed" : "true",
        "ml.max_open_jobs" : "20",
        "ml.enabled" : "true"
      "node_decision" : "no",
      "deciders" : [
          "decider" : "same_shard",
          "decision" : "NO",
          "explanation" : "the shard cannot be allocated to the same node on which a copy of the shard already exists [[mdcp-logs-2019.09.28][2], node[Bewr4jriQziexcfUXZSfdg], [P], s[STARTED], a[id=SbtMhgHzTMCQv6yEGcwV8Q]]"

I can see the current data in the Discover tab, but it is not being stored in the index, so the dashboard is not working.

Any solution or workaround for this?

Hi @Sundaramoorthy_Anand

the problem you are experiencing is a result of having only a single node while those indices are configured with one replica per shard.
Since a replica and its primary have to be on separate nodes, your replicas cannot be allocated, and you're seeing a yellow state for those indices. You can fix the issue by configuring the yellow indices to use 0 replicas, which gives you green indices on a single-node cluster.

This should not, however, prevent indexing of new data into those indices. To find out what's preventing new data from being indexed, I would suggest looking into the logs of the ES and/or Logstash nodes that are trying to write their monitoring data to your node to see what errors they run into when indexing.
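If it helps, dropping the replica count is a single settings update. A minimal sketch (the hostname and port 9202 are placeholders taken from the health check earlier in the thread; `_all` targets every index, so narrow it to a pattern like `heartbeat-6.4.2-*` if you only want to touch specific indices):

```shell
# Set number_of_replicas to 0 for all indices on the single-node cluster.
# <hostname> is a placeholder; replace "_all" with an index pattern to
# limit the change to the yellow indices only.
curl -XPUT "http://<hostname>:9202/_all/_settings" \
     -H 'Content-Type: application/json' \
     -d '{ "index": { "number_of_replicas": 0 } }'
```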


Thanks !
I set the replicas to zero, and the number of unallocated shards is now zero. But I'm still seeing the "10 of 62 Shards failed" error.

@Sundaramoorthy_Anand no problem.

The shard failures are, I think, the result of whatever query Kibana sends to your ES nodes failing on some shards. Do your ES logs or the Kibana logs show any errors/warnings when you see that failure in Kibana?

Elasticsearch log:

 [2019-10-21T07:08:33,255][DEBUG][o.e.a.s.TransportSearchAction] [Bewr4jr] [heartbeat-6.4.2-2019.10.19][0], node[Bewr4jriQziexcfUXZSfdg], [P], s[STARTED], a[id=4OVH7mT5Qp-BCMHM-zRBUQ]: Failed to execute [SearchRequest{searchType=QUERY_THEN_FETCH, indices=[heartbeat-6.4.2*], indicesOptions=IndicesOptions[ignore_unavailable=true, allow_no_indices=true, expand_wildcards_open=true, expand_wildcards_closed=false, allow_aliases_to_multiple_indices=true, forbid_closed_indices=true, ignore_aliases=false], types=[], routing='null', preference='1571641708008', requestCache=null, scroll=null, maxConcurrentShardRequests=5, batchedReduceSize=512, preFilterShardSize=32, allowPartialSearchResults=true, source={"size":0,"query":{"bool":{"must":[{"range":{"@timestamp":{"from":1571509800000,"to":1572114599999,"include_lower":true,"include_upper":true,"format":"epoch_millis","boost":1.0}}},{"match_phrase":{"http.url.raw":{"query":"http://<url>:9202/","slop":0,"zero_terms_query":"NONE","boost":1.0}}}],"filter":[{"match_all":{"boost":1.0}},{"match_all":{"boost":1.0}}],"adjust_pure_negative":true,"boost":1.0}},"_source":{"includes":[],"excludes":[]},"stored_fields":"*","docvalue_fields":[{"field":"@timestamp","format":"date_time"}],"script_fields":{},"aggregations":{"2":{"terms":{"field":"monitor.status","size":5,"min_doc_count":1,"shard_min_doc_count":0,"show_term_doc_count_error":false,"order":[{"_count":"desc"},{"_key":"asc"}]}}}}}] lastShard [true]
        org.elasticsearch.transport.RemoteTransportException: [Bewr4jr][<ip>:9300][indices:data/read/search[phase/query]]
        Caused by: java.lang.IllegalArgumentException: Fielddata is disabled on text fields by default. Set fielddata=true on [monitor.status] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead.
                at org.elasticsearch.index.mapper.TextFieldMapper$TextFieldType.fielddataBuilder ~[elasticsearch-6.4.2.jar:6.4.2]
                at org.elasticsearch.index.fielddata.IndexFieldDataService.getForField ~[elasticsearch-6.4.2.jar:6.4.2]
                at org.elasticsearch.index.query.QueryShardContext.getForField ~[elasticsearch-6.4.2.jar:6.4.2]
                ... (several frames elided; their class names were lost when pasting the log)
                at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun [elasticsearch-6.4.2.jar:6.4.2]
                at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun [elasticsearch-6.4.2.jar:6.4.2]
                at java.util.concurrent.ThreadPoolExecutor.runWorker [?:1.8.0_201]
                at java.util.concurrent.ThreadPoolExecutor$Worker.run [?:1.8.0_201]
                at java.lang.Thread.run [?:1.8.0_201]
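The `Caused by` line above points at the terms aggregation on `monitor.status`: in these indices it is apparently mapped as `text`, whereas the stock Heartbeat index template (as I understand it) maps it as `keyword`, which is what the Kibana visualization needs. A quick way to confirm what the mapping actually is (hostname/port are placeholders, as earlier in the thread):

```shell
# Check the mapping of monitor.status across the heartbeat indices.
# If it comes back as "text" instead of "keyword", the index template
# was not applied when the index was created.
curl -XGET "http://<hostname>:9202/heartbeat-6.4.2-*/_mapping/field/monitor.status?pretty"
```

If it is `text`, re-running `heartbeat setup --template` and letting a fresh daily index be created should restore the expected mapping; existing indices keep their old mapping and would need to be reindexed.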


Looks like something went wrong with your index templates/mappings there. Since you're saying it worked fine before Oct 12th, I'm assuming you followed all the steps in the docs to configure things correctly? Did anything change that may have broken your configured templates (e.g. starting to use LS but not configuring things accordingly)?

heartbeat setup -e \
      -E output.logstash.enabled=false \
      -E 'output.elasticsearch.hosts=["localhost:9200"]' \
      -E output.elasticsearch.username=heartbeat_internal \
      -E output.elasticsearch.password=YOUR_PASSWORD

Instead of doing this on the command line, I did the same through heartbeat.yml:

# Configure monitors
heartbeat.monitors:
- type: http

  # List of URLs to query
  urls: [  <url list>  ]

  # Configure task schedule
  schedule: '@every 10s'
  timeout: 16s

#==================== Elasticsearch template setting ==========================
setup.template.settings:
  index.number_of_shards: 1
  index.codec: best_compression

#============================== Kibana =====================================
setup.kibana:
  # Kibana Host
  host: "ip:5601"

#-------------------------- Elasticsearch output ------------------------------
output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["ip:9202"]

The same file was working until Oct 12th, and the data still shows up in the Discover tab, but it is not getting assigned to any index.

@Armin_Braun Is there any update on this? Thanks in advance.

What does GET _cat/allocation?v return? Run this from Dev Tools in Kibana.
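Outside of Dev Tools, the same check can be run with curl (hostname/port are placeholders, as earlier in the thread); the output lists, per node, the shard count plus disk usage columns:

```shell
curl -XGET "http://<hostname>:9202/_cat/allocation?v"
```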

@sandeepkanabar This is what I get

After executing the PUT settings API (shown in the pic above), I got this:

Try restarting the ES process on the other node and see if it joins the ES cluster properly. If it joins, then you can set the replicas back to 1.

There's no such thing as pushing data to the Discover tab. Discover merely reads the data from the heartbeat index in this case.

There are NO master/slave nodes here.

Only one node is present in our architecture, so I set the replicas to ZERO as @Armin_Braun suggested.

@Sundaramoorthy_Anand sorry for the delay here. Unfortunately, I'm at my wits' end when it comes to this one. I would suggest asking for help in the Beats forums; at this point it seems like a Beats rather than an ES issue (the mappings simply don't fit the queries that are being set up, which isn't a problem with ES itself), I'm afraid.

okay @Armin_Braun

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.