Monitoring Exceptions after upgrade 5.4.0 to 5.6.8 to 6.2.3

I have upgraded the Elasticsearch stack from 5.4.0 to 5.6.8 to 6.2.3 on CentOS 7 via RPM deployments, with a single-node cluster for test purposes.
I used the X-Pack Upgrade Assistant to re-index the indices it suggested (.kibana, .security, .watches and .triggered_watches). It didn't prompt me to re-index any of the monitoring indices.
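For reference, I believe the Upgrade Assistant drives the X-Pack migration assistance API under the hood; a hedged sketch of the request I mean (the _xpack/migration/assistance path is how I recall the 6.x API, so verify it against the docs for your version):

# Lists indices the Upgrade Assistant believes need to be re-indexed or upgraded.
curl -uelastic:<REDACT> -XGET 'localhost:9200/_xpack/migration/assistance?pretty'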

After the upgrade I get the following written to the Elasticsearch log file:

[2018-04-09T15:12:54,445][DEBUG][o.e.a.s.TransportSearchAction] [zMOSE-6] [.monitoring-es-6-2018.04.04][0], node[zMOSE-6fQ3m56HUYL5_lZA], [P], s[STARTED], a[id=ZJsH-gVDSQyHHE2ygOcT1A]: Failed to execute [SearchRequest{searchType=QUERY_THEN_FETCH, indices=[.monitoring-es-2-2018.04.03, .monitoring-es-6-2018.04.08, .monitoring-es-6-2018.04.09, .monitoring-es-6-2018.04.06, .monitoring-es-6-2018.04.07, .monitoring-es-6-2018.04.04, .monitoring-es-6-2018.04.05, .monitoring-es-2-2018.04.04], indicesOptions=IndicesOptions[id=7, ignore_unavailable=true, allow_no_indices=true, expand_wildcards_open=true, expand_wildcards_closed=false, allow_aliases_to_multiple_indices=true, forbid_closed_indices=false, ignore_aliases=false], types=[], routing='null', preference='null', requestCache=null, scroll=null, maxConcurrentShardRequests=5, batchedReduceSize=512, preFilterShardSize=128, source={"size":2,"query":{"bool":{"filter":[{"term":{"cluster_uuid":{"value":"FMh3_a7WTgeAX0heD0aKrA","boost":1.0}}},{"term":{"type":{"value":"cluster_stats","boost":1.0}}},{"range":{"timestamp":{"from":"now-2d","to":null,"include_lower":true,"include_upper":true,"boost":1.0}}}],"adjust_pure_negative":true,"boost":1.0}},"_source":{"includes":["cluster_state.nodes_hash","cluster_state.nodes.*.name","cluster_state.nodes.*.ephemeral_id"],"excludes":[]},"sort":[{"timestamp":{"order":"desc"}}],"collapse":{"field":"cluster_state.nodes_hash"}}}] lastShard [true]
org.elasticsearch.transport.RemoteTransportException: [zMOSE-6][172.18.0.1:9300][indices:data/read/search[phase/query]]
Caused by: org.elasticsearch.search.SearchContextException: no mapping found for `cluster_state.nodes_hash` in order to collapse on

As part of the upgrade I had to introduce TLS to the cluster, but I have left access over plain HTTP for the moment, as all clients are trusted and reside behind the firewall. Here is the relevant configuration from elasticsearch.yml (security, monitoring and Watcher):

# Enforcing TLS based traffic over the transport protocol.
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.keystore.path: /etc/elasticsearch/elastic-certificates.p12
xpack.security.transport.ssl.truststore.path: /etc/elasticsearch/elastic-certificates.p12

# Not enforcing TLS based traffic over the HTTP protocol. Shall use an anonymous user.
xpack.security.http.ssl.enabled: false

################
## Monitoring ##
################
xpack.monitoring.exporters.my_local:
  type: local
  use_ingest: false
  cluster_alerts.management.enabled: true

############################
### Watcher Configuration ##
############################
xpack.watcher.enabled: true
xpack.watcher.history.cleaner_service.enabled: true
xpack.http.ssl.truststore.path: "/etc/elasticsearch/policydemo-truststore.jks"
xpack.http.ssl.truststore.password: <REDACT>

Also, here is a snapshot of the current templates and aliases:

[tango@iel-dev-mtn-vm2 ~]$ curl -uelastic:<REDACT> -XGET 'localhost:9200/_cat/aliases?v&pretty'
alias              index                filter routing.index routing.search
.security          .security-6          -      -             -
.watches           .watches-6           -      -             -
.kibana            .kibana-6            -      -             -
.triggered_watches .triggered_watches-6 -      -             -
[tango@iel-dev-mtn-vm2 ~]$ curl -uelastic:<REDACT> -XGET 'localhost:9200/_cat/templates?v&pretty'
name                          index_patterns             order      version
.triggered_watches            [.triggered_watches*]      2147483647 
security-index-template-v6    [.security-*]              1000       
.monitoring-es                [.monitoring-es-6-*]       0          6020099
kibana_index_template:.kibana [.kibana]                  0          
.watch-history-6              [.watcher-history-6*]      2147483647 
logstash-index-template       [.logstash]                0          
.ml-anomalies-                [.ml-anomalies-*]          0          6020399
.watches                      [.watches*]                2147483647 
.monitoring-alerts            [.monitoring-alerts-6]     0          6020099
.monitoring-kibana            [.monitoring-kibana-6-*]   0          6020099
security_audit_log            [.security_audit_log*]     2147483647 
.ml-meta                      [.ml-meta]                 0          6020399
.monitoring-beats             [.monitoring-beats-6-*]    0          6020099
.ml-state                     [.ml-state]                0          6020399
.monitoring-logstash          [.monitoring-logstash-6-*] 0          6020099
security-index-template       [.security-*]              1000       
.watch-history-7              [.watcher-history-7*]      2147483647 
.ml-notifications             [.ml-notifications]        0          6020399
[tango@iel-dev-mtn-vm2 ~]$ 

Any guidance on this issue would be much appreciated.

Also, I forgot to include a list of the problematic indices:

[tango@iel-dev-mtn-vm2 ~]$ curl -s -uelastic:<REDACT> -XGET 'localhost:9200/_cat/indices?v&pretty' | grep "monitoring-es"
yellow open   .monitoring-es-6-2018.04.05       1BOt0XRYTTWsSM_PlzlinQ   1   1     477346         7741    297.9mb        297.9mb
yellow open   .monitoring-es-6-2018.04.04       rDZdEgQ0SJ-LVxfIgwYtWQ   1   1     413412        14000    252.5mb        252.5mb
yellow open   .monitoring-es-2-2018.04.03       tK1BFyK6Re2fh_BTmIyFUQ   1   1     726277        14762    334.2mb        334.2mb
green  open   .monitoring-es-6-2018.04.07       nHDLvMx0RuKE6vvrOSPB-Q   1   0     293970          600    157.5mb        157.5mb
yellow open   .monitoring-es-2-2018.04.04       PkFaBFkZRaOR6heWzddwbQ   1   1     533415        22272    265.1mb        265.1mb
green  open   .monitoring-es-6-2018.04.09       Sv6ce650Sxu1B_U1Ur4oKA   1   0     119359          578     70.3mb         70.3mb
green  open   .monitoring-es-6-2018.04.06       AzIMmEx9QmGrQbhqynjPLA   1   0     139247          216     73.6mb         73.6mb
green  open   .monitoring-es-6-2018.04.08       tA1CwO1aTMaDGmnKpyMRww   1   0     328579          864    178.1mb        178.1mb

Potentially the solution is just to delete the problematic indices. That is an option for me as part of the upgrade if required, but I was worried that this issue might indicate I'd made a bigger mistake during the upgrade.

Hi @jhn134910,

This is unrelated to your TLS configuration. We added a new field (cluster_state.nodes_hash, as the error notes) for the nodes added / removed / changed cluster alert. It allows us to detect when the node list in the cluster state has changed (like any decent hash, it changes whenever a node is restarted, added, or removed).
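If you want to confirm which of your monitoring indices are missing the field, the field mapping API shows it at a glance; a quick sketch in the same curl style as your _cat commands (credentials redacted as above):

# Pre-upgrade indices return an empty mapping for this field; those are the shards
# the collapse query is complaining about.
curl -uelastic:<REDACT> -XGET 'localhost:9200/.monitoring-es-*/_mapping/field/cluster_state.nodes_hash?pretty'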

You have three options really:

  1. Ignore the error. It only impacts the cluster alert and it doesn't even stop it from working; it's just failing on older shards, which are irrelevant and ignored since they didn't have the field. In a few days as the indices are curated, the problem will take care of itself.
  2. Delete indices created before the 6.x upgrade.
    • Personally, I recommend just deleting the 5.4.x indices (DELETE .monitoring-es-2-*); a curl sketch follows this list.
  3. Add the cluster_state.nodes_hash field to existing mappings.
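For option 2, a minimal curl sketch along the lines of the commands above (assuming the same elastic user; double-check that the pattern only matches the old -2- indices before running it):

# Deletes only the 5.x-format monitoring indices; the -6- indices are left alone.
curl -uelastic:<REDACT> -XDELETE 'localhost:9200/.monitoring-es-2-*?pretty'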

The last option can be applied to the 5.6.x-based index mappings with the request below (it is effectively a no-op for indices that already have the field):

PUT /.monitoring-es-6-*/_mapping/doc
{
  "doc": {
    "properties": {
      "cluster_state": {
        "properties": {
          "nodes_hash": {
            "type": "integer"
          }
        }
      }
    }
  }
}
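If you are running this from the shell like the _cat commands above, the curl equivalent would be roughly as follows (the Content-Type header is required on 6.x because of strict content-type checking):

curl -uelastic:<REDACT> -XPUT 'localhost:9200/.monitoring-es-6-*/_mapping/doc?pretty' \
  -H 'Content-Type: application/json' -d'
{
  "doc": {
    "properties": {
      "cluster_state": {
        "properties": {
          "nodes_hash": { "type": "integer" }
        }
      }
    }
  }
}'

Re-running the field mapping check above afterwards should show cluster_state.nodes_hash on every .monitoring-es-6-* index.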

Hope that helps,
Chris

This is very helpful Chris, thanks for your guidance.
