Unable to restart elasticsearch

Hi,

I'm using elasticsearch in a multi-node cluster.

Here's the configuration file (elasticsearch.yml) from one of the two elasticsearch log master nodes (the cluster also has 6 elasticsearch log nodes):

# Network
network:
  host: 0.0.0.0
  publish_host: c4t19154.xxx.xxx
http.port: 9200

# Cluster / node name
cluster.name: "es-sa20-xxx-log-hdp-itg-h4"
node.name: "c4t19154.xxx.xxx"

# Node role
node.master: True
node.data: False

# Discovery
discovery.seed_hosts: ['c4t19156.xxx.xxx:9300', 'c4t19154.xxx.xxx:9300', 'c4t19145.xxx.xxx:9300', 'c4t19147.xxx.xxx:9300', 'c4t19867.xxx.xxx:9300', 'c4t19149.xxx.xxx:9300', 'c4t19150.xxx.xxx:9300', 'hc9t09679.xxx.xxx:9300']
cluster.initial_master_nodes: ['c4t19156.xxx.xxx', 'c4t19154.xxx.xxx']

# Memory - Performance
bootstrap.memory_lock: true

# Monitoring
xpack.monitoring.collection.enabled: true
xpack.monitoring.exporters:
  csa_monitoring:
    type: http
    host: ['https://c9t24539.xxx.xxx:9200', 'https://c9t24540.xxx.xxx:9200', 'https://c9t24541.xxx.xxx:9200']
    auth:
      username: "elastic"
      password: "changeme"
    ssl:
      certificate_authorities: [ "/usr/share/elasticsearch/config/XXX_ENT_Private_SSL_CA_bundle.crt" ]

# Security
xpack.security.enabled: true
xpack.security.audit.enabled: false

# TLS/SSL
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.key: "/usr/share/elasticsearch/config/sa-monitoring-itg-h4.xxx.xxx.key"
xpack.security.transport.ssl.certificate: "/usr/share/elasticsearch/config/sa-monitoring-itg-h4.xxx.xxx.crt"
xpack.security.transport.ssl.certificate_authorities: [ "/usr/share/elasticsearch/config/XXX_ENT_Private_SSL_CA_bundle.crt" ]

xpack.security.http.ssl.enabled: true
xpack.security.http.ssl.key: "/usr/share/elasticsearch/config/sa-monitoring-itg-h4.xxx.xxx.key"
xpack.security.http.ssl.certificate: "/usr/share/elasticsearch/config/sa-monitoring-itg-h4.xxx.xxx.crt"
xpack.security.http.ssl.certificate_authorities: [ "/usr/share/elasticsearch/config/XXX_ENT_Private_SSL_CA_bundle.crt" ]

# Authentication - Authorization
xpack.security.authc:
  anonymous:
    username: anonymous_user
    roles: superuser
    authz_exception: true
  realms:
    native.native1:
      order: 0

# Watcher mail config
xpack.notification.email.account:
  exchange_account:
    profile: outlook
    email_defaults:
      from: 'elasticsearch-logs on hdp-itg-h4 <systemteams-tss-rnd@xxx.flowdock.com>'
    smtp:
      starttls.enable: true
      host: smtp1.xxx.com
      port: 25

#Enable OIDC token service
xpack.security.authc.token.enabled: true

##Create an OpenID Connect realm
xpack.security.authc.realms.oidc.uidoidc:
  order: 1
  rp.client_id: "sa-kibana-itg"
  rp.response_type: code
  rp.redirect_uri: "https://sa-monitoring-itg-h4.xxx.xxx/kibana-api/security/v1/oidc"
  op.issuer: "https://login-itg.ext.xxx.com"
  op.authorization_endpoint: "https://login-itg.ext.xxx.com/as/authorization.oauth2"
  op.token_endpoint: "https://login-itg.ext.xxx.com/as/token.oauth2"
  op.jwkset_path: "https://login-itg.ext.xxx.com/pf/JWKS"
  op.userinfo_endpoint: "https://login-itg.ext.xxx.com/idp/userinfo.openid"
  op.endsession_endpoint: "https://login-itg.ext.xxx.com/idp/startSLO.ping"
  rp.post_logout_redirect_uri: "https://sa-monitoring-itg-h4.xxx.xxx/logged_out"
  rp.signature_algorithm: HS256
  claims.principal: uid
  #claims.groups: "http://example.info/claims/groups"

I'm using the following Ansible play to restart elasticsearch:

---

- name: Stop elasticsearch docker container
  docker_container:
    name: elasticsearch
    image: "{{ it_dtr_host }}/sa20sre/elasticsearch"
    state: stopped
  ignore_errors: yes

- name: Add elasticsearch data dir
  file:
    path: "{{base_path}}/elasticsearch/data"
    state: directory
    mode: 0777

- name: Set Facts
  include: set_facts.yml

- name: Start elasticsearch docker container
  docker_container:
    name: elasticsearch
    image: "{{ it_dtr_host }}/sa20sre/elasticsearch:{{ elasticsearch_version }}"
    state: started
  ignore_errors: yes

After the last task (Start elasticsearch docker container), the elasticsearch docker containers start on all nodes, but after approximately 30 seconds all of them stop with the following messages in the logs:

{"type": "server", "timestamp": "2021-04-08T13:39:58,777Z", "level": "INFO", "component": "o.e.t.TransportService", "cluster.name": "docker-cluster", "node.name": "0d7781abc420", "message": "publish_address {172.17.0.4:9300}, bound_addresses {0.0.0.0:9300}" }
{"type": "server", "timestamp": "2021-04-08T13:39:58,790Z", "level": "INFO", "component": "o.e.b.BootstrapChecks", "cluster.name": "docker-cluster", "node.name": "0d7781abc420", "message": "bound or publishing to a non-loopback address, enforcing bootstrap checks" }
ERROR: [1] bootstrap checks failed
[1]: the default discovery settings are unsuitable for production use; at least one of [discovery.seed_hosts, discovery.seed_providers, cluster.initial_master_nodes] must be configured
{"type": "server", "timestamp": "2021-04-08T13:39:58,887Z", "level": "INFO", "component": "o.e.n.Node", "cluster.name": "docker-cluster", "node.name": "0d7781abc420", "message": "stopping ..." }
{"type": "server", "timestamp": "2021-04-08T13:39:59,076Z", "level": "INFO", "component": "o.e.n.Node", "cluster.name": "docker-cluster", "node.name": "0d7781abc420", "message": "stopped" }
{"type": "server", "timestamp": "2021-04-08T13:39:59,077Z", "level": "INFO", "component": "o.e.n.Node", "cluster.name": "docker-cluster", "node.name": "0d7781abc420", "message": "closing ..." }
{"type": "server", "timestamp": "2021-04-08T13:39:59,217Z", "level": "INFO", "component": "o.e.n.Node", "cluster.name": "docker-cluster", "node.name": "0d7781abc420", "message": "closed" }

The log says that the default discovery settings are unsuitable for production use and that at least one of discovery.seed_hosts, discovery.seed_providers or cluster.initial_master_nodes must be configured. As you can see from the configuration, I have two of them configured (discovery.seed_hosts and cluster.initial_master_nodes).

Starting elasticsearch manually doesn't help: it starts and then stops after a short time with the same error.

I've turned the net upside down trying to find something useful, but nothing I tried helped.

I've even tried to resolve this by setting es.enforce.bootstrap.checks to false in the jvm.options file (although this is not a single-node cluster), but it didn't help:

# avoid bootstrap checks
-Des.enforce.bootstrap.checks=false

I'll appreciate any help with this.

It seems the nodes are not reading the elasticsearch.yml file you think they are. Not only does the log say that discovery.seed_hosts is not set, but it also reports:

"cluster.name": "docker-cluster", "node.name": "0d7781abc420"

This isn't the cluster name or the node name you have configured.
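
One possible cause is that the start task never hands the configuration file to the container, so the image falls back to its built-in defaults (docker-cluster, container ID as node name). A minimal sketch of a start task that bind-mounts the file is below. The host-side path under base_path is hypothetical, and the container paths assume the custom image keeps the layout of the official image, as the /usr/share/elasticsearch/config paths in the posted configuration suggest.

```yaml
# Sketch only: the host-side config path is an assumption, not taken from the original play.
- name: Start elasticsearch docker container
  docker_container:
    name: elasticsearch
    image: "{{ it_dtr_host }}/sa20sre/elasticsearch:{{ elasticsearch_version }}"
    state: started
    volumes:
      # Hypothetical host path; point it at wherever the play renders elasticsearch.yml.
      - "{{ base_path }}/elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml:ro"
      # Data directory created earlier in the play; the container path is assumed
      # to be the default data path of the official image.
      - "{{ base_path }}/elasticsearch/data:/usr/share/elasticsearch/data"
```

If the configuration is instead baked into the image at build time, the mount is unnecessary and the thing to check is which file actually ended up in the image.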

@DavidTurner Thank you for promptly answering my question.

I have changed the play for starting the elasticsearch docker container. It seems that now the node.name is reported correctly:

{"type": "server", "timestamp": "2021-04-20T09:50:15,940Z", "level": "INFO", "component": "o.e.b.BootstrapChecks", "cluster.name": "docker-cluster", "node.name": "c4t19156.xxx.xxx", "message": "bound or publishing to a non-loopback address, enforcing bootstrap checks" }
ERROR: [1] bootstrap checks failed
[1]: the default discovery settings are unsuitable for production use; at least one of [discovery.seed_hosts, discovery.seed_providers, cluster.initial_master_nodes] must be configured
{"type": "server", "timestamp": "2021-04-20T09:50:15,973Z", "level": "INFO", "component": "o.e.n.Node", "cluster.name": "docker-cluster", "node.name": "c4t19156.xxx.xxx", "message": "stopping ..." }
{"type": "server", "timestamp": "2021-04-20T09:50:16,028Z", "level": "INFO", "component": "o.e.n.Node", "cluster.name": "docker-cluster", "node.name": "c4t19156.xxx.xxx", "message": "stopped" }
{"type": "server", "timestamp": "2021-04-20T09:50:16,028Z", "level": "INFO", "component": "o.e.n.Node", "cluster.name": "docker-cluster", "node.name": "c4t19156.xxx.xxx", "message": "closing ..." }
{"type": "server", "timestamp": "2021-04-20T09:50:16,063Z", "level": "INFO", "component": "o.e.n.Node", "cluster.name": "docker-cluster", "node.name": "c4t19156.xxx.xxx", "message": "closed" }
{"type": "server", "timestamp": "2021-04-20T09:50:16,070Z", "level": "INFO", "component": "o.e.x.m.p.NativeController", "cluster.name": "docker-cluster", "node.name": "c4t19156.xxx.xxx", "message": "Native controller process has stopped - no new native processes can be started" }

But the problem persists. I'm now checking whether I also need to set the cluster name, which is still reported as docker-cluster.

To be clear, the problem wasn't that the node name was wrong; it was that Elasticsearch isn't using the elasticsearch.yml file you shared. It still isn't, because the cluster.name is wrong and it says that discovery.seed_hosts is not set.
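
One way to check this from the play itself is to inspect the container and see what is mounted over the config file. The sketch below assumes the configuration is meant to reach the container as a single-file bind mount at /usr/share/elasticsearch/config/elasticsearch.yml; the registered variable name es_container is made up for illustration. Inspecting the container works even after it has exited, which matters here because the node stops within seconds.

```yaml
# Sketch only: assumes the config is supplied as a bind mount over the default config path.
- name: Inspect the elasticsearch container (works for stopped containers too)
  docker_container_info:
    name: elasticsearch
  register: es_container

- name: Show which host paths are mounted into the container
  debug:
    var: es_container.container.Mounts

- name: Fail if nothing is mounted over elasticsearch.yml
  assert:
    that:
      - es_container.exists
      - >-
        es_container.container.Mounts
        | selectattr('Destination', 'equalto', '/usr/share/elasticsearch/config/elasticsearch.yml')
        | list | length > 0
    fail_msg: "No bind mount found for elasticsearch.yml; the node is probably running with the image defaults."
```

An empty Mounts list, together with cluster.name still showing docker-cluster in the log, would be consistent with the node running on the image's default configuration.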
