Unable to restart elasticsearch

sarabande · April 20, 2021, 7:25am

Hi,

I'm running Elasticsearch in a multi-node cluster.

Here's the configuration file (elasticsearch.yml) from one of the two Elasticsearch log master nodes (the cluster also has 6 Elasticsearch log nodes):

# Network
network:
  host: 0.0.0.0
  publish_host: c4t19154.xxx.xxx
http.port: 9200

# Cluster / node name
cluster.name: "es-sa20-xxx-log-hdp-itg-h4"
node.name: "c4t19154.xxx.xxx"

# Node role
node.master: True
node.data: False

# Discovery
discovery.seed_hosts: ['c4t19156.xxx.xxx:9300', 'c4t19154.xxx.xxx:9300', 'c4t19145.xxx.xxx:9300', 'c4t19147.xxx.xxx:9300', 'c4t19867.xxx.xxx:9300', 'c4t19149.xxx.xxx:9300', 'c4t19150.xxx.xxx:9300', 'hc9t09679.xxx.xxx:9300']
cluster.initial_master_nodes: ['c4t19156.xxx.xxx', 'c4t19154.xxx.xxx']

# Memory - Performance
bootstrap.memory_lock: true

# Monitoring
xpack.monitoring.collection.enabled: true
xpack.monitoring.exporters:
  csa_monitoring:
    type: http
    host: ['https://c9t24539.xxx.xxx:9200', 'https://c9t24540.xxx.xxx:9200', 'https://c9t24541.xxx.xxx:9200']
    auth:
      username: "elastic"
      password: "changeme"
    ssl:
      certificate_authorities: [ "/usr/share/elasticsearch/config/XXX_ENT_Private_SSL_CA_bundle.crt" ]

# Security
xpack.security.enabled: true
xpack.security.audit.enabled: false

# TLS/SSL
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.key: "/usr/share/elasticsearch/config/sa-monitoring-itg-h4.xxx.xxx.key"
xpack.security.transport.ssl.certificate: "/usr/share/elasticsearch/config/sa-monitoring-itg-h4.xxx.xxx.crt"
xpack.security.transport.ssl.certificate_authorities: [ "/usr/share/elasticsearch/config/XXX_ENT_Private_SSL_CA_bundle.crt" ]

xpack.security.http.ssl.enabled: true
xpack.security.http.ssl.key: "/usr/share/elasticsearch/config/sa-monitoring-itg-h4.xxx.xxx.key"
xpack.security.http.ssl.certificate: "/usr/share/elasticsearch/config/sa-monitoring-itg-h4.xxx.xxx.crt"
xpack.security.http.ssl.certificate_authorities: [ "/usr/share/elasticsearch/config/XXX_ENT_Private_SSL_CA_bundle.crt" ]

# Authentication - Authorization
xpack.security.authc:
  anonymous:
    username: anonymous_user
    roles: superuser
    authz_exception: true
  realms:
    native.native1:
      order: 0

# Watcher mail config
xpack.notification.email.account:
  exchange_account:
    profile: outlook
    email_defaults:
      from: 'elasticsearch-logs on hdp-itg-h4 <systemteams-tss-rnd@xxx.flowdock.com>'
    smtp:
      starttls.enable: true
      host: smtp1.xxx.com
      port: 25

#Enable OIDC token service
xpack.security.authc.token.enabled: true

##Create an OpenID Connect realm
xpack.security.authc.realms.oidc.uidoidc:
  order: 1
  rp.client_id: "sa-kibana-itg"
  rp.response_type: code
  rp.redirect_uri: "https://sa-monitoring-itg-h4.xxx.xxx/kibana-api/security/v1/oidc"
  op.issuer: "https://login-itg.ext.xxx.com"
  op.authorization_endpoint: "https://login-itg.ext.xxx.com/as/authorization.oauth2"
  op.token_endpoint: "https://login-itg.ext.xxx.com/as/token.oauth2"
  op.jwkset_path: "https://login-itg.ext.xxx.com/pf/JWKS"
  op.userinfo_endpoint: "https://login-itg.ext.xxx.com/idp/userinfo.openid"
  op.endsession_endpoint: "https://login-itg.ext.xxx.com/idp/startSLO.ping"
  rp.post_logout_redirect_uri: "https://sa-monitoring-itg-h4.xxx.xxx/logged_out"
  rp.signature_algorithm: HS256
  claims.principal: uid
  #claims.groups: "http://example.info/claims/groups"

I'm using the following Ansible play to restart Elasticsearch:

---

- name: Stop elasticsearch docker container
  docker_container:
    name: elasticsearch
    image: "{{ it_dtr_host }}/sa20sre/elasticsearch"
    state: stopped
  ignore_errors: yes

- name: Add elasticsearch data dir
  file:
    path: "{{base_path}}/elasticsearch/data"
    state: directory
    mode: 0777

- name: Set Facts
  include: set_facts.yml

- name: Start elasticsearch docker container
  docker_container:
    name: elasticsearch
    image: "{{ it_dtr_host }}/sa20sre/elasticsearch:{{ elasticsearch_version }}"
    state: started
  ignore_errors: yes

After the last task (Start elasticsearch docker container), the Elasticsearch Docker containers start on all nodes, but after approximately 30 seconds all of them stop with the following messages in the logs:

{"type": "server", "timestamp": "2021-04-08T13:39:58,777Z", "level": "INFO", "component": "o.e.t.TransportService", "cluster.name": "docker-cluster", "node.name": "0d7781abc420", "message": "publish_address {172.17.0.4:9300}, bound_addresses {0.0.0.0:9300}" }
{"type": "server", "timestamp": "2021-04-08T13:39:58,790Z", "level": "INFO", "component": "o.e.b.BootstrapChecks", "cluster.name": "docker-cluster", "node.name": "0d7781abc420", "message": "bound or publishing to a non-loopback address, enforcing bootstrap checks" }
ERROR: [1] bootstrap checks failed
[1]: the default discovery settings are unsuitable for production use; at least one of [discovery.seed_hosts, discovery.seed_providers, cluster.initial_master_nodes] must be configured
{"type": "server", "timestamp": "2021-04-08T13:39:58,887Z", "level": "INFO", "component": "o.e.n.Node", "cluster.name": "docker-cluster", "node.name": "0d7781abc420", "message": "stopping ..." }
{"type": "server", "timestamp": "2021-04-08T13:39:59,076Z", "level": "INFO", "component": "o.e.n.Node", "cluster.name": "docker-cluster", "node.name": "0d7781abc420", "message": "stopped" }
{"type": "server", "timestamp": "2021-04-08T13:39:59,077Z", "level": "INFO", "component": "o.e.n.Node", "cluster.name": "docker-cluster", "node.name": "0d7781abc420", "message": "closing ..." }
{"type": "server", "timestamp": "2021-04-08T13:39:59,217Z", "level": "INFO", "component": "o.e.n.Node", "cluster.name": "docker-cluster", "node.name": "0d7781abc420", "message": "closed" }

The log says that the default discovery settings are unsuitable for production use and that at least one of discovery.seed_hosts, discovery.seed_providers or cluster.initial_master_nodes must be configured. As you can see from the configuration, I have two of them configured (discovery.seed_hosts and cluster.initial_master_nodes).
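The rule in that error message can be sketched in Python as follows. This is a simplified illustration of the condition as the message words it, not Elasticsearch's own code; the helper names are hypothetical, and the `loaded_by_node` contents are an assumption inferred only from the `cluster.name` the log reports. It also reflects that Elasticsearch treats nested keys (like the `network:` block) and dotted keys (like `http.port`) as the same setting. The key point is that the check runs against the settings the node actually loaded (and, as the earlier log line notes, it is enforced because the node binds to a non-loopback address), so a correct file that the container never reads does not satisfy it.

```python
# Hypothetical helper names; mirrors the rule as the error message states it,
# not Elasticsearch's implementation.
DISCOVERY_SETTINGS = (
    "discovery.seed_hosts",
    "discovery.seed_providers",
    "cluster.initial_master_nodes",
)


def flatten(settings, prefix=""):
    """Turn nested YAML mappings into dotted keys: {"network": {"host": x}} -> {"network.host": x}."""
    flat = {}
    for key, value in settings.items():
        full_key = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key))
        else:
            flat[full_key] = value
    return flat


def discovery_check_passes(loaded_settings):
    """True if at least one discovery setting is present in the settings the node loaded."""
    flat = flatten(loaded_settings)
    return any(name in flat for name in DISCOVERY_SETTINGS)


# The file shared in the question (abridged) sets two of the three settings...
shared_file = {
    "network": {"host": "0.0.0.0", "publish_host": "c4t19154.xxx.xxx"},
    "cluster.name": "es-sa20-xxx-log-hdp-itg-h4",
    "discovery.seed_hosts": ["c4t19156.xxx.xxx:9300", "c4t19154.xxx.xxx:9300"],
    "cluster.initial_master_nodes": ["c4t19156.xxx.xxx", "c4t19154.xxx.xxx"],
}
# ...but the log reports cluster.name "docker-cluster", suggesting the node loaded
# a different file (assumed contents, based only on what the log shows).
loaded_by_node = {"cluster.name": "docker-cluster", "network.host": "0.0.0.0"}

print(discovery_check_passes(shared_file))     # True
print(discovery_check_passes(loaded_by_node))  # False
```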

Starting Elasticsearch manually doesn't help either: it starts and then stops after a short time with the same error.

I've turned the net upside down trying to find something useful, but nothing I tried helped.

I've even tried to work around this by setting es.enforce.bootstrap.checks to false in the jvm.options file (although this is not a single-node cluster), but it didn't help:

# avoid bootstrap checks
-Des.enforce.bootstrap.checks=false

I'd appreciate any help with this.

DavidTurner · April 20, 2021, 8:43am

It seems the nodes are not reading the elasticsearch.yml file you think they are. Not only does the log say that discovery.seed_hosts is not set, but also:

"cluster.name": "docker-cluster", "node.name": "0d7781abc420"

This isn't the cluster name or the node name you have configured.
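This reasoning can be checked mechanically: compare the fields each JSON log line carries with the values in the file you believe is in use. A minimal Python sketch; `unexpected_values` and `EXPECTED` are hypothetical names, and `EXPECTED` uses the redacted values from the question. Since node.name differs per host, pass the expected values for the node whose log you are reading.

```python
import json

# Values from the elasticsearch.yml shared in the question (hostnames redacted as there).
EXPECTED = {"cluster.name": "es-sa20-xxx-log-hdp-itg-h4", "node.name": "c4t19154.xxx.xxx"}


def unexpected_values(log_lines, expected=EXPECTED):
    """Map each field in `expected` to the values seen in the log, if they differ.

    Lines that are not JSON objects (e.g. "ERROR: [1] bootstrap checks failed"
    or "[1]: the default discovery settings ...") are skipped.
    """
    seen = {}
    for line in log_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(record, dict):
            continue
        for field in expected:
            if field in record:
                seen.setdefault(field, set()).add(record[field])
    return {field: values for field, values in seen.items() if values != {expected[field]}}


log = [
    '{"type": "server", "timestamp": "2021-04-08T13:39:58,887Z", "level": "INFO", '
    '"component": "o.e.n.Node", "cluster.name": "docker-cluster", '
    '"node.name": "0d7781abc420", "message": "stopping ..." }',
    "ERROR: [1] bootstrap checks failed",
]
print(unexpected_values(log))
# {'cluster.name': {'docker-cluster'}, 'node.name': {'0d7781abc420'}}
```

An empty result would mean the log agrees with the file; any entry means the node is running with different settings.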

sarabande · April 20, 2021, 9:56am

@DavidTurner Thank you for promptly answering my question.

I have changed the play for starting the elasticsearch docker container. It seems that now the node.name is reported correctly:

{"type": "server", "timestamp": "2021-04-20T09:50:15,940Z", "level": "INFO", "component": "o.e.b.BootstrapChecks", "cluster.name": "docker-cluster", "node.name": "c4t19156.xxx.xxx", "message": "bound or publishing to a non-loopback address, enforcing bootstrap checks" }
ERROR: [1] bootstrap checks failed
[1]: the default discovery settings are unsuitable for production use; at least one of [discovery.seed_hosts, discovery.seed_providers, cluster.initial_master_nodes] must be configured
{"type": "server", "timestamp": "2021-04-20T09:50:15,973Z", "level": "INFO", "component": "o.e.n.Node", "cluster.name": "docker-cluster", "node.name": "c4t19156.xxx.xxx", "message": "stopping ..." }
{"type": "server", "timestamp": "2021-04-20T09:50:16,028Z", "level": "INFO", "component": "o.e.n.Node", "cluster.name": "docker-cluster", "node.name": "c4t19156.xxx.xxx", "message": "stopped" }
{"type": "server", "timestamp": "2021-04-20T09:50:16,028Z", "level": "INFO", "component": "o.e.n.Node", "cluster.name": "docker-cluster", "node.name": "c4t19156.xxx.xxx", "message": "closing ..." }
{"type": "server", "timestamp": "2021-04-20T09:50:16,063Z", "level": "INFO", "component": "o.e.n.Node", "cluster.name": "docker-cluster", "node.name": "c4t19156.xxx.xxx", "message": "closed" }
{"type": "server", "timestamp": "2021-04-20T09:50:16,070Z", "level": "INFO", "component": "o.e.x.m.p.NativeController", "cluster.name": "docker-cluster", "node.name": "c4t19156.xxx.xxx", "message": "Native controller process has stopped - no new native processes can be started" }

But the problem persists. I'm now checking whether I also need to set docker-cluster.

DavidTurner · April 20, 2021, 10:09am

To be clear, the problem wasn't that the node name was wrong; it was that Elasticsearch isn't using the elasticsearch.yml file you shared. It still isn't: the cluster.name is still wrong, and the node still says that discovery.seed_hosts is not set.
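One way to confirm which file the container reads is to inspect its mounts. If nothing is mounted at the config path (and the file wasn't copied into the image at build time), the container uses the elasticsearch.yml built into the image. The Python sketch below works on the JSON printed by `docker inspect elasticsearch`; the config path is an assumption based on the `/usr/share/elasticsearch/config/...` paths in the question, and `config_mount` is a hypothetical name.

```python
import json

# Assumption: the image uses the same layout as the config paths in the question.
CONFIG_FILE = "/usr/share/elasticsearch/config/elasticsearch.yml"


def config_mount(inspect_output, config_file=CONFIG_FILE):
    """Return the mount that supplies `config_file`, or None if nothing is mounted there.

    `inspect_output` is the JSON printed by `docker inspect <container>`: a list with
    one object per container, each with a "Mounts" list of
    {"Source": ..., "Destination": ...} entries.
    """
    for container in json.loads(inspect_output):
        for mount in container.get("Mounts") or []:
            target = mount.get("Destination", "").rstrip("/")
            if not target:
                continue
            # Either the file itself or a parent directory (e.g. .../config) is mounted.
            if config_file == target or config_file.startswith(target + "/"):
                return mount
    return None


# Trimmed, hypothetical inspect output for a container started like the play in
# the question, which declares no volumes.
sample = json.dumps([{"Name": "/elasticsearch", "Mounts": []}])
print(config_mount(sample))  # None
```

If it returns None, one option is to bind-mount the file in the "Start elasticsearch docker container" task through the `volumes` parameter of Ansible's `docker_container` module, for example `- "{{ base_path }}/elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml:ro"` (the host path here is hypothetical).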

system · May 18, 2021, 10:10am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Related topics

Topic	Category	Replies	Views	Activity
Elastic Search restart	Elasticsearch	10	922	March 28, 2017
Elasticsearch can't restart and join the cluster	Elasticsearch (elastic-stack-security)	4	739	December 7, 2021
Trouble setting up an elasticsearch cluster	Elasticsearch	7	601	November 11, 2019
Restart Elasticsearch cluster	Elasticsearch	9	411	December 7, 2020
Wrong config?	Elasticsearch	8	1221	June 7, 2019