Ansible Playbook Hangs on "Wait for elasticsearch to startup"

Hi, I'm trying to deploy a 3-node Elasticsearch cluster to AWS EC2 using the role here:

Ansible runs the playbook without errors until it hangs for several minutes on this step:

"TASK [elasticsearch : Wait for elasticsearch to startup delay=5, host={{es_api_host}}, port={{es_api_port}}, connect_timeout=1] ***"

I can SSH in, but I find that Elasticsearch is not running.

Here is the tail end of the output before the hang:


TASK [elasticsearch : command _raw_params=/bin/true] ***************************
skipping: [10.16.8.111]
TASK [elasticsearch : Copy role_mapping.yml File for Instance src=security/role_mapping.yml.j2, force=yes, mode=0644, dest={{conf_dir}}/x-pack/role_mapping.yml, group={{ es_group }}, owner={{ es_user }}] ***
skipping: [10.16.8.111]
TASK [elasticsearch : Copy message auth key to elasticsearch src={{ es_message_auth_file }}, force=yes, mode=0600, dest={{conf_dir}}/x-pack/system_key, group={{ es_group }}, owner={{ es_user }}] ***
skipping: [10.16.8.111]
TASK [elasticsearch : Ensure security conf directory exists owner={{ es_user }}, path={{ conf_dir }}/security, state=directory, group={{ es_group }}] ***
skipping: [10.16.8.111]
TASK [elasticsearch : Set Plugin Directory Permissions owner={{ es_user }}, path={{ es_home }}/plugins, state=directory, group={{ es_group }}, recurse=yes] ***
ok: [10.16.8.111]
TASK [elasticsearch : file owner={{ es_user }}, path=/etc/elasticsearch/templates, state=directory, group={{ es_group }}] ***
skipping: [10.16.8.111]
TASK [elasticsearch : Copy default templates to elasticsearch dest=/etc/elasticsearch/, src=templates, group={{ es_group }}, owner={{ es_user }}] ***
skipping: [10.16.8.111]
TASK [elasticsearch : Copy templates to elasticsearch dest=/etc/elasticsearch/templates, src={{ item }}, group={{ es_group }}, owner={{ es_user }}] ***
TASK [elasticsearch : Wait for elasticsearch to startup delay=5, host={{es_api_host}}, port={{es_api_port}}, connect_timeout=1] ***

NOTES:

  • My playbook is pretty much identical to the example provided with the role, and I've made no modifications to the role itself.
  • I can SSH into the instances after the failure, so connectivity is not an issue. Elasticsearch is not running.

The playbook is as follows:


- hosts: es_master_nodes
  roles:
    - { role: elasticsearch, es_instance_name: "es-node1", es_data_dirs: "/opt/elasticsearch/data", es_log_dir: "/opt/elasticsearch/logs",
      es_config: {
        node.name: "es-node1",
        cluster.name: "es-clstr",
        discovery.zen.ping.unicast.hosts: "localhost:9301",
        http.port: 9201,
        transport.tcp.port: 9301,
        node.data: false,
        node.master: true,
        bootstrap.memory_lock: true,
      } }
  vars:
    es_scripts: false
    es_templates: false
    es_version_lock: false
    es_heap_size: 1g
    es_api_port: 9201

- hosts: es_data_nodes
  roles:
    - { role: elasticsearch, es_instance_name: "es-node2", es_data_dirs: "/opt/elasticsearch/data", es_log_dir: "/opt/elasticsearch/logs",
      es_config: {
        node.name: "es-node2",
        cluster.name: "es-clstr",
        discovery.zen.ping.unicast.hosts: "localhost:9301",
        http.port: 9201,
        transport.tcp.port: 9301,
        node.data: true,
        node.master: false,
        bootstrap.memory_lock: false,
      } }
  vars:
    es_scripts: false
    es_templates: false
    es_version_lock: false
    ansible_user: centos
    es_api_port: 9201

- hosts: es_data_nodes
  roles:
    - { role: elasticsearch, es_instance_name: "es-node3", es_data_dirs: "/opt/elasticsearch/data", es_log_dir: "/opt/elasticsearch/logs",
      es_config: {
        node.name: "es-node3",
        cluster.name: "es-clstr",
        discovery.zen.ping.unicast.hosts: "localhost:9301",
        http.port: 9201,
        transport.tcp.port: 9301,
        node.data: true,
        node.master: false,
        bootstrap.memory_lock: false,
      } }
  vars:
    es_scripts: false
    es_templates: false
    es_version_lock: false
    ansible_user: centos
    es_api_port: 9201

Running ansible with -vvv for verbose output:


TASK [elasticsearch : Copy templates to elasticsearch dest=/etc/elasticsearch/templates, src={{ item }}, group={{ es_group }}, owner={{ es_user }}] ***
task path: /Users/welcome/GitLab/skunkworks/elasticsearch/playbooks/roles/elasticsearch/tasks/elasticsearch-templates.yml:10

TASK [elasticsearch : Wait for elasticsearch to startup delay=5, host={{es_api_host}}, port={{es_api_port}}, connect_timeout=1] ***
task path: /Users/welcome/GitLab/skunkworks/elasticsearch/playbooks/roles/elasticsearch/tasks/main.yml:47
Using module file /usr/local/lib/python2.7/site-packages/ansible/modules/core/utilities/logic/wait_for.py
<10.16.8.111> ESTABLISH SSH CONNECTION FOR USER: centos
<10.16.8.111> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o Port=22 -o 'IdentityFile="/Users/welcome/.ssh/eu.pem"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=centos -o ConnectTimeout=10 -o ControlPath=/Users/welcome/.ansible/cp/ansible-ssh-%h-%p-%r 10.16.8.111 '/bin/sh -c '"'"'( umask 77 && mkdir -p "echo ~/.ansible/tmp/ansible-tmp-1489554497.92-236839198736040" && echo ansible-tmp-1489554497.92-236839198736040="echo ~/.ansible/tmp/ansible-tmp-1489554497.92-236839198736040" ) && sleep 0'"'"''
<10.16.8.111> PUT /var/folders/ys/ykfsqtz54xn6k9vgfwdhckkw0000gp/T/tmpkJbVnZ TO /home/centos/.ansible/tmp/ansible-tmp-1489554497.92-236839198736040/wait_for.py
<10.16.8.111> SSH: EXEC sftp -b - -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o Port=22 -o 'IdentityFile="/Users/welcome/.ssh/eu.pem"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=centos -o ConnectTimeout=10 -o ControlPath=/Users/welcome/.ansible/cp/ansible-ssh-%h-%p-%r '[10.16.8.111]'
<10.16.8.111> ESTABLISH SSH CONNECTION FOR USER: centos
<10.16.8.111> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o Port=22 -o 'IdentityFile="/Users/welcome/.ssh/eu.pem"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=centos -o ConnectTimeout=10 -o ControlPath=/Users/welcome/.ansible/cp/ansible-ssh-%h-%p-%r 10.16.8.111 '/bin/sh -c '"'"'chmod u+x /home/centos/.ansible/tmp/ansible-tmp-1489554497.92-236839198736040/ /home/centos/.ansible/tmp/ansible-tmp-1489554497.92-236839198736040/wait_for.py && sleep 0'"'"''
<10.16.8.111> ESTABLISH SSH CONNECTION FOR USER: centos
<10.16.8.111> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o Port=22 -o 'IdentityFile="/Users/welcome/.ssh/eu.pem"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=centos -o ConnectTimeout=10 -o ControlPath=/Users/welcome/.ansible/cp/ansible-ssh-%h-%p-%r -tt 10.16.8.111 '/bin/sh -c '"'"'sudo -H -S -n -u root /bin/sh -c '"'"'"'"'"'"'"'"'echo BECOME-SUCCESS-hsmfdhfanfkjqnbmoxjihrtuhbbvegsg; /usr/bin/python /home/centos/.ansible/tmp/ansible-tmp-1489554497.92-236839198736040/wait_for.py; rm -rf "/home/centos/.ansible/tmp/ansible-tmp-1489554497.92-236839198736040/" > /dev/null 2>&1'"'"'"'"'"'"'"'"' && sleep 0'"'"''


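Found the issue. The Ansible role does not seem to enable the Elasticsearch startup service in systemd, which would explain why Elasticsearch was not running and the wait task had nothing to connect to. I added an Ansible block to the role definition to enable the service.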
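
For anyone hitting the same hang, here is a minimal sketch of what such a task could look like. It is an illustration, not the exact block I added: `es_service_name` is a hypothetical variable, and the real unit name depends on how the role names the service for each instance, so check the unit files on the host before using it.

```yaml
# Hypothetical task: enable the Elasticsearch service at boot and start it now.
# "es_service_name" is an assumed variable, not one defined by the role; it
# falls back to "elasticsearch" when unset. Replace it with the actual unit name.
- name: Enable and start the elasticsearch service
  become: yes
  service:
    name: "{{ es_service_name | default('elasticsearch') }}"
    enabled: yes
    state: started
```

Placing a task like this before the "Wait for elasticsearch to startup" step should give the wait task an open port to find, assuming the service itself starts cleanly.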
