Network.host issue: elasticsearch.service: Main process exited, code=exited, status=78

Hi,

I am trying to configure an Elasticsearch cluster on Ubuntu 18.04 (on Azure). The basic installation works fine. However, as soon as I set the "network.host" value in the configuration file, the Elasticsearch service no longer starts.

Error message:

systemd[1]: Started Elasticsearch.
-- Subject: Unit elasticsearch.service has finished start-up
-- Defined-By: systemd
-- Support: Enterprise open source support | Ubuntu

-- Unit elasticsearch.service has finished starting up.

-- The start-up result is RESULT.
elasticsearch[3955]: OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future release.
systemd[1]: elasticsearch.service: Main process exited, code=exited, status=1/FAILURE
systemd[1]: elasticsearch.service: Failed with result 'exit-code'.

The following entries don't work:
network.host: 0.0.0.0
network.host: eth0
network.host: 10.79.10.17

  • Elasticsearch version: 7.3.1
  • JVM version: OpenJDK Runtime Environment (build 1.8.0_222-8u222-b10-1ubuntu1~18.04.1-b10)
  • OS version: Ubuntu 18.04 LTS
  • vm.max_map_count=524288 (it does not work with the default value or with 262144 either)

By default, the connection to the static IP is refused:
curl http://10.79.10.17:9200
curl: (7) Failed to connect to 10.79.10.17 port 9200: Connection refused
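
For comparison, the default configuration binds only to the loopback interface, so a request against localhost still answers while the static IP is refused. A quick check (just a sketch, assuming the service is running and curl and ss are available):

curl http://localhost:9200          # should return the node info JSON with the default binding
sudo ss -ltnp | grep 9200           # shows which address Elasticsearch is actually listening on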

As soon as I configure the network.host value, the service does not start.

Thank you for your help in this matter

Can you check the log file in /var/log/elasticsearch that is named after your configured cluster name? I assume that a bootstrap check has failed and that the log file contains information on how to fix this, but this is just an assumption for now.

See https://www.elastic.co/guide/en/elasticsearch/reference/7.3/starting-elasticsearch.html#start-es-deb-systemd
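
For example, something along these lines should surface the actual error (a sketch, assuming the default Debian package paths; replace <cluster.name> with whatever you configured):

sudo journalctl -u elasticsearch.service -n 50 --no-pager   # systemd's view of the failed start
sudo less /var/log/elasticsearch/<cluster.name>.log         # Elasticsearch's own log, named after cluster.name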

My config file:

# Use a descriptive name for your cluster:

cluster.name: my-search

# ------------------------------------ Node ------------------------------------

# Use a descriptive name for the node:

node.name: ${HOSTNAME}
node.ingest: true
node.data: false
node.master: false

# Add custom attributes to the node:

#node.attr.rack: r1

# ----------------------------------- Paths ------------------------------------

# Path to directory where to store the data (separate multiple locations by comma):

path.data: /var/lib/elasticsearch

# Path to log files:

path.logs: /var/log/elasticsearch

# ----------------------------------- Memory -----------------------------------

# Lock the memory on startup:

#bootstrap.memory_lock: true

# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.

# ---------------------------------- Network -----------------------------------

# Set the bind address to a specific IP (IPv4 or IPv6):

network.host: 0.0.0.0

# Set a custom port for HTTP:

This is the configuration file from /etc/elasticsearch, not the log file.

/var/log/elasticsearch/gc.log

I can see this in the log:
Entering safepoint region: GenCollectForAllocation
GC(2) Pause Young (Allocation Failure)
GC(2) Using 8 workers of 8 for evacuation
GC(2) Desired survivor size 17891328 bytes, new threshold 6 (max threshold 6)
GC(2) Age table with threshold 6 (max threshold 6)
GC(2) - age 1: 2505304 bytes, 2505304 total
GC(2) - age 2: 4066352 bytes, 6571656 total
GC(2) - age 3: 10031824 bytes, 16603480 total
GC(2) ParNew: 298211K->23277K(314560K)
GC(2) CMS: 0K->0K(699072K)
GC(2) Metaspace: 20810K->20810K(1069056K)
GC(2) Pause Young (Allocation Failure) 291M->22M(989M) 5.505ms
GC(2) User=0.04s Sys=0.00s Real=0.01s
Leaving safepoint region
Total time for which application threads were stopped: 0.0057126 seconds, Stopping threads took: 0.0000459 seconds

As written above, there should be a file in the log directory that is named after your cluster name, i.e. my-search.log. Please show the output of that one.

You are right. I am not so familiar with elasticsearch logging.

Here is the my-search.log

Thanks again

Perhaps take a look at this post... it may explain why Elasticsearch fails to start when you change network.host.

See this snippet from the logfile

[2019-09-24T13:35:48,085][INFO ][o.e.b.BootstrapChecks    ] [eslnx02] bound or publishing to a non-loopback address, enforcing bootstrap checks
[2019-09-24T13:35:48,089][ERROR][o.e.b.Bootstrap          ] [eslnx02] node validation exception
[3] bootstrap checks failed
[1]: initial heap size [2147483648] not equal to maximum heap size [4294967296]; this can cause resize pauses and prevents mlockall from locking the entire heap
[2]: memory locking requested for elasticsearch process but memory is not locked
[3]: the default discovery settings are unsuitable for production use; at least one of [discovery.seed_hosts, discovery.seed_providers, cluster.initial_master_nodes] must be configured

See https://www.elastic.co/guide/en/elasticsearch/reference/7.3/bootstrap-checks.html
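
For reference, a sketch of what fixing those three checks could look like (the values are only examples and need to be adapted to your host):

# [1] /etc/elasticsearch/jvm.options: initial and maximum heap must match
-Xms4g
-Xmx4g

# [2] either leave bootstrap.memory_lock commented out in elasticsearch.yml,
#     or allow locking via a systemd override (LimitMEMLOCK=infinity in
#     /etc/systemd/system/elasticsearch.service.d/override.conf)

# [3] elasticsearch.yml: configure discovery, e.g. with the master-eligible node names
cluster.initial_master_nodes: ["eslnx02"]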

I have found the root cause of the issue:

  • If you configure the network.host value you also have to configure the transport.host.

This means you have to add the following section to elasticsearch.yml and your system will work.

network.host: 0.0.0.0
http.port: 9200

transport.host: _site_
transport.tcp.port: 9300

Then restart the service and it works. :slight_smile:
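
(For completeness, the restart itself, assuming the systemd unit from above:)

sudo systemctl restart elasticsearch.service
sudo systemctl status elasticsearch.service   # should now show active (running)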

Additionally, if you would like to force Elasticsearch to listen on IPv4, you have to add the following line to /etc/elasticsearch/jvm.options:

-Djava.net.preferIPv4Stack=true

So my environment works fine now. :slight_smile:

My comment: the related documentation does not mention this. :disappointed_relieved::cry:

Thank you for your support and your suggestions.

Have a great day.

I don't think that's true, and I have tried and confirmed that there is no problem running Elasticsearch with network.host set and transport.host unset.

The issue you had was that there were bootstrap checks failing.
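
In other words, a configuration roughly like the following (only a sketch with an example discovery setting, not your exact environment) starts fine without any transport.* settings:

network.host: 0.0.0.0
http.port: 9200
# satisfies the discovery bootstrap check on a master-eligible node
cluster.initial_master_nodes: ["eslnx02"]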

Thank you for your feedback. Nevertheless, the bootstrap check was OK and my solution was the key.

This solved the same issue on my Windows host as well.

Can you help us understand how to reproduce the problem you solved by setting transport.host? Normally this setting should not be set.

Sure. :slight_smile:
If you merely configure this:

network.host: 0.0.0.0
http.port: 9200

you run into the issue shown above. If you then extend your config file with the transport.host related configuration, the system works fine.

transport.host: _site_
transport.tcp.port: 9300

Putting it all together, this is the solution:

network.host: 0.0.0.0
http.port: 9200

transport.host: _site_
transport.tcp.port: 9300

My working elasticsearch.yml configuration now:

cluster.name: my-search
node.name: ${HOSTNAME}
node.data: false
node.master: true
node.ingest: false
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 0.0.0.0
http.port: 9200
transport.host: _site_
transport.tcp.port: 9300

Response of get request on port 9200:

{
    "name": "eslnx02",
    "cluster_name": "my-search",
    "cluster_uuid": "tpIiL4e5QTicAdzspRM-GQ",
    "version": {
        "number": "7.3.2",
        "build_flavor": "default",
        "build_type": "deb",
        "build_hash": "1c1faf1",
        "build_date": "2019-09-06T14:40:30.409026Z",
        "build_snapshot": false,
        "lucene_version": "8.1.0",
        "minimum_wire_compatibility_version": "6.8.0",
        "minimum_index_compatibility_version": "6.0.0-beta1"
    },
    "tagline": "You Know, for Search"
}

When your node starts up, it logs two lines containing the string bound_addresses that look like this:

[2019-09-26T12:41:10,124][INFO ][o.e.t.TransportService   ] [node-0] publish_address {192.168.1.139:9300}, bound_addresses {192.168.1.139:9300}, {192.168.1.179:9300}

and

[2019-09-26T12:41:10,787][INFO ][o.e.h.AbstractHttpServerTransport] [node-0] publish_address {192.168.1.139:9200}, bound_addresses {192.168.1.139:9200}, {192.168.1.179:9200}

These might be logged some time apart. Can you share these lines here?

Absolutely

[2019-09-26T10:09:39,513][INFO ][o.e.t.TransportService   ] [eslnx02] publish_address {10.10.10.21:9300}, bound_addresses {10.10.10.21:9300}
[2019-09-26T10:09:39,789][INFO ][o.e.h.AbstractHttpServerTransport] [eslnx02] publish_address {10.10.10.21:9200}, bound_addresses {0.0.0.0:9200}

Thanks, I am wondering if there's a bug here. Do you see a line saying bound or publishing to a non-loopback address shortly after the first of these? I.e.:

[2019-09-26T13:04:25,325][INFO ][o.e.t.TransportService   ] [node-0] publish_address {192.168.1.139:9302}, bound_addresses {192.168.1.139:9302}, {192.168.1.179:9302}
[2019-09-26T13:04:25,337][INFO ][o.e.b.BootstrapChecks    ] [node-0] bound or publishing to a non-loopback address, enforcing bootstrap checks

Would it be possible to share the whole log output from starting up the node through to seeing the first master node changed message?

Also, could you comment out just the transport.* lines in your config, restart the node and share the whole log output from starting up the node through to the whole message saying bootstrap checks failed?
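
(i.e. with those two lines disabled, roughly like this:)

# transport.host: _site_
# transport.tcp.port: 9300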

:slight_smile:

The bootstrap check related line is there, immediately after publish_address {10.10.10.21:9300}, bound_addresses {10.10.10.21:9300}:

[2019-09-26T13:10:33,104][INFO ][o.e.b.BootstrapChecks    ] [eslnx02] bound or publishing to a non-loopback address, enforcing bootstrap checks

my-search.log from the "starting ..." entry onwards:

[2019-09-26T13:10:32,850][INFO ][o.e.n.Node               ] [eslnx02] starting ...
[2019-09-26T13:10:33,097][INFO ][o.e.t.TransportService   ] [eslnx02] publish_address {10.10.10.21:9300}, bound_addresses {10.10.10.21:9300}
[2019-09-26T13:10:33,104][INFO ][o.e.b.BootstrapChecks    ] [eslnx02] bound or publishing to a non-loopback address, enforcing bootstrap checks
[2019-09-26T13:10:33,130][INFO ][o.e.c.c.Coordinator      ] [eslnx02] cluster UUID [tpIiL4e5QTicAdzspRM-GQ]
[2019-09-26T13:10:33,230][INFO ][o.e.c.s.MasterService    ] [eslnx02] elected-as-master ([1] nodes joined)[{eslnx02}{DVNKx4n-QVuQbZwTvWdXDg}{ukvBIwpRSx28Sa1Ex9kzhw}{10.10.10.21}{10.10.10.21:9300}{m}{ml.machine_memory=33714167808, xpack.installed=true, ml.max_open_jobs=20} elect leader, _BECOME_MASTER_TASK_, _FINISH_ELECTION_], term: 8, version: 37, reason: master node changed {previous [], current [{eslnx02}{DVNKx4n-QVuQbZwTvWdXDg}{ukvBIwpRSx28Sa1Ex9kzhw}{10.10.10.21}{10.10.10.21:9300}{m}{ml.machine_memory=33714167808, xpack.installed=true, ml.max_open_jobs=20}]}
[2019-09-26T13:10:33,277][INFO ][o.e.c.s.ClusterApplierService] [eslnx02] master node changed {previous [], current [{eslnx02}{DVNKx4n-QVuQbZwTvWdXDg}{ukvBIwpRSx28Sa1Ex9kzhw}{10.10.10.21}{10.10.10.21:9300}{m}{ml.machine_memory=33714167808, xpack.installed=true, ml.max_open_jobs=20}]}, term: 8, version: 37, reason: Publication{term=8, version=37}
[2019-09-26T13:10:33,330][INFO ][o.e.h.AbstractHttpServerTransport] [eslnx02] publish_address {10.10.10.21:9200}, bound_addresses {0.0.0.0:9200}
[2019-09-26T13:10:33,330][INFO ][o.e.n.Node               ] [eslnx02] started
[2019-09-26T13:10:33,486][INFO ][o.e.l.LicenseService     ] [eslnx02] license [50abffd9-f659-4aa2-913b-dd81223921aa] mode [basic] - valid
[2019-09-26T13:10:33,487][INFO ][o.e.x.s.s.SecurityStatusChangeListener] [eslnx02] Active license is now [BASIC]; Security is disabled
[2019-09-26T13:10:33,495][INFO ][o.e.g.GatewayService     ] [eslnx02] recovered [0] indices into cluster_state

Bootstrap check failed:

[1] bootstrap checks failed
[1]: the default discovery settings are unsuitable for production use; at least one of [discovery.seed_hosts, discovery.seed_providers, cluster.initial_master_nodes] must be configured

This means that at least one of discovery.seed_hosts, discovery.seed_providers, or cluster.initial_master_nodes is required.
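
For example, either of these in elasticsearch.yml would satisfy the check (a sketch; a standalone test node could alternatively set discovery.type: single-node):

discovery.seed_hosts: ["10.10.10.21"]          # addresses of the master-eligible nodes
# or
cluster.initial_master_nodes: ["eslnx02"]      # names of the initial master-eligible nodes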

Could you please send me the documentation where I can find this? :wink:

That specific check is documented here.

However I really would like to see the complete logs from your node, both from when it successfully starts up and when it fails. The excerpts you've shared are unfortunately not enough for us to work out whether there's a bug that needs fixing here.