Network.host issue: elasticsearch.service: Main process exited, code=exited, status=78

My config file:

# Use a descriptive name for your cluster:
#
cluster.name: my-search
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
node.name: ${HOSTNAME}
node.ingest: true
node.data: false
node.master: false
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
path.data: /var/lib/elasticsearch
#
# Path to log files:
#
path.logs: /var/log/elasticsearch
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
network.host: 0.0.0.0
#
# Set a custom port for HTTP:

This is the configuration file from /etc/elasticsearch, not the log file /var/log/elasticsearch/gc.log.

I can see this in the log:
Entering safepoint region: GenCollectForAllocation
GC(2) Pause Young (Allocation Failure)
GC(2) Using 8 workers of 8 for evacuation
GC(2) Desired survivor size 17891328 bytes, new threshold 6 (max threshold 6)
GC(2) Age table with threshold 6 (max threshold 6)
GC(2) - age 1: 2505304 bytes, 2505304 total
GC(2) - age 2: 4066352 bytes, 6571656 total
GC(2) - age 3: 10031824 bytes, 16603480 total
GC(2) ParNew: 298211K->23277K(314560K)
GC(2) CMS: 0K->0K(699072K)
GC(2) Metaspace: 20810K->20810K(1069056K)
GC(2) Pause Young (Allocation Failure) 291M->22M(989M) 5.505ms
GC(2) User=0.04s Sys=0.00s Real=0.01s
Leaving safepoint region
Total time for which application threads were stopped: 0.0057126 seconds, Stopping threads took: 0.0000459 seconds

As written above, there should be a file in the log directory named after your cluster name, i.e. my-search.log. Please show the output of that one.

You are right. I am not so familiar with elasticsearch logging.

Here is the my-search.log

Thanks again

Perhaps take a look at this post... it may explain why Elasticsearch fails to start when you change network.host.

See this snippet from the logfile

[2019-09-24T13:35:48,085][INFO ][o.e.b.BootstrapChecks    ] [eslnx02] bound or publishing to a non-loopback address, enforcing bootstrap checks
[2019-09-24T13:35:48,089][ERROR][o.e.b.Bootstrap          ] [eslnx02] node validation exception
[3] bootstrap checks failed
[1]: initial heap size [2147483648] not equal to maximum heap size [4294967296]; this can cause resize pauses and prevents mlockall from locking the entire heap
[2]: memory locking requested for elasticsearch process but memory is not locked
[3]: the default discovery settings are unsuitable for production use; at least one of [discovery.seed_hosts, discovery.seed_providers, cluster.initial_master_nodes] must be configured

See https://www.elastic.co/guide/en/elasticsearch/reference/7.3/bootstrap-checks.html
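For reference, check [1] above is fixed by making the initial and maximum heap sizes equal in /etc/elasticsearch/jvm.options. A minimal sketch (4g is just an example size here, matching the maximum from the error message; the usual guidance is up to about half the machine's RAM):

```
-Xms4g
-Xmx4g
```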

I have found the root cause of the issue:

  • If you configure the network.host value you also have to configure the transport.host.

This means you have to add the following section to elasticsearch.yml and your system will work.

network.host: 0.0.0.0
http.port: 9200

transport.host: _site_
transport.tcp.port: 9300

Then restart the service and it works. :slight_smile:

Additionally, if you would like to force Elasticsearch to listen on IPv4, you have to add the following line to /etc/elasticsearch/jvm.options:

-Djava.net.preferIPv4Stack=true

So my environment works fine now. :slight_smile:

My comment: The related document does not mention this. :disappointed_relieved::cry:

Thank you for your support and your suggestions.

Have a great day.

I don't think that's true, and I have tried and confirmed that there is no problem running Elasticsearch with network.host set and transport.host unset.

The issue you had was that there were bootstrap checks failing.

Thank you for your feedback. Nevertheless the bootstrap check was OK and my solution was the key.

This solved the same issue on my Windows host as well.

Can you help us understand how to reproduce the problem you solved by setting transport.host? Normally this setting should not be set.

Sure. :slight_smile:
If you merely configure this:

network.host: 0.0.0.0
http.port: 9200

You are facing the issue you can see above. Then if you expand your config file with the transport.host related configuration, the system works fine.

transport.host: _site_
transport.tcp.port: 9300

All together is the solution:

network.host: 0.0.0.0
http.port: 9200

transport.host: _site_
transport.tcp.port: 9300

My working elasticsearch.yml configuration now:

cluster.name: my-search
node.name: ${HOSTNAME}
node.data: false
node.master: true
node.ingest: false
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 0.0.0.0
http.port: 9200
transport.host: _site_
transport.tcp.port: 9300

Response of get request on port 9200:

{
    "name": "eslnx02",
    "cluster_name": "my-search",
    "cluster_uuid": "tpIiL4e5QTicAdzspRM-GQ",
    "version": {
        "number": "7.3.2",
        "build_flavor": "default",
        "build_type": "deb",
        "build_hash": "1c1faf1",
        "build_date": "2019-09-06T14:40:30.409026Z",
        "build_snapshot": false,
        "lucene_version": "8.1.0",
        "minimum_wire_compatibility_version": "6.8.0",
        "minimum_index_compatibility_version": "6.0.0-beta1"
    },
    "tagline": "You Know, for Search"
}

When your node starts up, it logs two lines containing the string bound_addresses that look like this:

[2019-09-26T12:41:10,124][INFO ][o.e.t.TransportService   ] [node-0] publish_address {192.168.1.139:9300}, bound_addresses {192.168.1.139:9300}, {192.168.1.179:9300}

and

[2019-09-26T12:41:10,787][INFO ][o.e.h.AbstractHttpServerTransport] [node-0] publish_address {192.168.1.139:9200}, bound_addresses {192.168.1.139:9200}, {192.168.1.179:9200}

These might be logged some time apart. Can you share these lines here?
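A quick way to pull those lines out of the cluster log is a simple grep. The sketch below uses a demo file so it is self-contained; on the real host the path would be /var/log/elasticsearch/my-search.log (following from cluster.name: my-search above):

```shell
# Demo: write two sample log lines to a temp file, then filter them the same
# way you would filter /var/log/elasticsearch/my-search.log on a live node.
log=$(mktemp)
cat > "$log" <<'EOF'
[2019-09-26T10:09:39,513][INFO ][o.e.t.TransportService   ] [eslnx02] publish_address {10.10.10.21:9300}, bound_addresses {10.10.10.21:9300}
[2019-09-26T10:09:39,789][INFO ][o.e.h.AbstractHttpServerTransport] [eslnx02] publish_address {10.10.10.21:9200}, bound_addresses {0.0.0.0:9200}
EOF
grep 'bound_addresses' "$log"   # prints both the transport (9300) and HTTP (9200) lines
rm -f "$log"
```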

Absolutely

[2019-09-26T10:09:39,513][INFO ][o.e.t.TransportService   ] [eslnx02] publish_address {10.10.10.21:9300}, bound_addresses {10.10.10.21:9300}
[2019-09-26T10:09:39,789][INFO ][o.e.h.AbstractHttpServerTransport] [eslnx02] publish_address {10.10.10.21:9200}, bound_addresses {0.0.0.0:9200}

Thanks, I am wondering if there's a bug here. Do you see a line saying bound or publishing to a non-loopback address shortly after the first of these? I.e.:

[2019-09-26T13:04:25,325][INFO ][o.e.t.TransportService   ] [node-0] publish_address {192.168.1.139:9302}, bound_addresses {192.168.1.139:9302}, {192.168.1.179:9302}
[2019-09-26T13:04:25,337][INFO ][o.e.b.BootstrapChecks    ] [node-0] bound or publishing to a non-loopback address, enforcing bootstrap checks

Would it be possible to share the whole log output from starting up the node through to seeing the first master node changed message?

Also, could you comment out just the transport.* lines in your config, restart the node and share the whole log output from starting up the node through to the whole message saying bootstrap checks failed?

:slight_smile:

The bootstrap check related line is there immediately after publish_address {10.10.10.21:9300}, bound_addresses {10.10.10.21:9300}:

[2019-09-26T13:10:33,104][INFO ][o.e.b.BootstrapChecks    ] [eslnx02] bound or publishing to a non-loopback address, enforcing bootstrap checks

my-search.log from the "starting ..." entry:

[2019-09-26T13:10:32,850][INFO ][o.e.n.Node               ] [eslnx02] starting ...
[2019-09-26T13:10:33,097][INFO ][o.e.t.TransportService   ] [eslnx02] publish_address {10.10.10.21:9300}, bound_addresses {10.10.10.21:9300}
[2019-09-26T13:10:33,104][INFO ][o.e.b.BootstrapChecks    ] [eslnx02] bound or publishing to a non-loopback address, enforcing bootstrap checks
[2019-09-26T13:10:33,130][INFO ][o.e.c.c.Coordinator      ] [eslnx02] cluster UUID [tpIiL4e5QTicAdzspRM-GQ]
[2019-09-26T13:10:33,230][INFO ][o.e.c.s.MasterService    ] [eslnx02] elected-as-master ([1] nodes joined)[{eslnx02}{DVNKx4n-QVuQbZwTvWdXDg}{ukvBIwpRSx28Sa1Ex9kzhw}{10.10.10.21}{10.10.10.21:9300}{m}{ml.machine_memory=33714167808, xpack.installed=true, ml.max_open_jobs=20} elect leader, _BECOME_MASTER_TASK_, _FINISH_ELECTION_], term: 8, version: 37, reason: master node changed {previous [], current [{eslnx02}{DVNKx4n-QVuQbZwTvWdXDg}{ukvBIwpRSx28Sa1Ex9kzhw}{10.10.10.21}{10.10.10.21:9300}{m}{ml.machine_memory=33714167808, xpack.installed=true, ml.max_open_jobs=20}]}
[2019-09-26T13:10:33,277][INFO ][o.e.c.s.ClusterApplierService] [eslnx02] master node changed {previous [], current [{eslnx02}{DVNKx4n-QVuQbZwTvWdXDg}{ukvBIwpRSx28Sa1Ex9kzhw}{10.10.10.21}{10.10.10.21:9300}{m}{ml.machine_memory=33714167808, xpack.installed=true, ml.max_open_jobs=20}]}, term: 8, version: 37, reason: Publication{term=8, version=37}
[2019-09-26T13:10:33,330][INFO ][o.e.h.AbstractHttpServerTransport] [eslnx02] publish_address {10.10.10.21:9200}, bound_addresses {0.0.0.0:9200}
[2019-09-26T13:10:33,330][INFO ][o.e.n.Node               ] [eslnx02] started
[2019-09-26T13:10:33,486][INFO ][o.e.l.LicenseService     ] [eslnx02] license [50abffd9-f659-4aa2-913b-dd81223921aa] mode [basic] - valid
[2019-09-26T13:10:33,487][INFO ][o.e.x.s.s.SecurityStatusChangeListener] [eslnx02] Active license is now [BASIC]; Security is disabled
[2019-09-26T13:10:33,495][INFO ][o.e.g.GatewayService     ] [eslnx02] recovered [0] indices into cluster_state

Bootstrap check failed:

[1] bootstrap checks failed
[1]: the default discovery settings are unsuitable for production use; at least one of [discovery.seed_hosts, discovery.seed_providers, cluster.initial_master_nodes] must be configured

This means at least one of discovery.seed_hosts, discovery.seed_providers, or cluster.initial_master_nodes is required.

Could you please send me the documentation where I can find this? :wink:

That specific check is documented here.

However I really would like to see the complete logs from your node, both from when it successfully starts up and when it fails. The excerpts you've shared are unfortunately not enough for us to work out whether there's a bug that needs fixing here.

You can find both logs here:

Nevertheless, to correct myself, I am now sure the key here is that if you configure a static (non-loopback) address in network.host, you also have to set one of discovery.seed_hosts, discovery.seed_providers, or cluster.initial_master_nodes.

I look forward to hearing from you.

Yes, this is by design, and documented. The fix is simply to set one of these settings (e.g. discovery.seed_hosts: []). That's quite different from setting transport.host: _site_, which I think has no effect on whether that bootstrap check passes or not.
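A minimal sketch of that fix for this thread's setup (the node name eslnx02 is taken from the logs above; either of the commented discovery settings alone satisfies the check, and transport.host stays unset):

```
network.host: 0.0.0.0
http.port: 9200

# satisfies bootstrap check [3] on a node joining an existing cluster
discovery.seed_hosts: []
# or, when bootstrapping a brand-new cluster:
#cluster.initial_master_nodes: ["eslnx02"]
```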
