None of the configured nodes are available - but only sometimes

Hello, in our Elasticsearch 2.3.4 setup we've been experiencing the dreaded "None of the configured nodes are available" (transport -1) issue. After looking at various topics regarding the issue, here is what I've checked and ruled out:

  1. Cluster name. We're using the default "elasticsearch" name; that is what the Java code uses and what I see when I point my browser directly at ip:elasticport.
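For anyone comparing setups: the name lives in elasticsearch.yml on the server side, and the TransportClient has to be handed the same value through its own `cluster.name` setting or it will reject every node it pings. A sketch of the config side, assuming the default name:

```yaml
# elasticsearch.yml (sketch): the cluster name the TransportClient must
# match via its own "cluster.name" client setting. When this line is
# absent entirely, "elasticsearch" is the implied default.
cluster.name: elasticsearch
```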

Also, our setup pulls data from a remote server every 2 minutes and then sends it to Elasticsearch. We only see the error (which we read from catalina.out) about once every second log, though sometimes it appears three times in one log. So it's not something that happens all the time.

This led me to check whether the port might be closing down every once in a while. I am by no means an expert on networking, but Google gave me a few ways of checking whether it's "open", and it is at least listening.
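For what it's worth, the kind of check I mean can be done in plain Java with no Elasticsearch classes at all (a minimal sketch; host and port are placeholders for your setup). Note that a successful TCP connect only proves something is listening on the port; the TransportClient can still drop the node afterwards over a cluster-name or version mismatch.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {
    // Returns true if a TCP connection to host:port succeeds within
    // timeoutMs milliseconds. "Listening" only — not proof the
    // TransportClient handshake will succeed afterwards.
    static boolean isOpen(String host, int port, int timeoutMs) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // 9300 is the default transport port; substitute your mapped one.
        System.out.println(isOpen("localhost", 9300, 1000));
    }
}
```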

We're running the logger in one Docker container and Elasticsearch in another, both on the same server, if that helps anyone trying to figure this out. I should also mention that the current logger was working as it should until a week ago; however, we're in the process of upgrading to ES 5.3, happened to crash the old one in that process, and had to grab the old war file from git and start the container again.

Thanks for your time.

Which war file is that?

The compiled .war file that runs under Tomcat in its own Docker container. That's the application responsible for pulling data from a remote server, connecting to Elasticsearch, and sending the converted objects into it; essentially a backup of the previously working logging system. A big issue in this project, however, is that the people who built it no longer work here, so there may always be undocumented tweaks needed to get things working.

Some more information I found after checking the logs more thoroughly: yesterday, when we backed it up, there were no errors, but during the night, around 12 hours after the backup was started, it suddenly began throwing the "No nodes available" error.

Okay. The server we pull data from went down for a while, and since it came back up we've had zero errors again. This leads me to believe it's not related to ports or a wrongly set configuration, but it might be related to resources. We only have 1 node, which is something I forgot to write in my original post.

I checked that we do close clients in the logger.

Update: I've figured out a bit more. It's actually not working at all; it's just that, since it's summer, messages are more infrequent, so it only crashes when it actually finds a message it wants to send, which can sometimes take quite a while.

Meaning it's 100% down and never sends anything.
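The reason this stayed hidden so long is that failures only surface when a real message arrives. A sketch of one way to make that kind of silent outage visible sooner (the `sendHeartbeat` argument here is a hypothetical stand-in for a call like the `sendJson` method, indexing a tiny ping document on a fixed schedule):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class Heartbeat {
    // Runs sendHeartbeat on a fixed schedule so a dead Elasticsearch
    // connection shows up in the logs immediately, instead of waiting
    // for the next real message to trigger the failure.
    public static ScheduledExecutorService start(Runnable sendHeartbeat) {
        ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                sendHeartbeat.run(); // e.g. index a tiny "ping" document
            } catch (RuntimeException e) {
                // Log the outage right away rather than silently waiting.
                System.err.println("heartbeat failed: " + e.getMessage());
            }
        }, 0, 2, TimeUnit.MINUTES);
        return scheduler;
    }
}
```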

What do the Elasticsearch logs show?

Unfortunately I cannot copy the logs due to how things are set up here, so I will transcribe what they say as best I can.

In /data/logs I get the low disk watermark warning, as well as the occasional high one, but it says we still have 10 GB free.

In /home/elasticsearch/logs I have elasticsearch.log, where every message carries the Caprice node name. It confirms the version (2.3.4), tells me the heap size is 990 MB, shows my transport publish address (local, with the standard port 9300) and the HTTP publish address on port 9200; then it starts, recovers 0 indices into cluster state, stops, and closes.

I did get a bit confused about the standard ports, as those are not the ones we use. I also thought it important to mention that we DO insert data into Elasticsearch right now directly from Logstash (a different source with less parsing needed), and that one works, using our actual port as shown in docker ps.

Thank you again for your time spent helping me out.

As in, it does that in a loop?
If so, that's a problem; you will need to find out what's stopping it.

No. It only does one "loop"; I assume that's how it initializes? That log isn't more than those ~15 lines.

Some more information that might help, as I found more than just the low disk watermark warnings in the Elasticsearch logs.

Whenever we restart Elasticsearch, we get these lines:

start

version
init
modules (reindex, lang groovy etc)
using 1 data path, two paths shown
heap size
initialized
publish address (same port as the TransportClient in the Java code sends to; the log I mentioned last time seems to be very old, this one is fresh)
a discovery line
cluster service line mentioning new master (no joins received)
http publish address
started
recovered 18 indices
Cluster health from red to yellow
stopping stopped
closing closed

end

Also, looking back to around the time we last managed to insert data through Java, there are a bunch of errors related to a CircuitBreakingException, which led me to bug reports for a newer version than the one we're currently using. That bug is NOT reproduced when we now try to send data from the Java API to Elasticsearch; we still get the "None of the configured nodes are available" error. The error happens specifically on the actionGet line in the Java code, which looks like this:

public void sendJson(JSONObject obj) throws ExecutionException, InterruptedException, IOException {
	// Random id so repeated sends never collide on an existing document.
	String uuid = java.util.UUID.randomUUID().toString();
	IndexRequest indexRequest = new IndexRequest(this.index, this.type, uuid);
	indexRequest.source(obj.toString());
	// Blocks until the index call completes; this is the line that throws
	// NoNodeAvailableException when the client cannot reach any node.
	client.index(indexRequest).actionGet();
}

Hopefully this will help, as nothing we've tried has worked yet.

Tentative fix: the solution addressed here was adding transport: 0.0.0.1 to the elasticsearch.yml file. Ours did not have such a line to uncomment, and it already had lines related to transport, but adding it still worked.
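For context, and purely as an assumption on my part rather than something confirmed in this thread: the commonly cited fix for ES 2.x inside Docker is binding the node to all interfaces so that clients outside the container can reach the transport port. The exact key and value used in the linked solution may differ from this sketch:

```yaml
# elasticsearch.yml — hedged sketch of the usual Docker-era binding fix;
# binds both HTTP (9200) and transport (9300) on all interfaces so
# connections from outside the container are accepted.
network.host: 0.0.0.0
```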

I'll monitor the system and close this topic if it doesn't act up again.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.