http.publish_host not publishing the host I want

Hi y'all,

I'm developing a web app with Elasticsearch (1.7.1) on the back-end protected by Apache (2.4) running as a reverse proxy and the Elasticsearch angularJS module (6.1.0) providing connectivity for my angularJS (1.4.4) app. My server runs Windows 2012 R2 Datacenter and is subject to various policies and firewalls over which I have limited control. My users are sometimes compelled to run older versions of IE or Firefox which do not seem to like CORS even a little bit.

When my app opens a connection to Elasticsearch, the first thing it does is sniff for other nodes in the cluster. By default each node returns its IP address, which is bad, because clients then bypass my Apache reverse proxy. In order to keep using my reverse proxy, I tried setting:

http.publish_host: "my.server.domain.name/proxy_location" 
http.publish_port: 80

in the elasticsearch.yml file. This (and many variations involving protocols and slashes) causes Elasticsearch to fail on startup. The only thing that allows Elasticsearch to start is using:

http.publish_host: "my.server.domain.name" 
http.publish_port: 80 

which isn't great because it doesn't include my proxy location. But, as it turns out, this doesn't even work a little bit, because when my client sniffs now it gets my server name and IP address separated by a slash (e.g. "my.server.domain.name/12.3.45.67") which seems all kinds of crazy to me.

Is this intended behavior, or is this a bug? My workaround right now is to add a line inside the sniff function of the Elasticsearch angularJS library that fixes the host name and tacks on my proxy location. That is less than ideal, both because the problem isn't really in the angularJS library and for all the normal reasons that changing 3rd party library code is less than ideal.
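
A rough sketch of that kind of fix-up, assuming the sniffed address looks like the "host/ip" string above, possibly with a ":port" suffix or an "inet[...]" wrapper. The function name toProxyUrl and the proxyPath argument are made up for illustration and are not part of the Elasticsearch library; IPv6 addresses and https are not handled.

```javascript
// Hypothetical helper, not part of elasticsearch.js.
// Turns a sniffed address such as "my.server.domain.name/12.3.45.67:80"
// into a URL that goes through the reverse proxy.
function toProxyUrl(sniffedAddress, proxyPath) {
  // Drop an optional "inet[...]" wrapper around the address.
  var inner = sniffedAddress.replace(/^inet\[(.*)\]$/, '$1');

  // Expect "<host>/<ip>" with an optional ":<port>"; the host part may be empty.
  var match = /^([^\/:]*)\/([^\/:]+)(?::(\d+))?$/.exec(inner);
  if (!match) {
    return null; // unrecognised format, let the caller decide what to do
  }

  var host = match[1] || match[2]; // prefer the published name, fall back to the IP
  var port = match[3] ? ':' + match[3] : '';
  var path = proxyPath ? '/' + proxyPath.replace(/^\/+/, '') : '';

  return 'http://' + host + port + path;
}
```

For example, toProxyUrl('my.server.domain.name/12.3.45.67:80', 'proxy_location') gives 'http://my.server.domain.name:80/proxy_location'.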

Please help.

Jeremy

Is there some reason why you can't just specify the proxy in your angularJS app, and then configure the proxy to round-robin across all of your nodes?

Also, the publish host only works on FQDNs or IPs, not URLs.
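
A minimal sketch of the client side of that suggestion, assuming the proxy is reachable at the host and proxy location mentioned in the question. The helper name proxyOnlyConfig is hypothetical; the option names are the same ones the Elasticsearch JavaScript client uses for sniffing, switched off here so the client never learns the nodes' own addresses.

```javascript
// Hypothetical helper: builds client options that send every request
// through the reverse proxy and leave node selection to the proxy.
function proxyOnlyConfig(proxyUrl) {
  return {
    host: proxyUrl,               // the only address the client ever talks to
    sniffOnStart: false,          // do not ask the cluster for node addresses
    sniffInterval: false,         // no periodic re-sniffing
    sniffOnConnectionFault: false // do not sniff after a failed request
  };
}

// Assumed proxy URL, taken from the example host and path in the question.
var config = proxyOnlyConfig('http://my.server.domain.name/proxy_location');
// In the angularJS app this object would be passed to the factory: esFactory(config)
```

The trade-off is that failover and load balancing then depend entirely on the proxy configuration.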

I guess that makes sense, although it's inconvenient. I understand not wanting to parse the publish host into host and path sections so that you know where to insert the publish_port.

Hey, thanks for your reply. That's definitely a possible workaround, and it's something I might end up having to do. I am a little leery of using Apache as a load balancer because it doesn't seem to be super smart about detecting and responding to node problems (stackoverflow). Also, I feel like I shouldn't have to! The functionality exists in the Elasticsearch client! It's really handy to set up my client to sniffOnStart and sniffOnConnectionFault like so:

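esFactory({
  host: hosts[i],
  sniffOnStart: true,            // discover the cluster's nodes when the client is created
  sniffInterval: 60000,          // re-sniff every 60 seconds
  sniffOnConnectionFault: true,  // re-sniff as soon as a node stops responding
  suggestCompression: true,      // ask Elasticsearch to compress its responses
  deadTimeout: 3000,             // base delay (ms) before a dead node is retried
  requestTimeout: 3000           // give up on a request after 3 seconds
});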

With this setup, I can kill nodes willy-nilly and watch my client automatically avoid the bad nodes without the user noticing anything other than a momentary glitch. Then when the node comes back up my clients will automatically see it after a minute and load balance back out. It's a super cool feature and I love it.

I feel like I'm 95% there, and the publish_host and publish_port settings were put in specifically to get me the other 5%, but publish_host is A) buggy and B) not quite powerful enough to support its primary use case.

Right now, as far as I can tell, setting publish_host to any value at all will break your system. I don't think that returning fqdn/ipaddress when you set the publish_host to fqdn is sane behavior. It would also be nice to have a publish_path or a publish_url setting so my elasticsearch nodes can tell clients exactly how to get to them.

If I'm wrong and this is intended behavior let me know. Otherwise I think I'll add a bug report and a feature request to the issue page.

It's not a bug, it's intended behaviour.

Is it a problem with the elasticsearch.angular.js library then? Should its sniff method be parsing the fqdn/ip combo?