Multicast removal

I just learned

So, on Linux, OpenJDK multicast implementation works fine, with IPv4, and also IPv6. Only on Mac OS X, multicast is broken (for ages, mostly because of crappy vendor support). The conclusion is to remove multicast completely from all platforms.

Well... multicast has advantages. Gaining some efficiency by reducing server load in local networks and eliminating redundant traffic is not such a bad idea at all. Besides that, I admit it's tough to get OpenJDK networking for Mac OS X fixed upstream, but it's not totally hopeless. Blocking the broken multicast operations on Mac OS X for that (long) time would have been more than enough.

One of the features I liked of ES from the beginning in 2010 was the zero configuration setup. How about that: start a JVM and it suddenly becomes a cluster member - wow. This made the real difference. To use multicast IP, I convinced my network admin to disable IGMP snooping on the Cisco and create private subnets on VLANs, set up NICs with the correct scopes, and so on. I became a PITA. In the end, my DC was a multicast-friendly environment, able to run ES clusters with autodiscovery.

Maybe this is a situation where http://jgroups.org could get resurrected as a discovery plugin for the not-so-faint-hearted - it was used by kimchy back in the days before 0.7.0. Today, it also comes with a Raft implementation (http://belaban.github.io/jgroups-raft/) for experimenting with consensus algorithms.

Just my 2 ct.

Best, Jörg

Elasticsearch has taken the Apple approach of simplifying everything. If
something breaks for a customer, better off removing it. Most of these
simplifications are beneficial (limiting thread pools comes to mind), others
are not. After a hiatus from working with Elasticsearch, I am working on a
new project that uses it. So many features that I used before are gone from
2.x. Is Zen discovery still there?

I have used multicast with success in other projects. Ironically, the only
failure I had was with Elasticsearch when admins changed network settings
that they thought would not affect anyone. Unicast works great, but there
is still no ability to change the hosts without a restart.

I should have added that Elasticsearch is still a great product. Do not
want to appear too negative! Converted a co-worker who was a Solr user.

Ivan

So, on Linux, OpenJDK multicast implementation works fine, with IPv4, and also IPv6.

Except that it doesn't... lo doesn't support multicast on Linux (see elastic/elasticsearch issue #12993, "elasticsearch listens on udp port 54328 on all network interfaces by default"). The only reason multicast discovery worked on Linux was because it listened on all (other) network interfaces. This prevented us from binding only to localhost, which was a much requested change. (How many times did your local node connect to a remote cluster unintentionally?)
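For anyone who wants to check what their own machine reports, here is a minimal sketch that is not taken from Elasticsearch. It only uses the standard java.net.NetworkInterface API to list each interface with its up, loopback and multicast flags; the class name MulticastInterfaces and the method describe() are made up for illustration.

```java
import java.io.UncheckedIOException;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

// Hypothetical helper, not part of Elasticsearch.
public class MulticastInterfaces {

    /** One line per interface: its name plus the flags the JVM reports for it. */
    static List<String> describe() {
        List<String> lines = new ArrayList<>();
        try {
            Enumeration<NetworkInterface> all = NetworkInterface.getNetworkInterfaces();
            if (all == null) {
                return lines; // no interfaces visible to the JVM
            }
            for (NetworkInterface nic : Collections.list(all)) {
                lines.add(String.format("%s up=%b loopback=%b multicast=%b",
                        nic.getName(), nic.isUp(), nic.isLoopback(), nic.supportsMulticast()));
            }
        } catch (SocketException e) {
            // Querying interface flags can fail at the OS level; surface it unchecked.
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        describe().forEach(System.out::println);
    }
}
```

The line for the loopback interface is the one to look at when a node is supposed to bind to localhost only.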

Gaining some efficiency by reducing server load in local networks and eliminating redundant traffic is not such a bad idea at all.

We're talking about pings during discovery. That's all. And specifying a list of hosts limits the traffic pretty well.
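To make the point about a host list concrete: with unicast, the set of ping targets is just the finite list from the configuration (the discovery.zen.ping.unicast.hosts setting). The sketch below is an illustration under assumptions, not Elasticsearch's actual parser: the class UnicastHosts and its parse() method are hypothetical, entries are assumed to be "host" or "host:port", a missing port falls back to the default transport port 9300, and IPv6 literals are not handled.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not Elasticsearch code.
public class UnicastHosts {

    // Assumption: fall back to the default transport port when none is given.
    static final int DEFAULT_PORT = 9300;

    /** Turns "host1:9300, host2" into the explicit list of addresses to ping. */
    static List<InetSocketAddress> parse(String commaSeparated) {
        List<InetSocketAddress> targets = new ArrayList<>();
        for (String entry : commaSeparated.split(",")) {
            String hostPort = entry.trim();
            if (hostPort.isEmpty()) {
                continue; // tolerate trailing commas and blank entries
            }
            int colon = hostPort.lastIndexOf(':');
            String host = colon < 0 ? hostPort : hostPort.substring(0, colon);
            int port = colon < 0 ? DEFAULT_PORT
                                 : Integer.parseInt(hostPort.substring(colon + 1));
            // Unresolved on purpose: no DNS lookup happens while parsing.
            targets.add(InetSocketAddress.createUnresolved(host, port));
        }
        return targets;
    }

    public static void main(String[] args) {
        // The host names here are placeholders.
        for (InetSocketAddress target : parse("es-node-1:9300, es-node-2")) {
            System.out.println("would ping " + target.getHostString() + ":" + target.getPort());
        }
    }
}
```

Discovery traffic is then bounded by the length of that list, rather than by whoever happens to be listening on the multicast group.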

Blocking the broken multicast operations on Mac OS X for that (long) time would have been more than enough.

We had plenty of people who had trouble getting multicast to work, and spent ages debugging such issues. Unicast is much simpler and much more reliable. We've sacrificed a little convenience for a whole lot more reliability and simplicity.

One of the features I liked of ES from the beginning in 2010 was the zero configuration setup.

And we've kept this behaviour for localhost. This still works. Sure, it doesn't when you move to a real cluster on separate machines, but one line of config is not too bad.

Elasticsearch has taken the Apple approach of simplifying everything. If something breaks for a customer, better off removing it.

If you look through the history of the multicast decision, you'll see that a significant effort went into trying to fix it before we came to the conclusion that it is unfixable. We made multicast a plugin to give existing users time to migrate. When I look at the download stats for the multicast plugin, almost nobody uses it. 25x more people use the ICU analysis plugin, which is pretty advanced!

The multicast plugin simply became technical debt, supporting a feature that a tiny fraction of users are using. There are a million interesting things that we'd like to do in Elasticsearch. Why waste developer resources on something that has so little uptake? We have to be pragmatic here.

You say that so much has been removed in 2.x. What you neglect to mention is all the new things that we have added, things that we wouldn't have been able to add and support if we hadn't gone back and cleaned up a whole lot of technical debt first.

Of course, removing the multicast plugin is the choice we have made based on our priorities. But this is open source. If multicast is one of your priorities then you're more than welcome to fork the plugin and maintain it yourself.

Unicast works great, but there is still no ability to change the hosts without a restart.

I've opened this issue to discuss a possible improvement to this situation:

Thanks Clinton. My new Elasticsearch project will be containerized, so
flexible hosts are a priority. Multicast was moved to a plugin for 2.0? With
1.x adoption still high from what I see on the mailing list, that could be a
reason why adoption of the plugin was low.

Multicast is popular with in-memory data grids. Shay was at Gigaspaces
before, so he definitely had the background when he created Elasticsearch.
Multicast does work. Lucene on Windows did not support mmap early on, but
that did not lead the Lucene group to abandon it.

I definitely will be asking a lot of potential deprecation questions at
Elasticon. I see some disturbing issues on GitHub. :) There is a South
African restaurant run by a lovely woman from Durban near the conference,
I'll buy you lunch. :)

Cheers,

Ivan

Looking forward to it :)