Vector Modules going Apache->XPack License (despite community help)

Hi friends,

I am very happy with Elasticsearch, and very much <3 many of the people at the company. Smart and nice people through and through.

I have to admit, though, that I've heard from many people who feel a little betrayed by the new vector implementation becoming an XPack module. That's largely because it was developed in the open as Apache-licensed code, with community input (including mine), and then switched to a proprietary XPack license owned by Elastic. This feels quite cheeky.

I can only speak to my own participation in the discussions, which started with the potential for using the delimited payload token filter as one way to represent vectors. There have been countless talks and informal discussions with Elastic people at Haystack and other conferences in the search relevance community about the need for vectors, how they could be implemented, and the like. Many held back on their own implementations given the work being done by Elastic. Everyone was heartened to see Mayya's great work and contributed to the conversation.

I empathize that Elastic (and similar companies) are in a really tough bind trying to monetize their open source, especially up against big bad AWS. And I'm not at all opposed to Elastic experimenting with things like the Elastic License to compete there; I'm not an OSI purist. I want Elastic to thrive as a company.

I think what IS a bit confusing is having a community feel like something will be developed as Apache-licensed open source, with the community participating in that spirit, and then seeing it switch to a proprietary module. Even if Basic means "free forever", my understanding of the Elastic License is that it also has a clause saying that, as the code is owned by Elastic, it can be made non-free at any time.

Hopefully this note comes across with lots of <3 for Elastic peeps. I genuinely think you've built something great to be proud of. But I do feel in this case a bit like there could be better communication about plans for new functionality being developed in the open.

Thanks for raising this one, mate; we do appreciate feedback even if it's a bit uncomfortable to raise!

I'm not sure of the details on this myself, but someone is following up and will loop back around with a "proper" response 🙂

First, thank you for the feedback here. As you know, we care deeply about our community and we appreciate the many ways that people contribute. We can't be successful without it. And this is a great example - when we engage on topics like this, we have an opportunity to clarify our current thinking and approach, and to learn and improve. I appreciate you recognizing the effort and balance that is necessary to make our users, products, and company successful, given the many challenges, including cloud providers. A big part of the motivation behind creating the Elastic License was to find a way to maintain and even expand our ability to engage with our community, while still having some aspects of protection.

We feel strongly that it’s not right to move a GA feature from open source to the Elastic License, and we have never done this. Nor have we taken a free feature and made it paid. In this case, the vector field was (and still is) marked as experimental. While features are experimental, we document that they may change dramatically or even be removed entirely. As we were moving towards the release, we had a number of discussions about how feature compatibility would work internally, and felt that one of those “dramatic” changes was needed in the licensing, so, since it is an experimental feature, we took the opportunity to make it. That said, we understand how you feel about this and we regret that it happened; it was an unfortunate consequence. That is not how we want to cooperate with our community, and we are making an effort to prevent this from happening in the future, including having discussions much earlier in the development cycle.

Your feedback is appreciated and will also help direct our future decisions in these matters. We hope that we can continue to work with you and the broader community on this feature going forward - that's one of the benefits and goals with the Elastic License and it's why we opened the code.

Thanks for the response, Steve.

I think the implication (perhaps the 'problem' for Elastic to resolve) is that if any code labeled experimental (so, all new features) can later be put into XPack, then the community doesn't know whether their efforts will be made proprietary or not. That limits the community's enthusiasm for contributing to new directions, limiting the 'product roadmap' to Elastic people. That's perfectly fine, but I suspect you guys might need to be very clear about that.

I think it's safer for you guys to err the other way. I want you guys to be successful, and like I said, I'm not an OSI purist. Err towards developing something potentially proprietary behind closed doors or in Elastic-licensed code, and only later open sourcing it. This creates a clearer expectation. Don't go the other way, Apache -> Elastic License. It's a better look for Elastic when you gift something to the community. Going the other way gives people a queasy feeling of "I just helped Elastic with their products, when I thought I was helping an Apache 2.0 licensed project".

I also worry for Elasticsearch: I think there's a lot of downside to Elasticsearch longer-term not having vector similarity as core functionality. In my relevance-focused work, it's becoming as important as TF*IDF. New solutions like GNES and Vespa have come onto the open source scene, and for search-related functionality, it's extremely important to be able to perform vector similarity. I won't claim to have the complete picture of your product roadmap, but the need for vector similarity is going to be so great that I can't imagine a search solution without it surviving for relevance applications 5 years from now. I could be wrong, of course, but that's my current point of view on where the search & applied information retrieval world is going.

All the best!
-Doug

As a long-time user and contributor, I have never considered an experimental feature to be one that can change license. In fact, I have contributed features that were committed as experimental. This is something contributors should know up front going forward, and your definition of experimental should probably include this warning.

-Matt

Steve,

In the same respectful and constructive spirit as Doug's original post, I'd like to point out that there have also been other OSS-to-proprietary moves, e.g. moving autocompletion from OSS to proprietary https://github.com/elastic/kibana/pull/20747 (sure, it was also experimental, but it was quite a surprise, as it was a big feature announced in Kibana OSS very shortly before).

The same process has since been happening at the level of the entire platform, with the creation of new Elastic Licensed components that de facto replace previously fully OSS components.

I don't think anybody objects to or wants to threaten your commercial success; in fact, the community is happy about it.

The disappointment comes from freedom of innovation being taken away.

Elasticsearch had been seen by many as the new OSS platform where state-of-the-art information retrieval and distributed techniques were implemented, and as such it gained enormous momentum and excitement.

Many committed code and time, provided suggestions, etc., and even those who did not contribute to development directly have been "advocates", which led to Elastic.co's success and all.

What should happen now that features like this, and others that have to do with data rather than enterprise usage, are being closed?
Should there really be a community-driven "aggressive re-implementation of feature X and Y"?
Will Elastic then break plugin endpoints and force people into a total fork?

I really hope this is not going to be the case, and I wish Elastic would work together with the community to find a better way and bring the original excitement back.

Examples of this could be:

  • a pledge that the spirit of Elasticsearch as the world's best OSS search engine will remain, with features like this being brought back to OSS, as well as others which are not about "enterprise" use but just "use of the search/data analytics engine".
  • allowing those who purchase an Elastic subscription to run modified Elastic License code (within the limits of not altering the limitations of the subscription). I mean, it was paid for, right? 🙂

Thanks for listening; I look forward to a dialog on this.

I know that not everyone gets to see "behind the closed doors" of a company like Elastic, but I might shed some insights as to conversations we've been having here, since I think it may aid in transparency.

Before we even set off to have a proprietary "Elastic License", one set of conversations was about making it really obvious what our license allows/doesn't allow someone to do. For example, I know license.txt files are difficult for most of us in software development to parse, but we spent a lot of time honing our license to make sure it was very permissive relative to many "closed source licenses", and we even included an e-mail address in the license in case someone wasn't sure and needed some validation that what they're doing is OK/not OK. I'm sure some people will argue that any non-OSI-approved license is out of scope for this conversation (and I don't want to start a war on license files here), but I did want to point out that we went all the way down to thinking deeply about a very permissive/understandable license file relative to others I've seen, and we didn't do any of this until we felt that was in a pretty good place.

One thing that carries down to the next level of transparency is that we wanted our development to happen in the open as much as possible. We used to have all of our proprietary code in a separate, private repository, and we found that it was a bad experience for several stakeholders. Our users and customers didn't get to file/track issues themselves but instead had to rely on our support team as a proxy. That was especially bad for the users of our free (but commercial, a.k.a. our Basic Licensed) software, who had to file issues on this forum, which then disappeared into a private repository that they couldn't track, etc. It also led to development headaches internally: building source and CI from multiple repositories that had to be checked against one another. At the same time, we also had explicit asks from several of our users / customers to contribute Elastic Licensed code, particularly when it integrates with another Elastic Licensed feature. I know that sounds funny to a lot of people, but there were a variety of instances of it, including additions to features where the user didn't want to maintain the code themselves long-term. We also get requests from users to audit our commercial code or to have our software provided to a 3rd-party software escrow, which caused another set of issues. Having our commercial (including the free) software open, and developed in the open, helps us achieve a great number of things with our users.

You've pointed out a potential downside to all of the code being open, which we recognized as well (and have attempted to prevent, as I'll explain): you could stumble on commercial code/development. We've striven to make our software as transparent and faithful to the community as possible through a variety of mechanisms to help avoid this problem. On the binary artifact side, we've continued to produce an Apache2-licensed build on a separate page/as a separate download that we continue to release (in various binary forms). On the code side, we've tried to deal with this by putting things inside a dedicated directory (x-pack) so that you can easily exclude it if you want to.

The GitHub/discuss issue side is admittedly probably the hardest problem to deal with, and one we're thinking about a lot: if a community member comments on an issue while we're still in the ideation stage on whether/how we'd do a feature, does that mean the feature needs to be OSS? What if someone posts on our forums or talks to us at a conference about an idea? What if it's a commercial customer at a conference explicitly asking for an enhancement, or a commercial customer who just doesn't care about the license? Ideas come from all over the place, and I'm sure you can imagine the nuance in defining which conversations constitute significant contributions versus something "hard" like code. It's a really tricky line to walk, and there's a lot of grey area. To be clear, I'm not being defensive: it's an area where I think we can still get better at avoiding confusion, and as Steve mentioned, this thread has started some great conversations (both on this thread and at Elastic) about how we could be more transparent and faithful to all parties. We're thinking about, e.g., tagging on GitHub and other mechanisms we could use to improve awareness throughout the process. I hope this comes across as us not shutting down conversation but instead trying to improve. We've really been trying to be more transparent with everyone, though we've had some stumbles along the way (like all software). Changing the license of an experimental feature was really uncharacteristic for us; I hope you notice that we don't have a history of "bait and switch", so to speak, but even so, as Steve mentioned, we're working on ways to prevent this type of thing from happening in the future.

In any case, having a free and open licensed tier has always been meant to bring this type of transparency to the community: to encourage communication, get feedback on what parts of the software work and what needs improvement, help speed up our users' success, and not gatekeep all of that behind a support team that required a paid subscription even for those who just want to use the free software. We've seen generally positive feedback from our users on this, and a lot of engagement and excitement around being able to participate in discussions of features they used to have a bad experience talking about. I do fully get that not everyone wants to participate; that's where we'll continue to iterate to make things more transparent.

Very thoughtful response Shane, I really appreciate it. I'm glad Elastic has those concerns in mind. Certainly I think the issues you list could persist, so I look forward to seeing what you guys come up with.

Let me know if I can provide any ideas or feedback to you guys. I'm always happy to jump on a call and try to give an external perspective.

<3

Shane, in all honesty, I have to say it's a bit hard to see a license that forbids you from running modified code, even if you have paid for a license, as very permissive. But that said, it's your license, your rules 🙂, and agreed, it's better than closed source.

On the other hand, I would appreciate feedback on my message, because, you know what I mean, the community will not go without an OSS vector search implementation (as an example).

Could it be a win-win situation if core information retrieval features remained OSS by Elastic, versus the OSS community having to look elsewhere?

I believe clarity will help all.

Thanks, and again, with all my sincere appreciation and respect for the work so far.
Also, as Doug said, maybe having a call would be a good idea. Happy to talk.

Modifying the code was something we considered deeply, and it's a bit trickier than it may seem. One simple aspect is supportability: if a customer has made a modification to the code, it's very difficult for us as an organization to support them (e.g. someone rewrites the way that we store documents and then raises a support ticket on corruption or the like). But from an IP protection perspective, it turns out it's pretty difficult to write an understandable and defensible license that says someone can modify the code but can't, e.g., make modifications that invalidate the licensing of the commercial features. (That's partially because there are so many ways a user could modify code in a way that isn't intended to circumvent licensing but may effectively end up doing so, via a series of potentially even unrelated modifications.)

We do carefully consider what goes into Apache2 vs. free-Elastic-licensed vs. paid, and this is a free-Elastic-licensed feature. Our goal has been to keep a huge set of features (including vector storage/search) free to use.

Thanks a lot Shane for the clarity on the vector search module decision. Cheers.