Elasticsearch - SQL query

The elasticsearch sql demo at Elasticon keynote was pretty interesting. how is it different from existing alternates like https://github.com/NLPchina/elasticsearch-sql Also, it will be great if you could share some links on this like docs/info/beta-version of the module as it will make regular adhoc queries very easy for my team

Hey @Larry_Durden thanks for asking.

I've seen this (and other) SQL implementations and they seem to all suffer from a few things.

First, many of them don't keep up with the Elastic release cycles. For example, at the time of this writing, Elastic has produced versions 5.1.2, 5.2.0, 5.2.1, and 5.2.2 none of which the NLPchina plugin has support for. They've also "skipped" some versions (2.0.1, 2.0.2, 2.2.2, 5.0.2) which means that it's impossible to run the latest versions of Elasticsearch with the plugin as well as some previous bug fix versions (some of which address some serious issues). We intend to produce SQL support in lockstep with Elasticsearch releases, which will allow anybody to run it, including on the latest Elastic versions.

Second, it just doesn't do things as "well" as we/some users we know would like. We're taking into account docvalues/text/keyword fields, automatically converting queries match or term where appropriate, using docvalue_fields vs _source.includes where appropriate, sorting on the .keyword value of a string instead of the analyzed text, and so on. In short, we're doing it "the right way" and so far I haven't personally seen any other SQL projects out there doing it "the right way." The limitations depend on your data/queries.

Lastly, we've seen thousands upon thousands of real support cases from real customers. We occasionally run into a customer that has installed one of a number of existing SQL projects asking about some sort of scalability/performance/security issue in their cluster and unfortunately when we've finally drilled down to the root-cause, the SQL plugin is almost always at fault and we have to ask the customer to uninstall the plugin. It's gotten so bad that unfortunately our support has had to treat these plugins essentially as universally unsupportable and when we see them in a customer deployment we typically ask them to remove them early on because they are known to cause serious issues down the line. It's obviously annoying to the customer and puts us in a weird situation when that's the case: telling someone that something they may be relying on for a legitimate business purpose is breaking their cluster and there's nothing that we can do about it short of rewriting that 3rd party plugin or trying to add a funky patch to it (which still wouldn't fix the root issue). And we know that in order to do it the right way so that we won't have supportability issues down the line, we'd have to start from the bottom up and design correctly rather than trying to patch existing projects with workarounds. So that led us to where we are now.

We're not yet to a beta version. We have a ways to go to make sure that it's getting the features and the standard extremely-robust-testing requirements that we have in the rest of Elasticsearch. At this point, we're seeking some feedback on use cases, including what you may be trying to integrate with, as well as potentially the types of queries you're trying to run so that we can take user behavior, requisite functions, and usage patterns into account

3 Likes

Thanks for the insightful comment.

I did not know that Elastic company tries to support third party plugins and seems to classify them as supportable, as unsupportable or even worse, universally unsupportable.

Is there a chance for plugin authors to learn about what the differences are and how to write "supportable plugins"?

How can plugin authors learn about they have created "root issues"? Is it a specific issue of that case, or a general problem?

I think the principle of least surprise is important here.

It would be beneficial to users, who could decide on time when installing plugins of third parties, if they got information about whether to continue or not. Maybe it could be helpful to present a curated list of plugins which are qualified as safe with regard to support by Elastic company. Or maybe a list of known risky plugins that will definitely lead to loss of support when being installed. Or maybe plugin authors could apply for plugin certifications in certain customer situations etc. etc.

@jprante we generally treat most plugins as "innocent until proven guilty." That is, in the event of some issue, our support generally ignores installed plugins until it seems something seems likely caused by one of them, at which point we ask a customer to reproduce the issue without the plugin so we can determine if it's an Elasticsearch issue or an issue with the 3rd party plugin. We can't directly "support" 3rd party plugins, as we don't own the code for them, so if we find a plugin is causing a problem, we generally either file an issue ourselves with the 3rd party plugin or ask the customer to do so.

There have, historically, been a very small set of plugins which we know cause problems more often than not, so I don't want this to seem like a bigger thing than it is. SQL plugins (especially those that try to do JOINs) have a tendency to cause cluster problems as do plugins that tried to override shard/balancing decisions (e.g. tempest) as have a few plugins that try to implement their own security mechanism.

The ideas you bring up are good ones. We've been thinking about this a few ways, including locking down the guts of Elasticsearch so that certain dangerous pieces can't be extended via plugins. I can see how it would be beneficial for a plugin author and the community to know "known issues" in a more visible way.

Thanks very insightful. The part i hate the most is lot of mandatory things like security which should have been part of core are buried as a commercial offering. Will it be the same with SQL?

Has it been decided if this plugin will be open source or not (I hope it will be...)?

Jeff

We aren't currently planning on making it open source. We are considering offering part of the functionality available in a free, basic license.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.