GSoC | Elasticsearch: MongoDB-to-Elasticsearch request translator, project discussion

Hi, Elasticsearch team!

I'm Denis Gorev, a first-year CS PhD student at MIPT, Moscow, Russian Federation. I would like to participate in GSoC with my own project, and would like to check whether my idea is interesting to the Elastic team and fits within GSoC's limits.

During my four-year bachelor's degree I studied compiler structure and interpreter internals, so I am quite familiar with parsing and mapping principles.

Idea:

In the Ruby driver for MongoDB (including under Rails) it is possible to extract a standardized query from a request.
It looks like a dictionary with special keywords and is easy to parse.
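To illustrate what "a dictionary with special keywords" means in practice, here is a minimal Python sketch. It assumes the query is available as a plain nested dict (as with a pymongo filter); the function name `collect_operators` and the sample query are hypothetical and only serve the illustration.

```python
def collect_operators(query, found=None):
    """Recursively collect every "$"-prefixed keyword in a query dictionary."""
    if found is None:
        found = set()
    if isinstance(query, dict):
        for key, value in query.items():
            # MongoDB operators are the keys that start with "$".
            if key.startswith("$"):
                found.add(key)
            collect_operators(value, found)
    elif isinstance(query, list):
        # Logical operators such as "$or" carry a list of sub-queries.
        for item in query:
            collect_operators(item, found)
    return found


# Hypothetical sample query, not taken from a real application.
example = {
    "status": "A",
    "$or": [{"qty": {"$lt": 30}}, {"tags": {"$in": ["x", "y"]}}],
}
print(sorted(collect_operators(example)))
```

Everything that is not a `$` keyword is a field name or a literal value, which is why a simple recursive walk is enough to inspect the query.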

Three years ago I wrote a translator from the standardized MongoDB query to Elasticsearch query notation (I can provide the code on request). It allowed the company I worked for at the time to migrate from MongoDB to Elasticsearch without rewriting any of its hundreds of requests. There were some limitations, of course (in nested object querying, for example).

During GSoC I would like to update (or rewrite?) the code for the current versions of Elasticsearch and MongoDB, package it as a library (maybe a Ruby gem?), and make it open source. I will provide more details in the proposal.

Currently, I'm working on my first PR (https://github.com/elastic/elasticsearch/issues/23294).
I have set up an Elasticsearch development environment and am now digging into the source code.

This is an interesting idea and there might be an overlap with the soon-to-be-released "Elasticsearch SQL" feature (https://www.elastic.co/elasticon/conf/2018/sf/elasticsearch-sql for a quick overview). However, that uses a different tech stack, namely Java and ANTLR.

Unfortunately, Elasticsearch SQL won't be available under a FLOSS license (the code will be public on GitHub and it will be free to use for most use cases, but there are some restrictions, which make it unfit for GSoC). Though IMO this project would make the most sense if it built on other components that we have.

Additionally, we don't have a mentor for Ruby right now. Generally we only have the Elasticsearch client for Ruby / Rails and the APM agent in that area. So I'm not really convinced it fits in too well with the rest of the stack and would have a long-term future.

What do you think?

Philipp, thanks for your reply!

Elasticsearch SQL (ESQL) looks very inspiring! It would be great if I could participate in it indirectly. Does it have components that would fit GSoC?

The programming language and tech stack are not a priority for me; I have already tried Java and Scala while playing with Hadoop and Spark. I'm not a professional, of course. Anyway, there are two months until GSoC starts, and I can spend them getting acquainted with the suggested technologies.

MongoDB is not pure SQL, as you know. IMO in some cases it is easier to write requests in ORM style than in SQL style. So ESQL does not solve problems that I mentioned: huge code base with hardcoded MongoDB ORM requests. I found that pymongo also exposes this kind of standardized query.

To sum up:

I would be glad to participate in creating ESQL components, if that is possible.
If you think it would be better to implement my project with other technologies, I am open to changes.

I don't think there is a good solution for this problem:

  1. We cannot extend Elasticsearch SQL in GSoC, because of the licensing. Though this is where it would make most sense IMO: Accept MongoDB's wire protocol, create the AST with ANTLR from the MongoDB queries, and map them to Elasticsearch queries.
  2. Maintaining integrations into different frameworks is nothing we (can) cover. You would need to build this integration for every programming language / framework, which doesn't sound very resource efficient (compared to providing a compatible interface on the server side once).

So ESQL does not solve problems that I mentioned: huge code base with hardcoded MongoDB ORM requests.

I'm not convinced mapping MongoDB queries to Elasticsearch for an application is the right thing to do. At best it's a migration scenario, but even that sounds rather risky.
There is a reason why Elasticsearch SQL is read-only and only targets specific use-cases like:

  • Data scientists who don't know the Elasticsearch query syntax and want to quickly gain insights.
  • BI tools for which you don't want to duplicate your data.

Thanks for the comments and for pointing out the weaknesses of my project.
Catching requests at the wire layer is a great idea.
I will try to describe the project in the proposal and take your suggestions into account.

I can see how something like this could work if we implement it in Java as a query plugin or an action plugin for Elasticsearch. I think if we limit the scope to just queries, without support for field projections or aggregations, a query plugin might work best.

Your response sounds very promising!
Would you be so kind as to provide more technical details on your idea?
This would be a plugin for ElasticSearch directly?
It would accept query in MongoDB format?
Something like alternative API?

I appreciate any help

This would be a plugin for Elasticsearch directly?

yes, unless you have some other idea

It would accept query in MongoDB format?

correct

Something like alternative API?

I have been thinking: it should be possible to implement a plugin that adds a query type to Elasticsearch that accepts a MongoDB query and translates it into an Elasticsearch query. So, something like this:

GET /_search
{
  "query": {
    "mongodb": {
      "status": "A",
      "qty": {
        "$lt": 30
      }
    }
  }
}

might be translated into

GET /_search
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "status": "A"
          }
        },
        {
          "range": {
            "qty": {
              "lt": 30
            }
          }
        }
      ]
    }
  }
}
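The mapping in this example can be sketched in a few lines of Python. This is only an illustration of the translation logic, not the proposed Java plugin. The function names and the supported operator set (`$eq`, `$in`, `$lt`, `$lte`, `$gt`, `$gte`) are assumptions made for the sketch. Nested documents and logical operators such as `$or` are deliberately rejected.

```python
from typing import Any, Dict, List

# Comparison operators that map directly onto an Elasticsearch range query.
RANGE_OPS = {"$lt": "lt", "$lte": "lte", "$gt": "gt", "$gte": "gte"}


def translate_field(field: str, condition: Any) -> List[Dict[str, Any]]:
    """Translate the condition on one field into Elasticsearch clauses."""
    # A bare value such as {"status": "A"} is an equality match.
    if not isinstance(condition, dict):
        return [{"term": {field: condition}}]

    clauses: List[Dict[str, Any]] = []
    range_body: Dict[str, Any] = {}
    for op, value in condition.items():
        if op in RANGE_OPS:
            # Several bounds on one field are merged into one range clause.
            range_body[RANGE_OPS[op]] = value
        elif op == "$eq":
            clauses.append({"term": {field: value}})
        elif op == "$in":
            clauses.append({"terms": {field: list(value)}})
        else:
            raise ValueError(f"unsupported operator: {op}")
    if range_body:
        clauses.append({"range": {field: range_body}})
    return clauses


def translate(mongo_filter: Dict[str, Any]) -> Dict[str, Any]:
    """Translate a flat MongoDB filter into an Elasticsearch bool query."""
    must: List[Dict[str, Any]] = []
    for field, condition in mongo_filter.items():
        if field.startswith("$"):
            # Top-level logical operators are out of scope for this sketch.
            raise ValueError(f"unsupported top-level operator: {field}")
        must.extend(translate_field(field, condition))
    # MongoDB combines top-level fields with an implicit AND.
    return {"bool": {"must": must}}


es_query = translate({"status": "A", "qty": {"$lt": 30}})
print(es_query)
```

Applied to the body of the `mongodb` query above, this produces a `bool` query with one `term` clause on `status` and one `range` clause on `qty`.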

It is possible to do this by creating a plugin that implements SearchPlugin and returns a custom QuerySpec from its getQueries() method. You can find an example of how this is implemented in ParentJoinPlugin.

@Igor_Motov, thanks for the technical details!
I wrote a proposal and would be glad to participate in GSoC.

So, gentlemen, any updates? Can I translate any MongoDB query language string to Elasticsearch now?

I'm afraid not with any of our tools and we also don't have it planned for now from what I know. So far we support SQL and EQL and we'll see about the next one(s).

I understand EQL and SQL can be translated with the translate API. Is it possible to implement this translator in a native ES SDK rather than as an HTTP call? It's a bit off topic, but it keeps me wondering.