Pre-commit data transformation plugin


(David Richardson) #1

We need the ability to transform a document prior to commit (in this case,
to create a structure that retains data revisions) without using middleware.
I was pointed to the ability to write preIndex hooks on #elasticsearch and
given the Percolate module as an example.
I still have a number of specific questions about such an approach:

  • Is this the best template for understanding such functionality?
  • Has anyone done something similar and published the code?
  • How would such code be packaged? Percolate doesn't appear to be in
    'plugin' format (if that's even the correct terminology). I do see a plugin
    class when browsing the source, but Percolate doesn't appear to subclass
    this.
  • Is there a formal structure for extensions to ES, even if unpublished?
    I presume this is the case, and that plugins follow it.
  • How are external extensions registered for use? I presume the Plugins
    folder is traversed and all .jars are loaded?
  • I recall seeing reference to Javascript for use elsewhere in ES - I
    presume Rhino is being used. Is it then available for general use (i.e. as
    the language to perform transformations pre-commit)? What version are we
    using?
  • Modules vs Plugins. What's the difference?
  • The process of transforming the document will be a synchronous action.
    Does this have non-obvious implications? Is there a timeout management
    mechanism inherent in all request processing that would implicitly supervise
    this?

Suggestions/help much appreciated.
david


(Shay Banon) #2

Answers below. Note, though, that with transformation, what you want to do is
perform the transformation once, and then use the transformed data when
executing the index operation on both the primary shard and the replica shards.
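
To make the "transform once" point concrete, here is a minimal sketch in plain Java. Everything in it is hypothetical: RevisionTransformer, ShardCopy and TransformOnceDemo are illustrative names, not elasticsearch classes, and documents are modelled as Maps rather than real source bytes. The transformation (keeping a list of prior revisions, as in the original question) runs a single time, and the primary and every replica index the same result instead of re-running it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical transformation (not an elasticsearch API): keep the latest
// version under "current" and append every version to a "revisions" list.
class RevisionTransformer {
    static Map<String, Object> transform(Map<String, Object> existing, Map<String, Object> incoming) {
        List<Object> revisions = new ArrayList<>();
        if (existing != null && existing.get("revisions") instanceof List) {
            revisions.addAll((List<?>) existing.get("revisions"));
        }
        revisions.add(new LinkedHashMap<String, Object>(incoming));
        Map<String, Object> result = new LinkedHashMap<>();
        result.put("current", new LinkedHashMap<String, Object>(incoming));
        result.put("revisions", revisions);
        return result;
    }
}

// Hypothetical stand-in for a shard copy that simply stores what it is given.
class ShardCopy {
    final String name;
    Map<String, Object> stored;

    ShardCopy(String name) { this.name = name; }

    void index(Map<String, Object> source) { this.stored = source; }
}

class TransformOnceDemo {
    // Counts how many times the transformation ran (for illustration only).
    static int transformCalls = 0;

    // Transform exactly once, then hand the same result to primary and replicas.
    static Map<String, Object> indexEverywhere(Map<String, Object> existing,
                                               Map<String, Object> incoming,
                                               ShardCopy primary,
                                               List<ShardCopy> replicas) {
        transformCalls++;
        Map<String, Object> transformed = RevisionTransformer.transform(existing, incoming);
        primary.index(transformed);
        for (ShardCopy replica : replicas) {
            replica.index(transformed); // replicas reuse the result; no re-transformation
        }
        return transformed;
    }
}
```

Besides avoiding duplicate work, doing it this way means a non-deterministic transformation (timestamps, generated ids) cannot leave the primary and replicas holding different data.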

In general, though, I am not a fan of having such functionality in
elasticsearch, and I think it should be done outside of elasticsearch.
My concern is that these types of features tend to get misused
and abused very easily. Obviously, I won't stand in the way if there is huge
demand for it.

On Fri, Aug 26, 2011 at 10:23 PM, David Richardson <david.richardson@enquora.com> wrote:

We need the ability to transform a document prior to commit - in this case
to create a structure to retain data revisions. Without using middleware.
I was pointed to the ability to write preIndex hooks on #elasticsearch and
given the Percolate module as an example.
I still have a number of specific questions about such an approach:

  • Is this the best template for understanding such functionality?

I don't think it fits this case, because you want to perform the
"transformation" once. There isn't a good extension point in elasticsearch
to do that now.

  • Has anyone done something similar and published the code?
  • How would such code be packaged? Percolate doesn't appear to be in
    'plugin' format (if that's even the correct terminology). I do see a plugin
    class when browsing the source, but Percolate doesn't appear to subclass
    this.

Check the other plugins in elasticsearch. Basically, they are zip files
that end up being extracted under the plugins directory. All jar and class
files within that plugin's directory under plugins are loaded automatically.
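
A rough sketch of the loading behaviour described above, assuming a layout of plugins/<plugin-name>/ with jars possibly nested in subdirectories. This is not elasticsearch's actual loading code; PluginJarScanner is a hypothetical name, and loose class directories are left out for brevity.

```java
import java.io.File;
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: approximates "all jars under plugins/<name>/ are loaded".
class PluginJarScanner {
    // Recursively collect .jar files beneath a single plugin directory.
    static List<File> findJars(File dir) {
        List<File> jars = new ArrayList<>();
        File[] children = dir.listFiles();
        if (children == null) {
            return jars; // not a directory, or unreadable
        }
        for (File child : children) {
            if (child.isDirectory()) {
                jars.addAll(findJars(child));
            } else if (child.getName().endsWith(".jar")) {
                jars.add(child);
            }
        }
        return jars;
    }

    // Build one class loader over every jar found in every plugin directory.
    static URLClassLoader loaderFor(File pluginsDir, ClassLoader parent) throws IOException {
        List<URL> urls = new ArrayList<>();
        File[] pluginDirs = pluginsDir.listFiles(File::isDirectory);
        if (pluginDirs != null) {
            for (File pluginDir : pluginDirs) {
                for (File jar : findJars(pluginDir)) {
                    urls.add(jar.toURI().toURL());
                }
            }
        }
        return new URLClassLoader(urls.toArray(new URL[0]), parent);
    }
}
```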

  • Is there a formal structure for extensions to ES, even if
    unpublished? I presume this is the case, and that plugins follow it.

There is a minimal format for them. Check some of the plugins in the
elasticsearch source itself. As for what can be extended, that depends on the
extension points and Guice injection options.

  • How are external extensions registered for use? I presume the Plugins
    folder is traversed and all .jars are loaded?

Yes, it loads all the jar files, and calls the class listed in the
es-plugin.properties file.
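
The registration step can be sketched the same way. This assumes es-plugin.properties holds a single plugin= entry with the fully qualified class name, and it uses a hypothetical SimplePlugin interface in place of the real elasticsearch plugin interface; DemoPlugin and PropertiesPluginLoader are likewise illustrative names.

```java
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

// Hypothetical plugin contract for this sketch; not the real elasticsearch interface.
interface SimplePlugin {
    String name();
}

// Example class that an es-plugin.properties entry could point at.
class DemoPlugin implements SimplePlugin {
    public String name() { return "demo"; }
}

class PropertiesPluginLoader {
    // Read one properties file and instantiate the class named by its "plugin" key.
    // Returns null if the key is missing.
    static SimplePlugin fromProperties(Reader reader, ClassLoader loader) throws Exception {
        Properties props = new Properties();
        props.load(reader);
        String className = props.getProperty("plugin");
        if (className == null) {
            return null;
        }
        Class<?> cls = Class.forName(className.trim(), true, loader);
        return (SimplePlugin) cls.getDeclaredConstructor().newInstance();
    }

    // Find every es-plugin.properties visible to the class loader
    // (e.g. one per plugin jar) and instantiate each listed class.
    static List<SimplePlugin> loadAll(ClassLoader loader) throws Exception {
        List<SimplePlugin> plugins = new ArrayList<>();
        for (URL url : Collections.list(loader.getResources("es-plugin.properties"))) {
            try (Reader reader = new InputStreamReader(url.openStream(), StandardCharsets.UTF_8)) {
                SimplePlugin plugin = fromProperties(reader, loader);
                if (plugin != null) {
                    plugins.add(plugin);
                }
            }
        }
        return plugins;
    }
}
```

In practice loadAll would be handed a class loader built over the plugin jars, so each jar's own es-plugin.properties becomes visible as a resource.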

  • I recall seeing reference to Javascript for use elsewhere in ES - I
    presume Rhino is being used. Is it then available for general use (i.e. as
    the language to perform transformations pre-commit)? What version are we
    using?

The Rhino version used is 1.7R2. The whole scripting aspect is available
for general use through the ScriptingService, and it's used in different
places.

  • Modules vs Plugins. What's the difference?

Modules are Guice concepts. A plugin can have many modules.
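
To illustrate "a plugin can have many modules" without pulling in Guice, the sketch below uses a stand-in BindingModule interface where real code would use a Guice Module. BindingModule, ModularPlugin, RevisionsPlugin and PluginHost are all hypothetical. The plugin is the unit that gets packaged and discovered, while each module just contributes bindings (here, to a plain Map rather than an injector).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-in for a Guice module: a unit that contributes bindings.
// Hypothetical; real code would extend com.google.inject.AbstractModule instead.
interface BindingModule {
    void configure(Map<Class<?>, Object> bindings);
}

// Stand-in for a plugin: the packaged, discoverable unit, which may bring several modules.
interface ModularPlugin {
    String name();
    List<BindingModule> modules();
}

// Hypothetical services the example plugin wants to provide.
interface Transformer { String apply(String source); }
interface RevisionLimit { int max(); }

class RevisionsPlugin implements ModularPlugin {
    public String name() { return "revisions"; }

    // Two independent modules shipped by one plugin.
    public List<BindingModule> modules() {
        return Arrays.asList(
            bindings -> bindings.put(Transformer.class,
                    (Transformer) source -> "{\"current\":" + source + "}"),
            bindings -> bindings.put(RevisionLimit.class, (RevisionLimit) () -> 10));
    }
}

class PluginHost {
    // Collect the bindings from every module of every plugin.
    static Map<Class<?>, Object> install(List<ModularPlugin> plugins) {
        Map<Class<?>, Object> bindings = new HashMap<>();
        for (ModularPlugin plugin : plugins) {
            for (BindingModule module : plugin.modules()) {
                module.configure(bindings);
            }
        }
        return bindings;
    }
}
```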

  • The process of transforming the document will be a synchronous
    action. Does this have non-obvious implications? Is there a timeout
    management mechanism inherent in all request processing that would
    implicitly supervise this?

It depends on where it's plugged in, but it will simply wait for it to finish.
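
Since nothing supervises the transformation for you, a plugin that wants a bound would have to add one itself. One option, sketched below with a hypothetical BoundedTransform class (not part of elasticsearch), is to run the work on an executor and wait with Future.get(timeout, unit). Note that cancel(true) only interrupts the worker thread; a script that ignores interrupts will keep running in the background.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Function;

class BoundedTransform {
    // Single daemon worker so a stuck transformation cannot keep the JVM alive.
    private final ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "transform-worker");
        t.setDaemon(true);
        return t;
    });

    // Run the transformation, but give up after the given timeout.
    String apply(Function<String, String> transform, String source, long timeout, TimeUnit unit)
            throws TimeoutException, ExecutionException, InterruptedException {
        Future<String> future = executor.submit(() -> transform.apply(source));
        try {
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts the worker; the transform must honor interrupts to stop
            throw e;
        }
    }
}
```

The caller still blocks, exactly as described above; the difference is only that it blocks for at most the chosen timeout and then has to decide whether to reject the document.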



(David Richardson) #3

Thanks, Shay. There'll be more questions, but that helps.

david

