Pre-commit data transformation plugin

David_Richardson · August 26, 2011, 7:23pm

We need the ability to transform a document prior to commit, in this case
to create a structure to retain data revisions, without using middleware.
On #elasticsearch I was pointed to the ability to write preIndex hooks and
given the Percolate module as an example.
I still have a number of specific questions about such an approach:

Is this the best template for understanding such functionality?
Has anyone done something similar and published the code?
How would such code be packaged? Percolate doesn't appear to be in
'plugin' format (if that's even the correct terminology). I do see a plugin
class when browsing the source, but Percolate doesn't appear to subclass
this.
Is there a formal structure for extensions to ES, even if unpublished?
I presume this is the case, and that plugins follow it.
How are external extensions registered for use? I presume the Plugins
folder is traversed and all .jars are loaded?
I recall seeing reference to JavaScript for use elsewhere in ES - I
presume Rhino is being used. Is it then available for general use (i.e. as
the language to perform transformations pre-commit)? What version are we
using?
Modules vs Plugins. What's the difference?
The process of transforming the document will be a synchronous action.
Does this have non-obvious implications? Is there a timeout management
mechanism inherent in all request processing that would implicitly supervise
this?

Suggestions/help much appreciated.
david

kimchy · August 29, 2011, 6:04pm

Answers below. Note, though, that with a transformation you want to
perform it once and then use the transformed data when executing the
index operation on both the primary shard and the replica shards.
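
A minimal sketch of the "transform once, reuse everywhere" point, written as a plain Java model rather than Elasticsearch code. The names Shard, CountingTransform and indexEverywhere are hypothetical and exist only for illustration; the "revisions" field is an assumed shape for the revision structure mentioned in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical model, not Elasticsearch code: the transformation runs once
// and its result is reused for the primary and every replica.
class TransformOnceSketch {

    /** Stand-in for a shard copy; it only records the documents it is given. */
    static final class Shard {
        final String name;
        final List<Map<String, Object>> indexed = new ArrayList<>();

        Shard(String name) { this.name = name; }

        void index(Map<String, Object> doc) { indexed.add(doc); }
    }

    /** Example transformation that also counts how often it is invoked. */
    static final class CountingTransform implements UnaryOperator<Map<String, Object>> {
        int calls = 0;

        @Override
        public Map<String, Object> apply(Map<String, Object> source) {
            calls++;
            Map<String, Object> out = new LinkedHashMap<>(source);
            // Assumed revision structure: keep a copy of the incoming body in a list.
            List<Map<String, Object>> revisions = new ArrayList<>();
            revisions.add(new LinkedHashMap<>(source));
            out.put("revisions", revisions);
            return out;
        }
    }

    /** Transform once, then hand the same result to the primary and all replicas. */
    static void indexEverywhere(Map<String, Object> source,
                                UnaryOperator<Map<String, Object>> transform,
                                Shard primary, List<Shard> replicas) {
        Map<String, Object> transformed = transform.apply(source);
        primary.index(transformed);
        for (Shard replica : replicas) {
            replica.index(transformed);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("title", "hello");
        CountingTransform transform = new CountingTransform();
        Shard primary = new Shard("primary");
        List<Shard> replicas = Arrays.asList(new Shard("replica-1"), new Shard("replica-2"));
        indexEverywhere(doc, transform, primary, replicas);
        System.out.println("transform calls: " + transform.calls);
        System.out.println("primary holds: " + primary.indexed);
    }
}
```

If the transformation ran separately on each shard copy instead, a non-deterministic step (a timestamp, for example) could leave the primary and replicas holding different documents.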

In general, though, I am not a fan of having such functionality in
elasticsearch, and think that it should be done outside of elasticsearch.
What I am concerned about is that this type of feature tends to get misused
and abused very easily. Obviously, I won't stand in the way of huge demand
for it.

On Fri, Aug 26, 2011 at 10:23 PM, David Richardson <
david.richardson@enquora.com> wrote:

We need the ability to transform a document prior to commit, in this case
to create a structure to retain data revisions, without using middleware.
On #elasticsearch I was pointed to the ability to write preIndex hooks and
given the Percolate module as an example.
I still have a number of specific questions about such an approach:

Is this the best template for understanding such functionality?

I don't think it fits this case, because you want to perform the
"transformation" once. There isn't a good extension point in elasticsearch
to do that now.

Has anyone done something similar and published the code?

How would such code be packaged? Percolate doesn't appear to be in
'plugin' format (if that's even the correct terminology). I do see a plugin
class when browsing the source, but Percolate doesn't appear to subclass
this.

Check the other plugins in elasticsearch. Basically, they are zip files
that end up being extracted under the plugins directory. All jar and class
files within that plugin's directory under plugins are loaded automatically.

Is there a formal structure for extensions to ES, even if
unpublished? I presume this is the case, and that plugins follow it.

There is a minimal one. Check some of the plugins under elasticsearch
itself. As for what can be extended, that depends on the extension points /
Guice injection options.

How are external extensions registered for use? I presume the Plugins
folder is traversed and all .jars are loaded?

Yes, it loads all the jar files, and calls the class listed in the
es-plugin.properties file.
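
A rough sketch of the descriptor mechanism described above: read a properties file, find the class it names, and instantiate that class. This is a simplified model in plain Java, not Elasticsearch's actual loader. The Plugin interface and RevisionPlugin class are hypothetical, and the `plugin` key is an assumption about the file's contents that should be checked against an existing plugin's es-plugin.properties.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Simplified model of "call the class listed in the properties file".
// Not the real Elasticsearch loader; all type names here are hypothetical.
class PluginDescriptorSketch {

    /** Hypothetical stand-in for a plugin contract. */
    interface Plugin {
        String name();
    }

    /** Example plugin class that a descriptor can point at. */
    static final class RevisionPlugin implements Plugin {
        public RevisionPlugin() {}

        @Override
        public String name() { return "revision-plugin"; }
    }

    /** Parses descriptor text and instantiates the class named under "plugin" (assumed key). */
    static Plugin loadFromDescriptor(String descriptorText) {
        try {
            Properties props = new Properties();
            props.load(new StringReader(descriptorText));
            String className = props.getProperty("plugin");
            if (className == null) {
                throw new IllegalArgumentException("descriptor has no 'plugin' entry");
            }
            Class<?> clazz = Class.forName(className);
            return (Plugin) clazz.getDeclaredConstructor().newInstance();
        } catch (IOException | ReflectiveOperationException e) {
            throw new IllegalStateException("could not load plugin", e);
        }
    }

    public static void main(String[] args) {
        // In a real plugin this text would live in a file inside the jar.
        String descriptor = "plugin=" + RevisionPlugin.class.getName();
        Plugin plugin = loadFromDescriptor(descriptor);
        System.out.println("loaded " + plugin.name());
    }
}
```

The design point is that the jar only has to carry a small text file naming its entry class; the host needs no compile-time knowledge of the plugin.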

I recall seeing reference to JavaScript for use elsewhere in ES - I
presume Rhino is being used. Is it then available for general use (i.e. as
the language to perform transformations pre-commit)? What version are we
using?

The Rhino version used is 1.7R2. The whole Scripting aspect is available
for general use using the ScriptingService, and it's used in different
places.

Modules vs Plugins. What's the difference?

Modules are Guice concepts. A plugin can have many modules.
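
To illustrate the relationship only, here is a plain Java sketch in which a plugin is a named bundle of modules and each module contributes bindings. It mimics the shape of the idea without using Guice; BindingModule, Plugin, RevisionPlugin and wire are hypothetical names, not Guice or Elasticsearch APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration of "a plugin can have many modules".
// None of these types are Guice or Elasticsearch classes.
class PluginModulesSketch {

    /** Hypothetical stand-in for a module: a unit that contributes bindings. */
    interface BindingModule {
        void configure(Map<String, Object> bindings);
    }

    /** Hypothetical stand-in for a plugin: a named collection of modules. */
    interface Plugin {
        String name();
        List<BindingModule> modules();
    }

    /** Example plugin that ships two independent modules. */
    static final class RevisionPlugin implements Plugin {
        @Override
        public String name() { return "revisions"; }

        @Override
        public List<BindingModule> modules() {
            BindingModule storage = bindings -> bindings.put("revisionStore", new ArrayList<String>());
            BindingModule settings = bindings -> bindings.put("maxRevisions", 10);
            return Arrays.asList(storage, settings);
        }
    }

    /** Asks every module of the plugin to register its bindings. */
    static Map<String, Object> wire(Plugin plugin) {
        Map<String, Object> bindings = new LinkedHashMap<>();
        for (BindingModule module : plugin.modules()) {
            module.configure(bindings);
        }
        return bindings;
    }

    public static void main(String[] args) {
        Map<String, Object> bindings = wire(new RevisionPlugin());
        System.out.println("bindings: " + bindings.keySet());
    }
}
```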

The process of transforming the document will be a synchronous
action. Does this have non-obvious implications? Is there a timeout
management mechanism inherent in all request processing that would
implicitly supervise this?

Depends on where it's plugged in, but it will simply wait for it to finish.
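
Since the caller simply waits, any time limit would have to be imposed by the transformation code itself. One possible approach, sketched with standard java.util.concurrent classes only, is to run the transformation on a separate thread and bound the wait. The class and method names are hypothetical, and this is an assumption about how one might supervise the work, not something elasticsearch provides.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.UnaryOperator;

// Hypothetical helper: bounds how long a synchronous transformation may take.
class BoundedTransformSketch {

    /** Runs the transformation, failing if it takes longer than timeoutMillis. */
    static String transformWithTimeout(UnaryOperator<String> transform,
                                       String source, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Callable<String> task = () -> transform.apply(source);
            Future<String> future = executor.submit(task);
            try {
                return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                future.cancel(true); // interrupt the worker if it is still running
                throw new IllegalStateException(
                        "transformation exceeded " + timeoutMillis + " ms", e);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            } catch (ExecutionException e) {
                throw new IllegalStateException("transformation failed", e.getCause());
            }
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        String result = transformWithTimeout(s -> s.toUpperCase(), "doc body", 1000);
        System.out.println(result);
    }
}
```

A per-call executor keeps the sketch short; real code would more likely share a pool, and a transformation that ignores interruption would keep running after the timeout fires.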


David_Richardson · August 29, 2011, 10:44pm

Thanks Shay. There'll be more questions, but that helps.

david