We need the ability to transform a document prior to commit, in this case
to create a structure to retain data revisions, without using middleware.
On #elasticsearch I was pointed to the ability to write preIndex hooks and
given the Percolate module as an example.
I still have a number of specific questions about such an approach:
Is this the best template for understanding such functionality?
Has anyone done something similar and published the code?
How would such code be packaged? Percolate doesn't appear to be in
'plugin' format (if that's even the correct terminology). I do see a plugin
class when browsing the source, but Percolate doesn't appear to subclass
this.
Is there a formal structure for extensions to ES, even if unpublished?
I presume this is the case, and that plugins follow it.
How are external extensions registered for use? I presume the Plugins
folder is traversed and all .jars are loaded?
I recall seeing reference to Javascript for use elsewhere in ES - I
presume Rhino is being used. Is it then available for general use (i.e. as
the language to perform transformations pre-commit)? What version are we
using?
Modules vs Plugins. What's the difference?
The process of transforming the document will be a synchronous action.
Does this have non-obvious implications? Is there a timeout management
mechanism inherent in all request processing that would implicitly supervise
this?
Answers below. Note, though, that with transformation, what you want to do is
perform the transformation once, and then use the transformed data when
executing the index operation on both the primary shard and the replica shards.
In general, though, I am not a fan of having such functionality in
elasticsearch, and think that it should be done outside of elasticsearch.
What I am concerned about is that this type of feature tends to get misused
and abused very easily. Obviously, I won't stand in the way of huge demand
for it.
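To illustrate what "outside of elasticsearch" could look like for the
revisions case, here is a minimal sketch of a client-side transformation
applied once, before the index request is sent. The class name, method name
and the "revisions" field are all hypothetical; nothing here is an
elasticsearch API, and documents are modelled as plain maps.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of elasticsearch: builds the document to
// index from the previously stored version and the incoming update.
final class RevisionTransform {

    // Returns a new document holding the incoming fields plus a "revisions"
    // list (the field name is an assumption) with the earlier versions.
    static Map<String, Object> withRevision(Map<String, Object> previous,
                                            Map<String, Object> incoming) {
        Map<String, Object> result = new HashMap<>(incoming);
        List<Object> revisions = new ArrayList<>();
        if (previous != null) {
            // Carry over the history already recorded on the old version.
            Object old = previous.get("revisions");
            if (old instanceof List) {
                revisions.addAll((List<?>) old);
            }
            // Snapshot the previous version without its own history.
            Map<String, Object> snapshot = new HashMap<>(previous);
            snapshot.remove("revisions");
            revisions.add(snapshot);
        }
        result.put("revisions", revisions);
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> v1 = withRevision(null, Map.of("title", "draft"));
        Map<String, Object> v2 = withRevision(v1, Map.of("title", "final"));
        System.out.println(v2);
    }
}
```

Because the result is computed before the request is sent, the same
transformed source reaches the primary and the replicas without any hook
inside elasticsearch.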
We need the ability to transform a document prior to commit, in this case
to create a structure to retain data revisions, without using middleware.
I was pointed to the ability to write preIndex hooks on #elasticsearch and
given the Percolate module as an example.
I still have a number of specific questions about such an approach:
Is this the best template for understanding such functionality?
I don't think it fits this case, because you want to perform the
"transformation" once. There isn't a good extension point in elasticsearch
to do that now.
Has anyone done something similar and published the code?
How would such code be packaged? Percolate doesn't appear to be in
'plugin' format (if that's even the correct terminology). I do see a plugin
class when browsing the source, but Percolate doesn't appear to subclass
this.
Check the other plugins in elasticsearch. Basically, they are zip files
that end up being extracted under the plugins directory. All jar and class
files within that plugin's directory under plugins are loaded automatically.
Is there a formal structure for extensions to ES, even if
unpublished? I presume this is the case, and that plugins follow it.
There is a minimal one. Check some of the plugins under elasticsearch
itself. As for what can be extended, that depends on the extension points /
guice injection options.
How are external extensions registered for use? I presume the Plugins
folder is traversed and all .jars are loaded?
Yes, it loads all the jar files, and calls the class listed in the
es-plugin.properties file.
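As a rough illustration of that registration step (not the actual
elasticsearch loader), the sketch below reads a properties descriptor and
instantiates the class it names by reflection. The "plugin" key and the
ExamplePlugin class are assumptions made for the example; check an existing
plugin's es-plugin.properties for the real contents.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical loader sketch; names here are illustrative only.
final class PluginLoaderSketch {

    // Stand-in for a plugin class that a descriptor would point at.
    public static final class ExamplePlugin {
        public String name() {
            return "example";
        }
    }

    // Parses the descriptor text and instantiates the class it names.
    // The "plugin" key is an assumption about the descriptor format.
    static Object load(String descriptor) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(descriptor));
        } catch (IOException e) {
            throw new IllegalArgumentException("unreadable descriptor", e);
        }
        String className = props.getProperty("plugin");
        if (className == null) {
            throw new IllegalArgumentException("descriptor has no plugin key");
        }
        try {
            // Requires a public no-argument constructor on the named class.
            return Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot create " + className, e);
        }
    }

    public static void main(String[] args) {
        Object plugin = load("plugin=" + ExamplePlugin.class.getName());
        System.out.println(plugin.getClass().getName());
    }
}
```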
I recall seeing reference to Javascript for use elsewhere in ES - I
presume Rhino is being used. Is it then available for general use (i.e. as
the language to perform transformations pre-commit)? What version are we
using?
The Rhino version used is 1.7R2. The whole scripting aspect is available
for general use through the ScriptingService, and it's used in different
places.
Modules vs Plugins. What's the difference?
Modules are guice concepts. A plugin can have many modules.
The process of transforming the document will be a synchronous
action. Does this have non-obvious implications? Is there a timeout
management mechanism inherent in all request processing that would
implicitly supervise this?
Depends on where it's plugged in, but it will simply wait for it to finish.
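Since the call simply waits, any supervision would have to come from the
transformation code itself. A minimal sketch of one way to bound the wait,
using only java.util.concurrent, follows. TimedTransform, its parameters and
the fallback behaviour are hypothetical choices for the example and not
something elasticsearch provides.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.UnaryOperator;

// Hypothetical wrapper that supervises a synchronous transformation.
final class TimedTransform {

    // Runs the transform on a worker thread and waits at most timeoutMillis.
    // Returns the fallback value when the transform does not finish in time.
    static String apply(UnaryOperator<String> transform, String source,
                        long timeoutMillis, String fallback) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<String> future = executor.submit(() -> transform.apply(source));
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the slow transform
            return fallback;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            return fallback;
        } catch (ExecutionException e) {
            throw new IllegalStateException("transform failed", e.getCause());
        } finally {
            executor.shutdownNow(); // never leave the worker thread behind
        }
    }

    public static void main(String[] args) {
        System.out.println(apply(s -> s.toUpperCase(), "doc", 1000, "timed out"));
    }
}
```

A real hook would more likely share one executor across requests and fail
the index operation instead of substituting a fallback; the per-call
executor here only keeps the sketch short.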