I am creating a new plugin that works at the node (JVM) level.
The node that receives a request executes the request perfectly, but the
other nodes do not get the request.
In my TransportNodesOperationAction subclass, I implemented the two methods:
protected abstract NodeRequest newNodeRequest();
protected abstract NodeRequest newNodeRequest(String nodeId, Request request);
I execute the action via:
client.admin().cluster().execute(MyAction.INSTANCE, request, new ActionListener() { ... })
When calling the new ClusterAction, the first node executes
newNodeRequest with the Request as a parameter, while the other nodes receive the
empty (no-arg) call. The correct call occurs inside the AsyncAction.start() method.
From my logging, my NodesOperationRequestBuilder never executes
its doExecute() method.
I am not sure what I have missed. All classes subclass from the NodesOperation
hierarchy (except for the Action, which subclasses ClusterAction).
executor() returns ThreadPool.Names.MANAGEMENT.
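A minimal sketch of why the other nodes only ever see the no-arg call. It assumes (as I read the 0.90-era TransportNodesOperationAction) that `newNodeRequest(String nodeId, Request request)` runs on the coordinating node, that the local node's copy appears to be executed without a network hop, and that each remote node builds an empty instance through `newNodeRequest()` and then fills it with `readFrom`. Plain `DataOutputStream`/`DataInputStream` stand in for Elasticsearch's own stream classes, and `DemoNodeRequest` with its `query` field is hypothetical. The deliberate bug is that `writeTo` only writes the node id:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class NodeRequestRoundTrip {

    // Hypothetical stand-in for a NodeOperationRequest subclass.
    static class DemoNodeRequest {
        String nodeId;
        String query; // field added by the plugin (hypothetical)

        // Remote side: an empty shell, like the no-arg newNodeRequest().
        DemoNodeRequest() {}

        // Coordinating side: like newNodeRequest(nodeId, request).
        DemoNodeRequest(String nodeId, String query) {
            this.nodeId = nodeId;
            this.query = query;
        }

        // Deliberate bug: only the node id is written, so query is never sent.
        void writeTo(DataOutputStream out) throws IOException {
            out.writeUTF(nodeId);
        }

        void readFrom(DataInputStream in) throws IOException {
            nodeId = in.readUTF();
        }
    }

    // Simulates the hop to a remote node: serialize, then rebuild from an empty instance.
    static DemoNodeRequest sendToRemoteNode(DemoNodeRequest local) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            local.writeTo(out);
            out.flush();
            DemoNodeRequest remote = new DemoNodeRequest();
            remote.readFrom(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return remote;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        DemoNodeRequest local = new DemoNodeRequest("node-2", "status");
        DemoNodeRequest remote = sendToRemoteNode(local);
        System.out.println(remote.nodeId + " " + remote.query); // prints "node-2 null"
    }
}
```

The node id survives the round trip but the plugin's own field does not, which matches the symptom of remote nodes receiving an effectively empty request.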
Serialization. Duh. I made some assumptions that simply were not true. I
will write up my findings for anyone interested in writing node-level
plugins in the future.
--
Ivan
On Thu, Sep 19, 2013 at 10:38 AM, Ivan Brusic ivan@brusic.com wrote:
Is there anyone besides Jörg who knows elasticsearch internals?
On Wed, Sep 18, 2013 at 3:45 PM, Ivan Brusic ivan@brusic.com wrote:
I will write up my findings for anyone interested in writing node-level
plugins in the future.
+1
btw, my colleague Vlastimil implemented plugin that might be worth looking
at [1]. I am not too familiar with its internals but IIRC he had to go down
the similar road.
My first plugin was a river plugin and that was three years ago. River
plugins are far easier since all the node communication is handled by the
RiverModule. Since then I have created other plugins (river and analysis),
but nothing at the node level.
Basically, all data shared between nodes is serialized in the
readFrom/writeTo methods of the NodeOperationRequest object. Since this
class takes a NodesOperationRequest (note the different class name) as a
constructor argument, I assumed that it would correctly serialize the
request. However, this obviously was not the case, which makes sense in
hindsight: how can my subclass be serialized automatically if I added new
fields? There are two different request classes and two different response
classes, so it was hard to spot the issue without stepping back and looking
at the class hierarchy as a whole.
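A minimal sketch of the fix, under the same plain-Java assumptions (`DataOutputStream`/`DataInputStream` standing in for Elasticsearch's streams): the nodes-level request serializes its own fields, and the node-level request writes its base fields first and then the wrapped request, in exactly the order `readFrom` reads them back. `DemoNodesRequest`, `BaseNodeRequest`, `DemoNodeRequest`, and their fields are hypothetical; in a real plugin they would correspond to your NodesOperationRequest and NodeOperationRequest subclasses, and calling the superclass's readFrom/writeTo first is my assumption of the usual pattern.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NodeRequestSerialization {

    // Hypothetical nodes-level request carrying plugin-specific fields.
    static class DemoNodesRequest {
        String command;
        List<String> targets = new ArrayList<>();

        DemoNodesRequest() {}

        DemoNodesRequest(String command, List<String> targets) {
            this.command = command;
            this.targets = new ArrayList<>(targets);
        }

        void writeTo(DataOutputStream out) throws IOException {
            out.writeUTF(command);
            out.writeInt(targets.size());
            for (String t : targets) out.writeUTF(t);
        }

        void readFrom(DataInputStream in) throws IOException {
            command = in.readUTF();
            int n = in.readInt();
            targets = new ArrayList<>(n);
            for (int i = 0; i < n; i++) targets.add(in.readUTF());
        }
    }

    // Stand-in for the base node request: serializes only the node id.
    static class BaseNodeRequest {
        String nodeId;
        void writeTo(DataOutputStream out) throws IOException { out.writeUTF(nodeId); }
        void readFrom(DataInputStream in) throws IOException { nodeId = in.readUTF(); }
    }

    static class DemoNodeRequest extends BaseNodeRequest {
        DemoNodesRequest request;

        DemoNodeRequest() {} // used on the receiving node

        DemoNodeRequest(String nodeId, DemoNodesRequest request) {
            this.nodeId = nodeId;
            this.request = request;
        }

        @Override
        void writeTo(DataOutputStream out) throws IOException {
            super.writeTo(out);   // base fields first
            request.writeTo(out); // then everything the plugin added
        }

        @Override
        void readFrom(DataInputStream in) throws IOException {
            super.readFrom(in);   // same order as writeTo
            request = new DemoNodesRequest();
            request.readFrom(in);
        }
    }

    // Simulates the hop to a remote node.
    static DemoNodeRequest roundTrip(DemoNodeRequest local) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            local.writeTo(out);
            out.flush();
            DemoNodeRequest remote = new DemoNodeRequest();
            remote.readFrom(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return remote;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        DemoNodesRequest nodes = new DemoNodesRequest("status", Arrays.asList("river-a", "river-b"));
        DemoNodeRequest remote = roundTrip(new DemoNodeRequest("node-2", nodes));
        // prints "node-2 status [river-a, river-b]"
        System.out.println(remote.nodeId + " " + remote.request.command + " " + remote.request.targets);
    }
}
```

Keeping writeTo and readFrom as mirror images in the same class makes an ordering mismatch much easier to spot than when the fields are split across the two request hierarchies.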
Yes, I agree, but let me explain why I pointed out the JIRA river.
Rivers are relatively easy, that is true, but only as long as you are fine
with their basic contract, i.e. the start() and close() operations. Once you want
more fine-grained control of your river, it gets more complicated. In
our rivers, the JIRA river [1] and the Remote river [2] (both are basically
similar; the former is just more specific to JIRA as a data source), we also
wanted to provide a REST API for managing individual river instances and
reporting status [3]. Extending the REST API is again simple: you register
new actions on RestModule. But the question is what should happen when the
REST request is received on a node that is not running the river
singleton. From what I have seen, there are basically two approaches
people take.
The first one is to store the client action as a document in the cluster (for
example in the _river index) and have the river check for such documents at
periodic intervals. The river uses the search API to learn whether there was any
request to execute a specific action.
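A rough model of this first approach in plain Java, not tied to any Elasticsearch API: an in-memory queue stands in for the command documents stored in the `_river` index, and the river's periodic check is a task on a `ScheduledExecutorService`. `PollingRiverSketch`, `submit`, `pollOnce`, and the command names are hypothetical; a real river would run a search against the index instead of draining a queue, and would have to delete or mark processed documents so they are not executed twice.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class PollingRiverSketch {

    // Stand-in for command documents stored in the _river index (hypothetical).
    private final Queue<String> commandIndex = new ConcurrentLinkedQueue<>();
    // Actions the river has executed so far, in order.
    private final List<String> executed = Collections.synchronizedList(new ArrayList<>());

    // Any node that receives the REST request just writes a "document".
    void submit(String action) {
        commandIndex.add(action);
    }

    // One periodic check on the node running the river singleton.
    // poll() removes each command, modelling deletion of processed documents.
    int pollOnce() {
        int count = 0;
        String action;
        while ((action = commandIndex.poll()) != null) {
            executed.add(action); // a real river would act on the command here
            count++;
        }
        return count;
    }

    List<String> executed() {
        synchronized (executed) {
            return new ArrayList<>(executed);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        PollingRiverSketch river = new PollingRiverSketch();
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(river::pollOnce, 0, 100, TimeUnit.MILLISECONDS);
        river.submit("stop");
        river.submit("restart");
        Thread.sleep(300); // give the poller a few passes
        scheduler.shutdown();
        System.out.println(river.executed());
    }
}
```

The trade-off is latency: a command waits up to one polling interval, and the river cannot easily return a result to the REST caller.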
The second way (which we used) is a bit different. In a nutshell, it involves
full propagation from RestRequest to ClusterAction (and this includes
NodeOperationRequest/Response serialization as well). I think this provides
better options in terms of controlling the river and reporting its status.
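The second approach can be modelled the same way, again in plain Java rather than the Elasticsearch API: the node that receives the REST call fans a request out to every node (as a nodes-level action does), each node reports on the river only if it hosts the singleton, and the coordinating node reduces the per-node responses. `SimNode`, `NodeStatus`, `collectRiverStatus`, and the river name are hypothetical; in a real plugin each per-node hop would go through the transport layer and therefore through readFrom/writeTo.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class ClusterActionSketch {

    // Per-node response; riverStatus is empty when the river does not run on that node.
    static final class NodeStatus {
        final String nodeId;
        final Optional<String> riverStatus;

        NodeStatus(String nodeId, Optional<String> riverStatus) {
            this.nodeId = nodeId;
            this.riverStatus = riverStatus;
        }
    }

    // Hypothetical cluster node; only one node hosts a given river singleton.
    static final class SimNode {
        final String id;
        final Map<String, String> runningRivers; // river name -> status

        SimNode(String id, Map<String, String> runningRivers) {
            this.id = id;
            this.runningRivers = runningRivers;
        }

        // Per-node operation: report on the river only if it runs here.
        NodeStatus nodeOperation(String riverName) {
            return new NodeStatus(id, Optional.ofNullable(runningRivers.get(riverName)));
        }
    }

    // Coordinating node: every node answers, then the responses are reduced.
    static Optional<NodeStatus> collectRiverStatus(List<SimNode> cluster, String riverName) {
        List<NodeStatus> responses = new ArrayList<>();
        for (SimNode node : cluster) {
            responses.add(node.nodeOperation(riverName));
        }
        return responses.stream()
                .filter(status -> status.riverStatus.isPresent())
                .findFirst();
    }

    public static void main(String[] args) {
        List<SimNode> cluster = Arrays.asList(
                new SimNode("node-1", Collections.emptyMap()),
                new SimNode("node-2", Collections.singletonMap("my_jira_river", "indexing")),
                new SimNode("node-3", Collections.emptyMap()));
        collectRiverStatus(cluster, "my_jira_river")
                .ifPresent(s -> System.out.println(s.nodeId + ": " + s.riverStatus.get()));
        // prints "node-2: indexing"
    }
}
```

Unlike polling, the REST caller gets an answer in the same request, at the cost of writing and serializing the full request/response class pairs.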
Maybe there are better ways of doing this, but as of now we do not know
about them. I am also not aware of any other plugin that does it in a
similar way. It took Vlastimil quite some work to get there; one of the
biggest issues was the missing Javadoc and relevant documentation.
So now I hope it makes more sense why I pointed you to the JIRA river.
I did not dig deep enough into the code. I now see that the relevant
classes are inside the subfolders of mgm. Since there are many classes to
implement, when I did not see an explosion of classes in jira or mgm, I
assumed it was just a standard river.
My implementation has been working smoothly, but I will still look at the
code to see if there is anything else that can be learned.
It did take quite some work to figure out since there is no documentation.
I have not seen any other plugins that work at the node level, at least
not at a glance (as with the JIRA river). I want to implement some of the
other types of TransportActions just to see how they behave and how they
differ from each other. For example, what is the difference
between TransportSingleCustomOperationAction
and TransportBroadcastOperationAction?