I just built a plugin that provides a synonym token filter that loads terms
from a database and not from a file. One additional feature that I would
like to implement is the ability to reload the terms on demand.
I figured adding code to the client-side Java API would be too much work
when the REST API suffices for now. I created a REST action that receives
the call, but this is the point where I do not know how to continue: how
can I execute a command on all nodes?
Jumping into the code, I see that the Client interface provides methods for
each concrete action (search, index, count, etc.) plus two generic
execute() methods. First of all, what should the workflow be? Second, what
do the specialized BroadcastOperation classes provide? Does the request
execute on every node? I do not see where the magic happens.
BroadcastOperation? For now, I have created my own Action, Request,
Response and RequestBuilder stubs, and everything compiles.
What is confusing is that I create a new Action, which contains a method
that creates a new RequestBuilder and is passed as a parameter to
Client#execute(). The RequestBuilder in turn has a doExecute() method,
which also calls out to the Client. Am I missing something?
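The round trip described above can be modelled in a few lines. This is a sketch under stated assumptions, not the Elasticsearch API. Action, RequestBuilder, Client, TransportAction and the Reload* classes are hypothetical stand-ins, and the real counterparts have different signatures that change between versions. The model shows why the calls look circular:

- the builder only collects parameters and hands the request back to the client;
- the client only looks up the TransportAction bound to the action (bind() plays the role of the ActionModule binding);
- the TransportAction is where the work actually happens.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical: identifies an action by name and knows how to create its builder. */
abstract class Action<Req, Resp> {
    final String name;
    Action(String name) { this.name = name; }
    abstract RequestBuilder<Req, Resp> newRequestBuilder(Client client);
}

/** Hypothetical: the server-side logic; this is where the real work happens. */
interface TransportAction<Req, Resp> {
    void doExecute(Req request, Consumer<Resp> listener);
}

/** Hypothetical client: it only dispatches to the TransportAction bound to an action. */
class Client {
    private final Map<String, TransportAction<?, ?>> registry = new HashMap<>();

    // Stands in for the ActionModule binding of Action -> TransportAction.
    <Req, Resp> void bind(Action<Req, Resp> action, TransportAction<Req, Resp> transport) {
        registry.put(action.name, transport);
    }

    @SuppressWarnings("unchecked")
    <Req, Resp> void execute(Action<Req, Resp> action, Req request, Consumer<Resp> listener) {
        TransportAction<Req, Resp> transport = (TransportAction<Req, Resp>) registry.get(action.name);
        if (transport == null) {
            throw new IllegalStateException("no TransportAction bound for " + action.name);
        }
        transport.doExecute(request, listener);
    }
}

/** Hypothetical builder: collects parameters, then hands the request back to the client. */
abstract class RequestBuilder<Req, Resp> {
    private final Client client;
    private final Action<Req, Resp> action;

    RequestBuilder(Client client, Action<Req, Resp> action) {
        this.client = client;
        this.action = action;
    }

    abstract Req request();

    void execute(Consumer<Resp> listener) {
        client.execute(action, request(), listener);
    }
}

class ReloadRequest {
    final String filterName;
    ReloadRequest(String filterName) { this.filterName = filterName; }
}

class ReloadResponse {
    final int nodesReloaded;
    ReloadResponse(int nodesReloaded) { this.nodesReloaded = nodesReloaded; }
}

class ReloadSynonymsAction extends Action<ReloadRequest, ReloadResponse> {
    static final ReloadSynonymsAction INSTANCE = new ReloadSynonymsAction();
    private ReloadSynonymsAction() { super("synonyms/reload"); }

    @Override
    ReloadRequestBuilder newRequestBuilder(Client client) { return new ReloadRequestBuilder(client); }
}

class ReloadRequestBuilder extends RequestBuilder<ReloadRequest, ReloadResponse> {
    private String filterName = "db_synonyms";  // hypothetical default filter name

    ReloadRequestBuilder(Client client) { super(client, ReloadSynonymsAction.INSTANCE); }

    ReloadRequestBuilder setFilterName(String filterName) {
        this.filterName = filterName;
        return this;
    }

    @Override
    ReloadRequest request() { return new ReloadRequest(filterName); }
}

class ActionWiringDemo {
    public static void main(String[] args) {
        Client client = new Client();
        // Fake transport action that just reports success on three nodes.
        client.bind(ReloadSynonymsAction.INSTANCE,
                (request, listener) -> listener.accept(new ReloadResponse(3)));
        ReloadSynonymsAction.INSTANCE.newRequestBuilder(client)
                .setFilterName("my_synonyms")
                .execute(response -> System.out.println("reloaded on " + response.nodesReloaded + " nodes"));
    }
}
```

Running ActionWiringDemo prints "reloaded on 3 nodes". Executing an action that has no bound TransportAction fails immediately. In this model, that missing binding is the step that is easy to overlook.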
On Thu, Dec 6, 2012 at 6:10 PM, Jörg Prante wrote:
- there are action levels: an action can be executed on the node level, on
  the indices level, or on the shard level
- there are action classes (broadcast, master, node, replication, single
  custom, single shard, etc.)
- there are actions for ingesting data (without Lucene code), for queries
  (with Lucene query builders) and for administration (cluster and indices)
- for each action, there is a request/response class, a request builder
  class, a Transport...Action, and shard-level request/response classes
- in general, the control flow moves the request to the indices level and
  from there to the shard level; then responses are created and transported
  back
- the heavy lifting is done by a Transport...Action, where the different
  layers are coordinated
- such an action can additionally be exposed through the HTTP REST API
As there are so many ES actions, it might be easier to first study plugins
that add their own actions, to see how they register them with ES.
Don't get confused by the execute() / doExecute() methods; they just
organize client API access, i.e. how clients execute one of a number of
actions selected from a registry. Each API call is implemented in both a
listener (callback) style and a prepare/execute style.
I hope this helps; feel free to ask for more details.
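For reloading an in-memory list on every node, the relevant level is the node level. One coordinating call fans out to all nodes, and the per-node answers are aggregated into a single response.

The following is a minimal sketch with these assumptions:
- Java 10+;
- a fixed list of node IDs;
- a thread pool standing in for the transport layer.

NodesBroadcast and NodeResult are hypothetical names, not Elasticsearch classes. The sketch also shows the two call styles mentioned above: a future returned to the caller (roughly the prepare/execute style) and a listener callback.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical node-level broadcast: fan a request out to every node, collect one
// result per node, aggregate them. A thread pool stands in for the network layer.
class NodesBroadcast {

    /** One node's answer; failures are recorded instead of failing the whole call. */
    static final class NodeResult {
        final String nodeId;
        final boolean ok;
        final String detail;

        NodeResult(String nodeId, boolean ok, String detail) {
            this.nodeId = nodeId;
            this.ok = ok;
            this.detail = detail;
        }
    }

    private final List<String> nodeIds;
    private final ExecutorService transport;

    NodesBroadcast(List<String> nodeIds, ExecutorService transport) {
        this.nodeIds = List.copyOf(nodeIds);
        this.transport = transport;
    }

    /** Future style: completes once every node has answered, successfully or not. */
    CompletableFuture<List<NodeResult>> execute(Function<String, String> nodeOperation) {
        List<CompletableFuture<NodeResult>> perNode = new ArrayList<>();
        for (String nodeId : nodeIds) {
            perNode.add(CompletableFuture
                    .supplyAsync(() -> new NodeResult(nodeId, true, nodeOperation.apply(nodeId)), transport)
                    .exceptionally(e -> new NodeResult(nodeId, false, String.valueOf(e.getCause()))));
        }
        return CompletableFuture.allOf(perNode.toArray(new CompletableFuture<?>[0]))
                .thenApply(ignored -> {
                    List<NodeResult> results = new ArrayList<>();
                    for (CompletableFuture<NodeResult> f : perNode) {
                        results.add(f.join());  // already complete here, so join() does not block
                    }
                    return results;
                });
    }

    /** Listener style: the same work, with the aggregated result handed to a callback. */
    void execute(Function<String, String> nodeOperation, Consumer<List<NodeResult>> listener) {
        execute(nodeOperation).thenAccept(listener);
    }
}

class NodesBroadcastDemo {
    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        NodesBroadcast broadcast = new NodesBroadcast(List.of("node-1", "node-2", "node-3"), pool);
        List<NodesBroadcast.NodeResult> results =
                broadcast.execute(nodeId -> "synonyms reloaded on " + nodeId).join();
        results.forEach(r -> System.out.println(r.nodeId + " ok=" + r.ok + ": " + r.detail));
        pool.shutdown();
    }
}
```

In this sketch, a node whose operation throws shows up as a NodeResult with ok == false instead of failing the whole broadcast. That is a design choice of the sketch, not a claim about how Elasticsearch handles node failures.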
On Friday, December 7, 2012 7:49:23 AM UTC+1, Ivan Brusic wrote:
Thanks Jörg.
Much of what you have told me I already know! One thing I did miss is that
each Action has a corresponding TransportAction that is bound in the
ActionModule. That important step was not obvious to me.
What is still missing is how to differentiate between action levels. I have
written other plugins, but I have always used the "specific" methods
defined in Client (Client.get(), .index(), etc.), never the generic
execute(). Those methods all perform the actions at the necessary levels,
and most plugins work in this fashion.
In the end, there are far too many classes for such a simple feature. I am
just going to implement a timer thread that updates the synonyms at a
specific interval.
Cheers,
Ivan
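The timer-thread approach really is only a few lines. Below is a minimal sketch, assuming Java 10+ and a hypothetical loader: a Supplier standing in for the database query. Connecting the resulting map to the token filter is not shown.

The two points that matter are:
- swapping an immutable snapshot atomically, so analysis never sees a half-updated list;
- catching exceptions, because an exception escaping a scheduled task suppresses all later runs.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Minimal sketch of periodic reloading; the loader is a hypothetical stand-in
// for the database query.
class PeriodicSynonymReloader implements AutoCloseable {
    private final Supplier<Map<String, List<String>>> loader;
    private final AtomicReference<Map<String, List<String>>> current;
    private final ScheduledExecutorService timer;

    PeriodicSynonymReloader(Supplier<Map<String, List<String>>> loader, long interval, TimeUnit unit) {
        this.loader = loader;
        // Load once up front so a broken source fails at startup, not silently later.
        this.current = new AtomicReference<>(Map.copyOf(loader.get()));
        this.timer = Executors.newSingleThreadScheduledExecutor(runnable -> {
            Thread t = new Thread(runnable, "synonym-reloader");
            t.setDaemon(true);  // never keep the JVM alive just for reloading
            return t;
        });
        // Fixed delay (not fixed rate): a slow query cannot overlap the next run.
        timer.scheduleWithFixedDelay(this::reload, interval, interval, unit);
    }

    /** Swaps in a fresh snapshot; can also be called on demand. */
    boolean reload() {
        try {
            // Readers see either the old map or the new one, never a mix.
            current.set(Map.copyOf(loader.get()));
            return true;
        } catch (RuntimeException e) {
            // An exception escaping a scheduled task would cancel all later runs,
            // so keep serving the last good snapshot instead.
            return false;
        }
    }

    /** Current snapshot; Map.copyOf is shallow, so the lists should be immutable too. */
    Map<String, List<String>> synonyms() {
        return current.get();
    }

    @Override
    public void close() {
        timer.shutdownNow();
    }
}
```

Usage would look like new PeriodicSynonymReloader(() -> loadSynonymsFromDb(), 10, TimeUnit.MINUTES), where loadSynonymsFromDb() is a hypothetical method. The same reload() could later be wired to the REST action for on-demand refreshes.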
Only for the sake of completeness: fetching data from external sources on a
regular basis is a common task for river plugins, so you might have a look
at the JDBC river plugin as well.
Cheers,
Jörg
I wrote my first river plugin almost two years ago, so I know quite well
how they work. Every river plugin eventually calls client.index(...) and
uses the existing Action/Request/Response classes, so all the heavy lifting
is already done.
Besides, I am not updating an index, just an in-memory list (node-level).
As you mentioned, most of the work is done by the TransportAction classes.
That is where the appropriate service (cluster, transport, index, script)
is chosen. Since most of the configuration happens via DI, I completely
missed the TransportAction classes.
But as I mentioned before, it is far too much code for a simple refresh.
A timer thread is not elegant, but it's only a few lines of code! I was
hoping for a less kludgy solution so I could open-source the filter. Then
again, the current file-based synonym/stopwords/keyword-marker/etc. filters
are not refreshed either, so the problem is not a major one. But I would
still like to understand the class interactions, so that I could perhaps
write a HOWTO.
On Thu, Dec 13, 2012 at 7:13 AM, Bruce Ritchie wrote:
Ivan,
Do you have any plans to open-source this plugin? I was about to write a
plugin with the same functionality, and I suspect it may be a fairly
common request.
On Thursday, December 13, 2012 6:35:53 PM UTC+1, Ivan Brusic wrote:
The plugin is quite simple, but I guess if there is demand I could
open-source it. There is some custom code needed for my company (database
connectivity), and it needs to be refactored. My ultimate goal is to
abstract the code to support all file-based token filters (e.g.,
stopwords).
My plan was to work on it a bit more during the Christmas slowdown, but if
you need something sooner, I can help you out. It basically comes down to
reworking SynonymTokenFilterFactory.
--
Ivan
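Abstracting the code to support all file-based token filters could start by decoupling the source of the entries (file or database) from the filters that consume them. Below is a minimal sketch, assuming Java 10+. TermSource, FileTermSource, JdbcTermSource and TermLists are hypothetical names, and the table and query are placeholders. The real SynonymTokenFilterFactory is not shown or modified here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.sql.DataSource;

/** Hypothetical abstraction: where a filter's terms or rules come from. */
interface TermSource {
    List<String> load() throws IOException;
}

/** File-based source: one entry per line; blank lines and '#' comments are skipped. */
final class FileTermSource implements TermSource {
    private final Path path;

    FileTermSource(Path path) { this.path = path; }

    @Override
    public List<String> load() throws IOException {
        List<String> terms = new ArrayList<>();
        for (String line : Files.readAllLines(path, StandardCharsets.UTF_8)) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty() && !trimmed.startsWith("#")) {
                terms.add(trimmed);
            }
        }
        return terms;
    }
}

/** Database source: the first column of every row is one entry. */
final class JdbcTermSource implements TermSource {
    private final DataSource dataSource;
    private final String query;  // e.g. "SELECT rule FROM synonym_rules" (hypothetical)

    JdbcTermSource(DataSource dataSource, String query) {
        this.dataSource = dataSource;
        this.query = query;
    }

    @Override
    public List<String> load() throws IOException {
        try (Connection connection = dataSource.getConnection();
             PreparedStatement statement = connection.prepareStatement(query);
             ResultSet rows = statement.executeQuery()) {
            List<String> terms = new ArrayList<>();
            while (rows.next()) {
                String value = rows.getString(1);
                if (value != null && !value.trim().isEmpty()) {
                    terms.add(value.trim());
                }
            }
            return terms;
        } catch (SQLException e) {
            throw new IOException("failed to load terms using query: " + query, e);
        }
    }
}

/** Each filter factory (synonyms, stopwords, keyword marker, ...) would take a TermSource. */
final class TermLists {
    private TermLists() {}

    static Set<String> stopwords(TermSource source) throws IOException {
        return Set.copyOf(source.load());  // duplicates collapse into one entry
    }
}
```

With this split, the database connectivity lives in one place (JdbcTermSource). The same reload logic then works for synonyms, stopwords or keyword markers.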
Maybe the SKOS plugin can also give some insight.
It is a kind of synonym expander that uses a controlled vocabulary from an
external semantic web resource. If you can wrap your synonyms in SKOS, this
plugin might be useful.
Cheers,
Jörg
No immediate need, no. My timeline is the first quarter of next year.
Bruce
My goal was to load synonyms from a database instead of a file, not so much
to change what gets loaded, but you made me think of another point.
If the Maven-Plugin project is a success (and it will be), then your plugin
could simply depend on the db-loader plugin I created, although I would not
know how to represent SKOS files in a database. BTW, I might steal^H^H^H^H^H
borrow your SqlService class, since mine is not generic (custom datasources).