I have just started using Elasticsearch and am analyzing its source code.
I need to modify its discovery system so that nodes are registered in a
kind of database. I suppose it is easier to develop a plugin than to modify
the source itself.
But after reading some discovery plugins, such as the basic Zen discovery,
the zookeeper discovery and the cloud-aws plugin, I still don't understand
some things.
1. How do I provide the discovered nodes to the Elasticsearch core
program? I think it uses the "DiscoveryNodes" class.
2. I saw that the different discovery classes take different elements in
their constructors, but where are those elements defined?
3. After reading the code, I don't understand what the "ClusterState" is
and why it is useful.
4. I don't understand how the "AbstractLifecycleComponent" class
works. It looks like a thread or runnable class, but it isn't one and just
provides "doStart". What is it?
5. Is the publish action from the master mandatory? If I use a
database, I won't use it.
I didn't find enough documentation about the source code so a little help
would be great.
To implement this discovery plugin, I thought about a thread, started by
the discovery class, that periodically retrieves information from the
database. But I don't know where I have to register the new nodes that this
thread discovers in the database (in the DiscoveryNodes class?).
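The polling thread described above could be sketched roughly as follows.
This is plain Java, not Elasticsearch API: RegistryPoller,
MembershipListener and the Supplier standing in for the database are all
hypothetical names, and the listener is only a placeholder for whatever
the discovery implementation does with joined and departed nodes.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical callback: a real plugin would hand these changes to its
// discovery implementation.
interface MembershipListener {
    void onChange(Set<String> joined, Set<String> left);
}

// Periodically reads the node ids from a registry (stand-in for the
// database) and reports only the differences since the previous read.
class RegistryPoller {
    private final Supplier<Set<String>> registry;
    private final MembershipListener listener;
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private Set<String> known = new HashSet<>();

    RegistryPoller(Supplier<Set<String>> registry, MembershipListener listener) {
        this.registry = registry;
        this.listener = listener;
    }

    // One polling round: diff the fresh read against the last known set.
    synchronized void pollOnce() {
        Set<String> current = new HashSet<>(registry.get());
        Set<String> joined = new HashSet<>(current);
        joined.removeAll(known);
        Set<String> left = new HashSet<>(known);
        left.removeAll(current);
        known = current;
        if (!joined.isEmpty() || !left.isEmpty()) {
            listener.onChange(joined, left);
        }
    }

    void start(long periodMillis) {
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                pollOnce();
            } catch (RuntimeException e) {
                // Swallow the failure so the scheduler keeps polling;
                // an uncaught exception would cancel all future runs.
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    void stop() {
        scheduler.shutdownNow();
    }
}
```

Reporting differences rather than the full list keeps the listener simple:
it only has to react to nodes that appeared or disappeared.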
1. Yes, discovered nodes are provided by the DiscoveryNode class.
2. Can you please be more specific: which discovery classes use which
constructors? Maybe the answer is in 4.
3. ClusterState keeps the current state of the cluster regarding nodes,
indices, mappings, and, most importantly, the elected leading node that has
the privilege to write the cluster state. The leading node in Elasticsearch
is called "master".
5. Yes, it is mandatory. Otherwise, nodes would not be able to receive
cluster state updates.
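To illustrate why the publish step matters, here is a much-reduced model
of the idea, not the Elasticsearch implementation. StateSnapshot and
FollowerNode are hypothetical names, and the sketch assumes that a state
carries a version number and that a node only replaces its state with a
newer one.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, simplified stand-in for a cluster state snapshot.
final class StateSnapshot {
    final long version;
    final String masterId;
    final Set<String> nodeIds;

    StateSnapshot(long version, String masterId, Set<String> nodeIds) {
        this.version = version;
        this.masterId = masterId;
        this.nodeIds = Collections.unmodifiableSet(new TreeSet<>(nodeIds));
    }

    // The master does not mutate a snapshot; it derives the next version.
    StateSnapshot withNode(String nodeId) {
        Set<String> next = new TreeSet<>(nodeIds);
        next.add(nodeId);
        return new StateSnapshot(version + 1, masterId, next);
    }
}

// A non-master node: it learns about changes only through publish().
class FollowerNode {
    private StateSnapshot applied;

    FollowerNode(StateSnapshot initial) {
        this.applied = initial;
    }

    // Accept the published state only if it is newer than the applied one.
    synchronized boolean publish(StateSnapshot incoming) {
        if (incoming.version <= applied.version) {
            return false;
        }
        applied = incoming;
        return true;
    }

    synchronized StateSnapshot applied() {
        return applied;
    }
}
```

In this model a follower that never receives publish() calls stays on its
initial snapshot forever, which is the situation the answer above warns
about.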
Are you sure you need another database? The ClusterState is already a mini
database; its content is written to disk as SMILE-encoded JSON.
Note that if you poll information from a remote database, you have to
implement failover and recovery in case contact with the database is lost.
Each node carries a node identifier, so perhaps all you want to do is save
the node identifier in the database (for whatever reason).
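One possible shape for the failover part is to keep the last node list
that was read successfully and to count consecutive failures. The sketch
below is plain Java under stated assumptions: ResilientNodeSource and the
maxFailures threshold are hypothetical, and the Supplier stands in for a
database lookup that throws when the database is unreachable.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Supplier;

// Wraps the database lookup and remembers the last answer that succeeded,
// so a short outage does not empty the node list.
class ResilientNodeSource {
    private final Supplier<Set<String>> database; // may throw on outage
    private final int maxFailures;
    private Set<String> lastKnown = Collections.emptySet();
    private int consecutiveFailures = 0;

    ResilientNodeSource(Supplier<Set<String>> database, int maxFailures) {
        this.database = database;
        this.maxFailures = maxFailures;
    }

    // Returns fresh data when possible, otherwise the last good answer.
    synchronized Set<String> currentNodes() {
        try {
            lastKnown = Collections.unmodifiableSet(new HashSet<>(database.get()));
            consecutiveFailures = 0;
        } catch (RuntimeException e) {
            consecutiveFailures++;
        }
        return lastKnown;
    }

    // True once the database has failed maxFailures times in a row.
    synchronized boolean databaseLost() {
        return consecutiveFailures >= maxFailures;
    }
}
```

What to do once databaseLost() returns true (keep the stale list, stop
accepting changes, or fall back to another mechanism) is a policy decision
that this sketch leaves open.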
I'm working in a distributed environment where several services will use
the Elasticsearch service, so a kind of database is available to locate the
different services that are needed, Elasticsearch included.
The nodes are on different networks, so multicast can't be used. Can the
unicast ping discovery work in this kind of setup?
If not, the idea is to replace the ping discovery with access to my
database, both to get the information needed for discovery and to detect
failures of the different nodes (the database is in charge of the heartbeat
of the services and their timeouts). In that case the "publish" step would
not be necessary for me, and dropping it would decrease the communication
between nodes, which is the goal of this kind of discovery. But I'm afraid
that it will affect the behavior of Elasticsearch.
To summarize: to do that kind of discovery, I need to "register" the
discovered nodes in the "DiscoveryNodes" class and update the
"ClusterState". Is that correct?
Another small question (I haven't yet read the documentation you provided):
if I must implement a kind of thread, do I need to make my class inherit
from "AbstractLifecycleComponent", or can I just use an implementation of
Java's "Runnable" interface?
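The two options need not exclude each other: a lifecycle-style base class
typically decides when something starts and stops, while a Runnable holds
the work itself. The sketch below shows that split with a simplified
stand-in. SimpleLifecycleComponent and PollingComponent are hypothetical
names and are not the Elasticsearch classes; only the doStart/doStop hook
names are borrowed from the question.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Simplified stand-in for a lifecycle base class (NOT the Elasticsearch
// class): start() and stop() guard the state and call the hooks once.
abstract class SimpleLifecycleComponent {
    private final AtomicBoolean started = new AtomicBoolean(false);

    final void start() {
        if (started.compareAndSet(false, true)) {
            doStart();
        }
    }

    final void stop() {
        if (started.compareAndSet(true, false)) {
            doStop();
        }
    }

    boolean isStarted() {
        return started.get();
    }

    protected abstract void doStart();

    protected abstract void doStop();
}

// The component is not a thread itself; it owns one and runs a Runnable
// on it between doStart() and doStop().
class PollingComponent extends SimpleLifecycleComponent {
    private final Runnable task;
    private final long periodMillis;
    private Thread worker;

    PollingComponent(Runnable task, long periodMillis) {
        this.task = task;
        this.periodMillis = periodMillis;
    }

    @Override
    protected void doStart() {
        worker = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                task.run();
                try {
                    Thread.sleep(periodMillis);
                } catch (InterruptedException e) {
                    // Restore the flag so the loop condition ends the thread.
                    Thread.currentThread().interrupt();
                }
            }
        }, "registry-poller");
        worker.setDaemon(true);
        worker.start();
    }

    @Override
    protected void doStop() {
        worker.interrupt();
        try {
            worker.join(1000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

With this split, the database polling logic can stay an ordinary Runnable
and be tested on its own, while the component only controls its lifetime.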