Settings to create minimalistic embedded node

amjath_khan · October 31, 2012, 8:45am

As part of my application deployment in an application server, I am
creating an embedded Node. This Node will be used to perform index-related
operations from my web application. According to the log, it takes around
5-8 seconds to create and start a node, including the other entities such
as modules, thread pools, etc.

1) Are all these modules and services required for the Node?
2) Is there a way to control which modules are initialized on a Node? For
example, I do not want to start gc. What should I do?

The Node will be used only to perform insert/update/delete operations on
the indices. It is created in client mode (no data will be stored on it).

--

Igor_Motov · November 1, 2012, 12:35am

I am somewhat confused about which types of nodes you are creating and
where. At the beginning of your post you mentioned an embedded node that is
used for indexing, but at the end of the email you described a client node
that doesn't store any data. Are these two different nodes? Where are they
running? How do they communicate? Could you provide any additional details
about your setup?

--

jprante · November 1, 2012, 5:45pm

Hi,

I understand you want to connect from a Java application server to a
remote Elasticsearch cluster?

Your "embedded node" startup (I think you are referring to the node client,
not the transport client) takes a few seconds because cluster discovery
takes time. Once discovery completes (zen discovery has a 5-second
timeout), the node has become part of the cluster. In the "client mode" you
refer to, the node is made invisible to the other cluster members.

Quick answers to your questions:

1) Yes and no; it depends on what you want to do. You are right that not
every thread pool is required for client node operation.
2) The number of modules and services is not proportional to memory
consumption, so you can't avoid gc just by disabling modules or services.
Memory pressure arises when you start indexing and searching.

If I understand your question correctly, you are wondering why discovery
takes place and why a client by default carries so much functionality in
services and modules, even when all you want to do is insert/update/delete.

There are several options:

1) Design a Settings parameter for a minimal resource setup (not the empty
setting; unfortunately I know of no written guide to the most minimal
setting, so you have to study the guide).
2) Connect with a TransportClient. The difference from the NodeClient is
that a transport client is designed for remote access: it has no direct
access to the cluster state objects (mapping, indices), it manages cluster
connections explicitly by network addressing, and it organizes node
failover in the background.
3) Use HTTP REST from Java, e.g. with the Jest client (searchbox-io/Jest
on GitHub).
4) Use my experimental websocket client,
https://github.com/jprante/elasticsearch-client-websocket (insert/delete
via bulk operations is available; it requires the websocket transport
plugin, jprante/elasticsearch-transport-websocket on GitHub).
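
For the first option, here is a rough sketch of what a candidate settings
set for a node client might look like. It is written as a plain Java map so
that it runs without the Elasticsearch jars. The setting keys are recalled
from the 0.19/0.20-era configuration and are an assumption to verify
against the guide for your version. The class name, cluster name, and host
names are placeholders made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MinimalClientSettings {

    // Builds a candidate settings map for a node client. Every key below is
    // an assumption to check against the guide for your Elasticsearch version.
    static Map<String, String> build(String clusterName, String... unicastHosts) {
        Map<String, String> s = new LinkedHashMap<>();
        s.put("cluster.name", clusterName);
        s.put("node.client", "true");   // join the cluster as a client only
        s.put("node.data", "false");    // never hold shards on this node
        s.put("http.enabled", "false"); // no HTTP listener inside the app server
        // Use an explicit host list instead of multicast discovery.
        s.put("discovery.zen.ping.multicast.enabled", "false");
        s.put("discovery.zen.ping.unicast.hosts", String.join(",", unicastHosts));
        return s;
    }

    public static void main(String[] args) {
        // Print the settings in "key: value" form for inspection.
        build("my-cluster", "es1.example.org:9300", "es2.example.org:9300")
            .forEach((k, v) -> System.out.println(k + ": " + v));
    }
}
```

Each entry would then be passed to the settings builder of the Java API.
Whether any of this actually shortens the 5-8 second startup is something
to measure, not something this sketch guarantees.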

With HTTP REST and the websocket client, there is no startup, no cluster
membership, no discovery, no plugins, and no services at all. The price is
that you have to manage the submission of actions, the evaluation of
responses, and the failover of node connections yourself. This is how
scripting languages like Perl/Python/Ruby connect to ES.
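
To give an idea of what "managing it yourself" involves, here is a minimal
sketch in plain Java. It builds one index action in the newline-delimited
form used by the bulk endpoint and tries a list of nodes in order until one
answers. The class and method names, the `transport` callback standing in
for the real HTTP call, and the host names are all hypothetical. The sketch
also assumes the index, type, and id contain no characters that need JSON
escaping, and it leaves out parsing of the response.

```java
import java.util.List;
import java.util.function.BiFunction;

public class FailoverSketch {

    // One bulk "index" action: an action line followed by the document
    // source, each terminated by a newline.
    static String indexAction(String index, String type, String id, String jsonSource) {
        return "{\"index\":{\"_index\":\"" + index + "\",\"_type\":\"" + type
             + "\",\"_id\":\"" + id + "\"}}\n" + jsonSource + "\n";
    }

    // Tries each node in order and returns the first response obtained.
    // 'transport' is a stand-in for the real HTTP call: it receives the
    // URL and the body, and throws RuntimeException if the node is unreachable.
    static String submit(List<String> nodes, String path, String body,
                         BiFunction<String, String, String> transport) {
        RuntimeException last = null;
        for (String node : nodes) {
            try {
                return transport.apply(node + path, body);
            } catch (RuntimeException e) {
                last = e; // remember the failure and fail over to the next node
            }
        }
        throw new IllegalStateException("no node reachable", last);
    }

    public static void main(String[] args) {
        String body = indexAction("myindex", "doc", "1", "{\"title\":\"hello\"}");
        // Fake transport for demonstration: the first node is "down".
        BiFunction<String, String, String> fake = (url, b) -> {
            if (url.startsWith("http://es1")) {
                throw new RuntimeException("connection refused");
            }
            return "sent " + b.length() + " chars to " + url;
        };
        System.out.println(submit(
            List.of("http://es1.example.org:9200", "http://es2.example.org:9200"),
            "/_bulk", body, fake));
    }
}
```

In a real application the callback would perform an HTTP POST and the
response would be checked for per-item errors; a client such as Jest takes
care of much of this for you.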

Best regards,

Jörg

--

amjath_khan · November 6, 2012, 9:43am

Sorry for the delayed reply.
We are creating a node client in an application server as part of my
application deployment. The settings of the node client make sure that no
shards get allocated to my node (rephrasing the confusing sentence at the
end of my last post). The node client connects to an existing
Elasticsearch cluster. This node is used to index my application data in
the Elasticsearch cluster.

We had some concerns about the background activities performed by my node
client in the application server (such as zen discovery, gc, ping, ...).
So we wanted to know whether we can disable some of the background
activities that are not critical for my node client. As suggested by Jörg
Prante, we will try out the other clients.
Thanks

--

amjath_khan · November 6, 2012, 9:46am

Hi Jörg,

You understood my query correctly.
We will try out the alternatives and post our findings.

Thanks

--