Single node job

Hi,
I have to prepare a mechanism that can run a scheduled job in ES.
This job will be responsible for periodically removing unused documents. I
think this is easy to achieve in a simple plugin. The only problem I see is
synchronization: I think it is reasonable that only one job runs at a
time. Is there any way to do this with the internal ES API (without using
external software like ZooKeeper)?

To pre-empt possible suggestions: I don't want to, and cannot, use any
external tools or run it as a cron job.

Any ideas?

--
Paweł Róg

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAF9ZkbMb1iQvB%3Dw1YsQ90aoTrFvWfGeBwmQLFcyVxmeJQn2yxw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

It depends on what exactly you are doing, but document TTLs might suit,
although they are resource intensive.

A plugin could work, as you could leverage the Quartz scheduler library to
handle running it.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com

(In reply to Pawel's original post of 27 August 2014, 20:00.)

Hi,
Thank you for your response, but I think that's not the case.
TTLs are not what I want to use. There is some logic that decides which
documents are no longer active and should be removed (TTL won't work
here).

Quartz is only a scheduler and I don't think it is able to synchronize
distributed nodes (at least I don't think Quartz can do this). What I need
is something like "leader election", where only the leader triggers
"delete" jobs. When the leader dies, another node takes over its role. This
is the reason why I mentioned ZooKeeper, but I don't want to use it - it is
an external piece of software/moving part from the ES point of view. ES
also elects a leader, so I thought I could use ES's internal mechanisms.

--
Paweł

(In reply to Mark Walkom's message of Wed, Aug 27, 2014, 12:13 PM.)

With ScheduledThreadPoolExecutor from java.util.concurrent, you can set the
thread pool size to 1, which ensures serial execution on that node. No need
for Quartz.
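As a minimal sketch of the single-threaded scheduler idea above: the class name SerialPurgeScheduler, its counters, and the delay parameter are made up for illustration and are not part of any ES API. It only guarantees that runs do not overlap inside one JVM; it does nothing about other nodes.

```java
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: runs a purge job repeatedly, never two runs at once on this node.
public class SerialPurgeScheduler {
    // Pool size 1: a single worker thread, so runs are strictly serial.
    private final ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(1);
    private final AtomicInteger running = new AtomicInteger();
    private final AtomicInteger maxConcurrent = new AtomicInteger();
    private final AtomicInteger completedRuns = new AtomicInteger();

    // Starts the job immediately, then waits delayMillis after each run ends.
    public void start(final Runnable purge, long delayMillis) {
        executor.scheduleWithFixedDelay(() -> {
            // Track the highest number of simultaneous runs ever observed.
            maxConcurrent.accumulateAndGet(running.incrementAndGet(), Math::max);
            try {
                purge.run();
            } catch (RuntimeException e) {
                // An uncaught exception would suppress all later runs,
                // so it is swallowed here; a real plugin would log it.
            } finally {
                running.decrementAndGet();
                completedRuns.incrementAndGet();
            }
        }, 0, delayMillis, TimeUnit.MILLISECONDS);
    }

    // Interrupts a run in progress and waits briefly for the worker to exit.
    public void stop() throws InterruptedException {
        executor.shutdownNow();
        executor.awaitTermination(5, TimeUnit.SECONDS);
    }

    public int completedRuns() { return completedRuns.get(); }
    public int maxConcurrent() { return maxConcurrent.get(); }
}
```

scheduleWithFixedDelay is used rather than scheduleAtFixedRate so that a slow purge simply pushes the next run back instead of queuing runs back to back.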

You are correct: you need to orchestrate the plugin execution across all
the nodes where it is installed to prevent multiple distributed executions.
A variant of this is to execute the plugin only on the master node. When
implementing a custom action, you can define on which node the action is
executed, e.g. on the master only. See TransportMasterNodeOperationAction,
which is used for e.g. cluster state update operations that only make sense
when executed on the master node. Such a custom action can be triggered
internally by a ScheduledThreadPoolExecutor.
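A simpler variant of the master-only idea, sketched under assumptions: every node schedules the same task, but the task returns immediately unless the local node is currently the elected master. This is not TransportMasterNodeOperationAction itself. MasterOnlyTask is a hypothetical name, and the master check is passed in as a plain BooleanSupplier so the sketch stays self-contained; in an ES 1.x plugin it would presumably be backed by the cluster state (something like clusterService.state().nodes().localNodeMaster(), to be verified against the ES version in use).

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

// Hypothetical wrapper: runs the job only while this node is the elected master.
public class MasterOnlyTask implements Runnable {
    private final BooleanSupplier isLocalNodeMaster; // assumed to read the cluster state
    private final Runnable job;
    private final AtomicInteger executions = new AtomicInteger();

    public MasterOnlyTask(BooleanSupplier isLocalNodeMaster, Runnable job) {
        this.isLocalNodeMaster = isLocalNodeMaster;
        this.job = job;
    }

    @Override
    public void run() {
        // The check is repeated on every tick, so a newly elected master
        // picks the job up on its next scheduled run.
        if (!isLocalNodeMaster.getAsBoolean()) {
            return;
        }
        job.run();
        executions.incrementAndGet();
    }

    public int executions() { return executions.get(); }
}
```

Because the check happens on each run, failover needs no extra code. One caveat worth assuming: around a master re-election, a run on the old master may still be in progress when the new master starts one, so the delete logic should be safe to run twice.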

Just for the record:

Not sure why you rule out cron jobs. This is by far the best method you can
choose. My opinion is that a plugin is too clumsy for simple purging tasks
(unless you have an easy method for dynamic config / update of a plugin
across ES versions). With an external script wrapped in a cron job, you are
free to start/stop purging, config/update is much more flexible, and there
is no need for orchestration or master node selection. It can also be
maintained by non-Java developers/operators.

To avoid parallel execution you could easily use flock from util-linux:

    /usr/bin/flock -n /var/tmp/mydocpurge.lock /usr/local/bin/mydocpurge

In most cases, mydocpurge would consist of two curl calls: one to search
for the doc ids, whose output is processed with jq into a JSON array of doc
ids, and a second to delete those docs. Plus, you can send email to the
admin.
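For anyone who does stay in Java, the delete step could be sketched as follows. This swaps in a different technique from the curl/jq script described above: it turns a list of doc ids (as returned by the search step) into a request body for the ES bulk API, one delete action per line. The class name BulkDeleteBody and the index/type names in the usage note are made up for illustration, and the _type field reflects the 1.x-era API.

```java
import java.util.List;

// Hypothetical helper: builds a newline-delimited bulk body of delete actions.
public class BulkDeleteBody {
    // Minimal JSON string quoting: handles backslashes and double quotes only;
    // control characters in ids are not handled in this sketch.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // One {"delete": {...}} action line per id; the bulk API requires
    // every line, including the last, to end with a newline.
    public static String build(String index, String type, List<String> ids) {
        StringBuilder sb = new StringBuilder();
        for (String id : ids) {
            sb.append("{\"delete\":{\"_index\":").append(quote(index))
              .append(",\"_type\":").append(quote(type))
              .append(",\"_id\":").append(quote(id))
              .append("}}\n");
        }
        return sb.toString();
    }
}
```

For example, BulkDeleteBody.build("docs", "doc", ids) yields a string that could be POSTed to the _bulk endpoint, which keeps the purge to one search request and one delete request however many ids there are.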

My 2 cents.

Jörg
