Distributed locking within elasticsearch


(Steff) #1

Hi

Do elasticsearch nodes have some kind of distributed locking framework
running, so that they can take locks to prevent other processes
(potentially on other nodes) from getting that same lock, and therefore
prevent them from doing "interfering" stuff at the same time? E.g.
something based on Hadoop ZooKeeper or Hazelcast.

If yes, are the locking primitives somehow exposed so that I as an
elasticsearch user can use this same distributed locking framework? I
don't want to interfere with the potential locking done by elasticsearch
itself, but simply use the same distributed locking framework for my own
stuff (taking locks with key-names that elasticsearch does not use
itself), so that I do not have to set up my own distributed locking
framework.

Else I guess I could implement my own primitive distributed locking
framework based on the "unique constraint" feature of type/_id
(http://groups.google.com/group/elasticsearch/browse_thread/thread/51aeb1a9665d1f73)

  • like this:

        boolean getLock(String key) {
          // Try to create/index a new document of type "locking" with _id=key
          // and op_type=create. Return false if and only if creating/indexing
          // the new document failed - probably because it already exists.
        }

        void releaseLock(String key) {
          // Delete the document of type "locking" with _id=key. Only do this
          // if you have previously successfully called getLock with the same
          // key, of course.
        }

    Any comments on this?
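A minimal sketch of those semantics in plain Python (not real elasticsearch client code; an in-memory dict stands in for the "locking" type, and a membership test models the "document already exists" failure of op_type=create):

```python
# Sketch only: the dict stands in for an elasticsearch index of type
# "locking"; the `key in locks` test models op_type=create failing
# because a document with that _id already exists.

locks = {}

def get_lock(key):
    """Try to create the lock document with _id=key; False iff it exists."""
    if key in locks:
        return False        # create would fail: document already exists
    locks[key] = "locked"   # models indexing the "locking" document
    return True

def release_lock(key):
    """Delete the lock document. Only call after a successful get_lock."""
    del locks[key]

# First caller wins; a second caller is refused until release.
assert get_lock("order-42") is True
assert get_lock("order-42") is False
release_lock("order-42")
assert get_lock("order-42") is True
```

In the real thing, the atomicity of create-if-absent comes from the per-index uniqueness of _id; a production version would also need a story for lock holders that crash without releasing (e.g. a timestamp in the lock document that lets stale locks be reclaimed).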

Regards, Per Steffensen


(Karussell) #2

why not use the version + optimistic locking? what is your use case?
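For context, version-based optimistic locking works roughly like this (a plain-Python model, not elasticsearch client code; elasticsearch rejects a write whose expected version no longer matches the stored one):

```python
# Plain-Python model of optimistic locking with a version number.
# An update is applied only if the caller's expected version matches
# the stored one; otherwise the caller must re-read and retry.

store = {"doc": {"version": 1, "body": "v1"}}

def update(doc_id, expected_version, new_body):
    """Compare-and-swap on the version; True iff the update was applied."""
    doc = store[doc_id]
    if doc["version"] != expected_version:
        return False  # someone else updated in between: version conflict
    doc["version"] += 1
    doc["body"] = new_body
    return True

assert update("doc", 1, "v2") is True   # matches stored version 1
assert update("doc", 1, "v3") is False  # stale version: conflict
assert store["doc"]["version"] == 2
```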



(Shay Banon) #3

There is no global locking mechanism in elasticsearch, since there isn't a
need for one. Cluster level changes (create an index, move shards around,
delete an index and so on) always go through the master of the cluster, and
shard level changes (index a document) only need "node" level concurrency
handling.

The code you posted should work to simulate a lock in elasticsearch; I'm not
sure, though, that elasticsearch is the best component to do global locks
with...



(Steff) #4

When the case is "updating" an existing document I will use version +
optimistic locking. The case here is that the document does not already
exist. Two independent processes might realize that "at the same time"
and therefore both try to create it - but it is important that only one of
them succeeds. The processes will be able to agree on an id for the
document, so the "unique constraint" on id will help as long as they can
also agree on the index in which to create the new document. But in some
rare cases the processes might not agree on the index in which the new
document has to be created, and in those cases the "unique constraint"
on id will not help me - I guess this "unique constraint" works per
index/type/id. In those cases I need to make sure that only one of the
processes succeeds when they try to create a new document with the same
id but in two different indexes - that is where I will make them take a
distributed lock on key=id (well, not exactly, but close enough for this
forum). Afterwards the process that did not succeed will be able to
realize that the new document was created in a different index than it
expected, and it will be able to use that document from there on.
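A sketch of that scenario in plain Python (dicts stand in for two elasticsearch indexes and for the lock documents; per-index _id uniqueness cannot arbitrate between writers targeting different indexes, but a lock keyed by the document id can):

```python
# Plain-Python model of the race described above. Two dicts stand in
# for two elasticsearch indexes; since the _id uniqueness constraint is
# per index, both creates could otherwise succeed, so a shared lock
# keyed by the document id serializes the writers.

index_a, index_b = {}, {}
locks = {}

def create_with_lock(index, doc_id, body):
    """Create doc_id in `index` only if no one holds the lock for doc_id."""
    if doc_id in locks:      # another writer already won the race
        return False
    locks[doc_id] = True     # models a successful getLock(doc_id)
    index[doc_id] = body
    return True

# Both writers agree on the id but target different indexes.
assert create_with_lock(index_a, "doc-7", {"n": 1}) is True
assert create_with_lock(index_b, "doc-7", {"n": 2}) is False

# The losing writer can now look up where the document actually landed.
assert "doc-7" in index_a and "doc-7" not in index_b
```

This toy version never releases the lock; in the described use case the lock only guards creation, but a real implementation would still need to decide when (or whether) lock documents get cleaned up.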

Regards, Per Steffensen

Karussell wrote:

why not use the version + optimistic locking? what is your use case?



(Steff) #5

Shay Banon wrote:

There is no global locking mechanism in elasticsearch, since there
isn't a need for one. Cluster level changes (create an index, move
shards around, delete an index and so on) always go through the master
of the cluster, and shard level changes (index a document) only need
"node" level concurrency handling.

Thanks. Kind of what I expected to hear, but just wanted to know for sure.

The code you posted should work to simulate a lock in elasticsearch,
not sure though if elasticsearch is the best component to do global
locks for...

Not sure I understand your last comment, but I expect you mean that
doing global locks using elasticsearch is not the best solution. I know
that. But I might do it this way in the first go, so that I avoid
spending time on more sophisticated solutions.


(Steff) #6

Shay Banon wrote:

There is no global locking mechanism in elasticsearch, since there
isn't a need for one. Cluster level changes (create an index, move
shards around, delete an index and so on) always go through the master
of the cluster

I didn't know that there was a "master" in an ES cluster, but basically I
can't say that I know much about ES - yet :-) Just out of curiosity, how
do the nodes agree on who is master? Don't they need some kind of
"distributed election" (which will very likely be based on some kind of
"distributed locking" - or else "distributed election" can be used to
implement "distributed locking" :-) ) to decide who is going to be
master - at least when the master breaks down and the remaining nodes in
the cluster have to agree upon which of them will become the new master?
If the node playing the master role is not allowed to break down without
the cluster losing important functionality (or stopping to work
altogether), I believe ES has an HA problem.


(Pat Christopher) #7

You should check out ZooKeeper, an Apache project. It was designed to solve
the distributed locking problem. It's fairly straightforward to set up and
has a good API.


(Steff) #8

Pat Christopher wrote:

You should check out ZooKeeper, an Apache project. It was designed to
solve the distributed locking problem. It's fairly straightforward to
set up and has a good API.

Thanks. I know about ZooKeeper - I even mentioned it in my original
posting :-)


(Pat Christopher) #9

Ahh! Missed that line.

I do not think so. It does a broadcast for any hosts listening on :9300
(I think this is the default?), waits for the timeout, then holds a leader
election between all nodes that responded as part of the cluster. Or, if
broadcast is disabled, it uses a plugin to get the list of nodes to send a
packet to on :9300 to create the list of voting nodes. It elects the
leader, then all master messages get forwarded to the master. It also does
periodic ping checks to make sure the master lives and to get index/shard
mappings. If the master dies, it goes back to the discovery phase to hold
a new election.

At least, that is what I think it does. I've studied the code some in this
area but not exhaustively.
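For reference, the discovery behaviour described above is driven by settings in elasticsearch.yml. A rough sketch (setting names are from the zen-discovery era of 0.x/1.x releases; the hostnames are placeholders, and exact keys should be checked against your ES version's documentation):

```yaml
# Disable multicast discovery and instead ping an explicit list of
# nodes on the transport port (9300 by default) to find the cluster.
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["node1:9300", "node2:9300"]

# How many master-eligible nodes must be visible before a master
# is elected (helps avoid split-brain).
discovery.zen.minimum_master_nodes: 2
```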


(Shay Banon) #10

You can implement leader election without doing distributed locking (or at
least what I think you mean when you refer to distributed locking; it has
become a pretty overloaded term nowadays).

You can also implement leader election using a locking service, but then
you need to implement / use a distributed locking service :).
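One lock-free scheme can be sketched in a few lines (a toy model, not ES's actual election code): if every node ends discovery with the same membership list, each node can independently compute the same leader, for example the lowest node id, with no lock or coordination step at all.

```python
# Toy sketch of deterministic, lock-free leader election: given an
# agreed-upon membership list, every node computes the same answer.

def elect_leader(live_node_ids):
    """Return the elected leader (lowest id), or None for an empty cluster."""
    if not live_node_ids:
        return None
    return min(live_node_ids)

# Every node runs the same computation and agrees without coordination.
members = {"node-3", "node-1", "node-2"}
leader = elect_leader(members)          # "node-1"

# If the leader dies, re-run the election over the survivors.
members.discard(leader)
new_leader = elect_leader(members)      # "node-2"
```

The hard part in practice is not the choice function but getting all nodes to agree on the membership list in the presence of failures and partitions, which is what discovery protocols (and systems like ZooKeeper) actually solve.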



(James Cook) #11

We use Hazelcast as a caching layer between our application and ES, for this
reason and for many other distributed functions. It has a simple MapStore
interface you can implement to enable an ES backend. Someone from the
Mozilla metrics group was on here recently mentioning an open-source
project which, when I checked it out, had an implementation in their
source code.


(Steff) #12

Shay Banon wrote:

You can implement leader election without doing distributed locking
(or at least what I think you mean when you refer to distributed
locking; it has become a pretty overloaded term nowadays).

Yes, I know. I just wrote that it would very likely be based on some kind
of "distributed locking" - but maybe "very likely" was a little too much
to say. I guess you have implemented your own little ES algorithm for
distributed leader election, and that's fine. I just thought that you
might have used a common framework (like ZooKeeper or Hazelcast) for such
a thing, and if you did and e.g. a ZooKeeper cluster was already running
"inside" ES, I would like to use that as well. But I will have to figure
something else out. Thanks for your response!

