Persistence explained


(Steff) #1

Extracted from
http://groups.google.com/group/elasticsearch/browse_thread/thread/cbd2cc71c407e435

Reading about persistence in ES I have a hard time figuring out exactly
how and when it works - node-local storage vs gateway-storage. Do you
have a pointer to a thorough description of how persistence works? In
general I want to make sure that data will be "persisted-persisted"
when an indexing-process (some code that I will write doing a number of
index-operations against ES) has done a number of index-operations (maybe
bulk indexing) and it finishes. It must not be possible that an
indexing-process believes that it has indexed a number of documents but
that they actually have not been "persisted-persisted" yet. By
"persisted-persisted" I mean that no data will be lost even if all
nodes in the cluster stop (e.g. due to a global power outage) a
split second after the process finished, or even if any single disk
crashes a split second after the process finished. So
"persisted-persisted" means stored on disk (will survive shutdown of
the machine) - actually stored on at least two disks (redundant). I believe
I've heard one of the ES guys saying something about documents not being
"persisted-persisted" unless IndexWriter.commit (or something) of the
Lucene underneath has been called, and that IndexWriter.commit is called
asynchronously when ES sees fit. If that is true I guess I need a
synchronous way through ES to make sure that this has happened. I also
heard that this operation is expensive, and that it should therefore not
be done too often. I need to make sure that it has been done when my
indexing-process finishes (call the operation as the last thing in the
process), but if it is expensive I guess I need to make sure that my
indexing-processes are not too small with respect to the number of
documents that they index. Any comments on that? What would be a practical
lower limit on the number of index-operations that have to be done between
IndexWriter.commits?

As I understand it, some information will be persisted "locally" on the
nodes and some information will be persisted in the gateway. Exactly
what kind of information will be persisted "locally" on the nodes and
what kind of information will be persisted in the gateway? E.g. is
document-information persisted both "locally" on the nodes and on the
gateway, or only "locally" on the nodes? Is it persisted "locally" on
all the nodes running a replica of the shard containing the document, or
only on the node running the primary shard? Exactly when is information
written to disk (locally or in the gateway) - as part of the
"execute"-operation (e.g.
http://www.elasticsearch.org/guide/reference/java-api/index_.html), or
does it happen asynchronously to the "execute"-operation, so that there
is actually no guarantee that it has been written to disk when the
"execute"-operation returns (successfully)?

Regards, Per Steffensen


(Shay Banon) #2

Dear god, each question you send is 5000 words, when all of them can
really be a one-sentence question... It's hard to answer those types of
questions, since they are not really questions...

When indexing a document, the document gets indexed in a sync manner on a
shard and its replicas. It's also written to each shard's local transaction
log to make sure it does not get lost.

With the local gateway (the default, and you should use that), on a full
cluster restart, the state of the cluster, and the indices will be recovered
based on the data stored on each node.



(Steff) #3

Shay Banon wrote:

Dear god, each question you send is 5000 words, when all of them can
really be a one-sentence question... It's hard to answer those
types of questions, since they are not really questions...

Actually only 460 words (2930 characters) :) It is not only one
question. There are many questions. I use the commonly accepted marker
for questions - namely the question mark (?) - but you tend to overlook
them anyway. The reason for being so verbose is to make sure not to be
misunderstood with ambiguous one-sentence questions. Despite that you
tend to miss the point in some of the questions. I will have to be even
more verbose in the future :) No, seriously, don't waste any more of your
time answering my questions if you do not think you have the time for
reading and understanding them properly.

When indexing a document, the document gets indexed in a sync manner
on a shard and its replicas. It's also written to each shard's local
transaction log to make sure it does not get lost.

Thanks.

With the local gateway (the default, and you should use that), on a full
cluster restart, the state of the cluster, and the indices will be
recovered based on the data stored on each node.

Thanks. A small comment on why you recommend using the default gateway,
please? We actually planned to use the Hadoop gateway since we will have
Hadoop running on the machines anyway.

On Mon, Sep 12, 2011 at 12:25 PM, Per Steffensen <steff@designware.dk> wrote:

Extracted from
http://groups.google.com/group/elasticsearch/browse_thread/thread/cbd2cc71c407e435


Reading about persistence in ES I have a hard time figuring out
exactly how and when it works - node-local storage vs
gateway-storage. Do you have a pointer to a thorough description
of how persistence works?

No answer. Will assume that such a thorough description does not exist.

In general I want to make sure that data will be
"persisted-persisted" when an indexing-process (some code that I
will write doing a number of index-operations against ES) has done
a number of index-operations (maybe bulk indexing) and it finishes.
It must not be possible that an indexing-process believes
that it has indexed a number of documents but that they actually
have not been "persisted-persisted" yet. By
"persisted-persisted" I mean that no data will be lost even if
all nodes in the cluster stop (e.g. due to a global
power outage) a split second after the process finished, or even
if any single disk crashes a split second after the process
finished. So "persisted-persisted" means stored on disk (will
survive shutdown of the machine) - actually stored on at least two
disks (redundant). I believe I've heard one of the ES guys saying
something about documents not being "persisted-persisted" unless
IndexWriter.commit (or something) of the Lucene underneath has
been called, and that IndexWriter.commit is called asynchronously
when ES sees fit. If that is true I guess I need a synchronous way
through ES to make sure that this has happened. I also heard that
this operation is expensive, and that it should therefore not be
done too often. I need to make sure that it has been done when my
indexing-process finishes (call the operation as the last thing in
the process), but if it is expensive I guess I need to make sure
that my indexing-processes are not too small with respect to the
number of documents that they index. Any comments on that?

No comments on async persistence (calling of IndexWriter.commit). Will
assume there is no such thing happening, even though I believe I heard it
mentioned in the Berlin conference talk.

What would be a practical lower limit on the number of index-operations
that have to be done between IndexWriter.commits?

No answer, but not relevant if no async persistence is going on.

As I understand it, some information will be persisted "locally"
on the nodes and some information will be persisted in the
gateway. Exactly what kind of information will be persisted
"locally" on the nodes and what kind of information will be
persisted in the gateway?

No answer. I still have a problem understanding local persistence vs
gateway persistence. Maybe there is no such thing as local persistence
(only if the gateway is the default local), even though I would assume
that the Lucene index itself is persisted locally. I will make my own
tests and read the code to understand.

E.g. is document-information persisted both "locally" on the nodes
and on the gateway, or only "locally" on the nodes?

This was a follow-up question to the prior question. No answer :(

Is it persisted "locally" on all the nodes running a replica of
the shard containing the document, or only on the node running the
primary shard?

Got my answer. Thanks.

Exactly when is information written to disk (locally or in the
gateway) - as part of the "execute"-operation (e.g.
http://www.elasticsearch.org/guide/reference/java-api/index_.html),
or does it happen asynchronously to the "execute"-operation, so
that there is actually no guarantee that it has been written to
disk when the "execute"-operation returns (successfully)?

You say that the document gets indexed in a sync manner, but you don't
mention what operation it is synced with. I will assume that the
indexing and writing to the local transaction log will happen
synchronously in the "execute"-method. It would have been nice if that
was stated clearly though - especially when the question was so clear,
about what operation exactly does the actual indexing and whether or
not it was done synchronously.

Regards, Per Steffensen

(Clinton Gormley) #4

Hi Per

Actually only 460 words (2930 characters) :) It is not only one
question. There are many questions. I use the commonly accepted marker
for questions - namely the question mark (?) - but you tend to overlook
them anyway. The reason for being so verbose is to make sure not to be
misunderstood with ambiguous one-sentence questions. Despite that you
tend to miss the point in some of the questions. I will have to be
even more verbose in the future :) No, seriously, don't waste any more
of your time answering my questions if you do not think you have the
time for reading and understanding them properly.

Don't be offended. Just be aware that your emails are very long, and
often cover ground that has already been covered in the mailing list
(and yes, I'm aware that this is not the easiest way to get the required
information quickly).

None of us is being paid to participate in this list, and we all have
day jobs which we have to attend to, so when we come across a very long
email, it is a sacrifice to sit down and answer it in detail.

Shay does an amazing job of (1) improving ElasticSearch on a daily basis
and (2) answering millions of questions on the list. He is obviously
under no obligation to answer all of these emails - he does it for the
love of his project.

Your questions are good, and useful to others, but they are time
consuming.

So as I said, don't be offended. Join us - become a volunteer and add to
our common knowledge. Scrape together the information that you can, and
add it to the website. You can fork the site here:
https://github.com/elasticsearch/elasticsearch.github.com

It would be appreciated by all of us.

clint


(Steff) #5

Clinton Gormley wrote:

Hi Per

Actually only 460 words (2930 characters) :) It is not only one
question. There are many questions. I use the commonly accepted marker
for questions - namely the question mark (?) - but you tend to overlook
them anyway. The reason for being so verbose is to make sure not to be
misunderstood with ambiguous one-sentence questions. Despite that you
tend to miss the point in some of the questions. I will have to be
even more verbose in the future :) No, seriously, don't waste any more
of your time answering my questions if you do not think you have the
time for reading and understanding them properly.

Don't be offended.

I'm not. Just joking a little bit, because I thought that the response was
a little harsh :) I was not writing verbose questions to offend anyone -
just to be sure that they were not misunderstood. But no holding
grudges, from my side.

Just be aware that your emails are very long, and
often cover ground that has already been covered in the mailing list
(and yes, I'm aware that this is not the easiest way to get the required
information quickly).

You are right. I believe some effort should be put into merging answers
into documentation, so that they are not only to be found in the mailing
lists - because then you/we will probably receive the same questions
again and again. At least put together a FAQ with the most common
questions from newbies like me.

None of us is being paid to participate in this list, and we all have
day jobs which we have to attend to, so when we come across a very long
email, it is a sacrifice to sit down and answer it in detail.

I understand that. But then I would prefer it not being answered, or at
least answer it with a humble attitude a-la "I know this is not a
comprehensive answer, but this is what I have time to do for now".

Shay does an amazing job of (1) improving ElasticSearch on a daily basis
and (2) answering millions of questions on the list.

I don't question that - actually no doubt in my mind at all - he seems
like he has the right drive and the competencies to do stuff like this.
A little attitude problem maybe, but all really good
developers/architects (including me) have that problem :) And from the
research I have done until now ES is still my preferred choice.

He is obviously
under no obligation to answer all of these emails - he does it for the
love of his project.

I understand. But either take the time to read and understand the
questions or don't answer at all.

Your questions are good, and useful to others, but they are time
consuming.

So as I said, don't be offended. Join us - become a volunteer and add to
our common knowledge. Scrape together the information that you can, and
add it to the website. You can fork the site here:
https://github.com/elasticsearch/elasticsearch.github.com

I probably will. At least I would like to get the time to.

It would be appreciated by all of us.

clint


(Shay Banon) #6

On Tue, Sep 13, 2011 at 12:02 PM, Per Steffensen steff@designware.dk wrote:

Shay Banon wrote:

Dear god, each question you send is 5000 words, when all of them can
really be a one-sentence question... It's hard to answer those types of
questions, since they are not really questions...

Actually only 460 words (2930 characters) :) It is not only one question.
There are many questions. I use the commonly accepted marker for questions -
namely the question mark (?) - but you tend to overlook them anyway. The
reason for being so verbose is to make sure not to be misunderstood with
ambiguous one-sentence questions. Despite that you tend to miss the point in
some of the questions. I will have to be even more verbose in the future :)
No, seriously, don't waste any more of your time answering my questions if you
do not think you have the time for reading and understanding them properly.

You write very long paragraphs with multiple questions in them. By the time
one ends up reading the paragraph, one forgets the questions that were asked
in the beginning. You somehow managed to break the paragraph into smaller
parts when answering, why not do it from the get-go.

The other problem is that you repeat the same questions several times, and
manage to ask a question in a very convoluted manner. Maybe it's the language
barrier, I don't know, but you need to find a way to be more concise. The
fact that you get answers at all on this mailing list (compared to others,
where people will simply say, frack it, I am not going to spend time reading
all of this) is something that you should appreciate.

Writing longer text does not make your questions more understandable. Your
questions are very simple (at least the ones you asked so far, and of
course, there is no problem with asking them). But the amount of words you
put in each one, well, it's strange...

When indexing a document, the document gets indexed in a sync manner on a
shard and its replicas. It's also written to each shard's local transaction
log to make sure it does not get lost.

Thanks.

With the local gateway (the default, and you should use that), on a full
cluster restart, the state of the cluster, and the indices will be recovered
based on the data stored on each node.

Thanks. A small comment on why you recommend using the default gateway,
please? We actually planned to use the Hadoop gateway since we will have
Hadoop running on the machines anyway.

On Mon, Sep 12, 2011 at 12:25 PM, Per Steffensen steff@designware.dk wrote:

Extracted from
http://groups.google.com/group/elasticsearch/browse_thread/thread/cbd2cc71c407e435

Reading about persistence in ES I have a hard time figuring out exactly
how and when it works - node-local storage vs gateway-storage. Do you have a
pointer to a thorough description of how persistence works?

No answer. Will assume that such a thorough description does not exist.

http://www.elasticsearch.org/guide/reference/modules/gateway/local.html. If
you are asking for the nitty-gritty details of how the actual recovery
works, that's a different question.

In general I want to make sure that data will be "persisted-persisted"
when an indexing-process (some code that I will write doing a number of
index-operations against ES) has done a number of index-operations (maybe bulk
indexing) and it finishes. It must not be possible that an
indexing-process believes that it has indexed a number of documents but that
they actually have not been "persisted-persisted" yet. By
"persisted-persisted" I mean that no data will be lost even if all nodes
in the cluster stop (e.g. due to a global power outage) a split second after
the process finished, or even if any single disk crashes a split second
after the process finished. So "persisted-persisted" means stored on disk
(will survive shutdown of the machine) - actually stored on at least two disks
(redundant). I believe I've heard one of the ES guys saying something about
documents not being "persisted-persisted" unless IndexWriter.commit (or
something) of the Lucene underneath has been called, and that
IndexWriter.commit is called asynchronously when ES sees fit. If that is true
I guess I need a synchronous way through ES to make sure that this has
happened. I also heard that this operation is expensive, and that it should
therefore not be done too often. I need to make sure that it has been done
when my indexing-process finishes (call the operation as the last thing in
the process), but if it is expensive I guess I need to make sure that my
indexing-processes are not too small with respect to the number of documents
that they index. Any comments on that?

No comments on async persistence (calling of IndexWriter.commit). Will
assume there is no such thing happening, even though I believe I heard it
mentioned in the Berlin conference talk.

It is "persisted-persisted". When you index a document, it's there, safely
written to a transaction log (so no need to call IndexWriter#commit), and
replicated (in a sync manner by default) to all the shard replicas.
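
To illustrate why a transaction log can make an explicit commit unnecessary, here is a minimal sketch in Python. It is a toy model, not Elasticsearch code: the class `ToyShard`, the file name `translog.jsonl` and the helper `simulate_crash_and_restart` are all made up for illustration. In particular, forcing every operation to disk with `fsync` before acknowledging is an assumption of the sketch; the thread does not say when or how often the real transaction log is flushed to disk.

```python
import json
import os
import tempfile


class ToyShard:
    """Toy shard: an in-memory index plus an append-only log file (hypothetical)."""

    def __init__(self, directory):
        self.translog_path = os.path.join(directory, "translog.jsonl")
        self.docs = {}  # stands in for the not-yet-committed Lucene index
        self._replay()

    def _replay(self):
        # On startup, re-apply every operation that was recorded in the log.
        if not os.path.exists(self.translog_path):
            return
        with open(self.translog_path, "r", encoding="utf-8") as log:
            for line in log:
                op = json.loads(line)
                self.docs[op["id"]] = op["source"]

    def index(self, doc_id, source):
        # Append the operation to the log and force it to disk (assumed policy)
        # before updating the in-memory index and acknowledging the caller.
        with open(self.translog_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"id": doc_id, "source": source}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        self.docs[doc_id] = source
        return True


def simulate_crash_and_restart():
    """Index one document, drop all in-memory state, then recover from the log."""
    with tempfile.TemporaryDirectory() as directory:
        shard = ToyShard(directory)
        shard.index("1", {"user": "per"})
        del shard  # "crash": nothing was ever committed
        recovered = ToyShard(directory)
        return recovered.docs
```

In this model the document survives the restart because the log is replayed, even though nothing resembling a commit was ever called.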

What would be a practical lower limit on the number of index-operations that
have to be done between IndexWriter.commits?

No answer, but not relevant if no async persistence is going on.

As I understand it, some information will be persisted "locally" on the
nodes and some information will be persisted in the gateway. Exactly what
kind of information will be persisted "locally" on the nodes and what kind
of information will be persisted in the gateway?

No answer. I still have a problem understanding local persistence vs
gateway persistence. Maybe there is no such thing as local persistence (only
if the gateway is the default local), even though I would assume that the
Lucene index itself is persisted locally. I will make my own tests and read
the code to understand.

The local gateway can recover both the cluster state (which indices were
created, mappings) and the indices data from each node's local storage. It
uses the locally stored indices data to recover itself, and specially
placed files for the cluster metadata.
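
The recovery described above can be pictured with a small Python sketch. It is a toy model under stated assumptions, not the actual gateway implementation: `NodeStorage` and `recover_cluster` are hypothetical names, and the rule that the metadata copy with the highest version number wins is an assumption made for illustration, not something stated in this thread.

```python
from dataclasses import dataclass, field


@dataclass
class NodeStorage:
    """What a node is assumed to keep on its own disk in this toy model."""
    name: str
    metadata_version: int
    metadata: dict                              # index name -> mappings
    shards: dict = field(default_factory=dict)  # (index, shard number) -> docs


def recover_cluster(nodes):
    """Rebuild cluster metadata and shard locations from node-local storage."""
    # Cluster metadata: take the copy with the highest version (assumed rule).
    newest = max(nodes, key=lambda node: node.metadata_version)
    # Index data: each node reuses the shard copies it already holds locally,
    # so nothing has to be copied in from an external store.
    shard_locations = {}
    for node in nodes:
        for shard_id in node.shards:
            shard_locations.setdefault(shard_id, []).append(node.name)
    return newest.metadata, shard_locations
```

Calling `recover_cluster` with two nodes that both hold a copy of shard `("logs", 0)` returns the newer node's metadata together with a map from that shard to both node names.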

E.g. is document-information persisted both "locally" on the nodes and
on the gateway, or only "locally" on the nodes?

This was a follow-up question to the prior question. No answer :(

With the local gateway, it reuses the same local index storage. It does not
need to copy it around.

Is it persisted "locally" on all the nodes running a replica of the
shard containing the document, or only on the node running the primary
shard?

Got my answer. Thanks.

Exactly when is information written to disk (locally or in the gateway)

You say that the document gets indexed in a sync manner, but you don't
mention what operation it is synced with. I will assume that the indexing
and writing to the local transaction log will happen synchronously in the
"execute"-method. It would have been nice if that was stated clearly though -
especially when the question was so clear, about what operation exactly
does the actual indexing and whether or not it was done synchronously.

When you index a document, the call does not return until it has been
executed on all shards (sync replication). On each shard, it will index it
in Lucene, and add it to a transaction log.
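
As a rough picture of that call path, the following Python sketch models sync replication. It is a toy model with hypothetical names (`ShardCopy`, `index_sync`), not the real client API, and the error handling is deliberately simplified: here a failing replica simply raises to the caller, which is an assumption of the sketch and not a description of how failures are actually handled.

```python
class ShardCopy:
    """Toy shard copy: a dict for the Lucene index, a list for the transaction log."""

    def __init__(self, name, fail=False):
        self.name = name
        self.fail = fail
        self.index = {}
        self.translog = []

    def apply(self, doc_id, source):
        if self.fail:
            raise OSError(f"{self.name} could not index document {doc_id}")
        self.index[doc_id] = source             # "index it in Lucene"
        self.translog.append((doc_id, source))  # "add it to a transaction log"


def index_sync(primary, replicas, doc_id, source):
    """Return only after the primary and every replica have applied the write."""
    primary.apply(doc_id, source)
    for replica in replicas:
        replica.apply(doc_id, source)  # blocks; an error propagates to the caller
    return {"ok": True, "copies": 1 + len(replicas)}
```

The point of the model is the ordering: by the time `index_sync` returns, every shard copy has both indexed the document and recorded it in its own log, which is the guarantee the original question asked about.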

Regards, Per Steffensen


(Lukáš Vlček) #7

Hi Per

On Tue, Sep 13, 2011 at 12:01 PM, Per Steffensen steff@designware.dk wrote:

Clinton Gormley wrote:

So as I said, don't be offended. Join us - become a volunteer and add to
our common knowledge. Scrape together the information that you can, and
add it to the website. You can fork the site here:
https://github.com/elasticsearch/elasticsearch.github.com

I probably will. At least I would like to get the time to.

Just my note on the above: based on my personal experience, contributing to
the documentation is probably one of the best efforts towards understanding
the big picture. And although it looks more difficult than writing an email,
it pays off very well and makes the world a better place. Just
go for it :)

Regards,
Lukas

