I set up an index with -Des.index.storage.type=memory.
I added a bunch of documents.
Then I killed Elasticsearch.
I brought it back up and queried the index, and the searches were
successful.
I believe there's a gap in my understanding. If the index is an
in-memory index, I thought there would be no persistence in this case.
But it seems like there is a built-in layer of persistence. I used the
default configuration (did not set up a gateway, etc.).
So the questions are:
Is there persistence for in-memory indices? If not, how did my
example work?
What happens if I go over the index limit for an in-memory index?
Will ES gracefully transition between keeping things in RAM and going
to disk, or will it die?
How do I set up a hybrid where I keep a considerable amount of
stuff in RAM (cached) and go to disk when needed?
In-memory means that ES will nevertheless make a backup of the data
(via the local gateway) to recover from in case of an incident.
But if the data does not fit completely into memory, ES will at some
point throw OutOfMemory errors. No 'transition' is done.
What you probably want is the default disk-based index, which also
keeps parts of the data in memory (ES does a lot of caching, and so
does the operating system), so it'll require less RAM and will read
from disk if the data is not cached.
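As a concrete sketch of the two setups being contrasted (the system-property form is taken from the original question above; the exact flag name may differ across ES versions, so verify against your version's documentation):

```shell
# In-memory store, as used in the original question:
bin/elasticsearch -Des.index.storage.type=memory

# Default disk-based store: just start the node with no store override.
# ES's own caches plus the OS page cache keep the hot parts of the
# index in RAM, while the full data set lives on disk.
bin/elasticsearch
```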
There is no persistency with in-memory indices and the local gateway, only
with the shared gateway. Are you sure you got results? Also, did you do a
full cluster shutdown (if you are running more than one node)?
Note, there are bugs in the in-memory (outside of JVM heap) store which I
have not yet managed to track down. Use the file system ones, or mmapfs; it
should be fast enough. If not, you can set the store to "ram", which
defaults to Lucene's in-memory (in-heap) memory store.
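For illustration, the store types mentioned here could be picked per index at creation time via the settings API of that era (a sketch; the index names are placeholders and the exact setting key may vary by version):

```shell
# Create an index backed by mmapfs (hypothetical index name "myindex"):
curl -XPUT 'http://localhost:9200/myindex' -d '{
  "settings": { "index.store.type": "mmapfs" }
}'

# Or use Lucene's in-heap RAM store:
curl -XPUT 'http://localhost:9200/ramindex' -d '{
  "settings": { "index.store.type": "ram" }
}'
```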
There is no persistency with in-memory indices and the local gateway,
only with the shared gateway.
After some digging deeper I now think I have a better understanding.
So @Vijay, sorry for the confusion. @shay, is the following assumption/
explanation correct?
When using the local gateway, the index is simply stored on disc, and
when restarting the node it will use this information plus the
transaction log to recover the index, right? If the index is stored
in-memory, then there is no data to recover from.
When using the shared gateway there is an additional storage
(periodically writing to this storage?). Then the node can simply
recover the in-memory index from that storage.
Now, how does the transaction log come into play here? Does it exist
for in-memory indices at all?
One further question: to force Elasticsearch to recover from the local
disc for an in-memory index, can I simply use the shared gateway with
the default local work directory? I'm a bit confused by the gateway
naming (local vs. shared ...)
Yes, when using a shared gateway, the state of the indices is periodically
persisted to the shared storage location. The transaction log is still in
play when using in-memory indices, since it plays an important part not just
in making sure indexed data does not get lost, but also both when doing
shared gateway persistency (snapshot) and when doing peer shard recovery.
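As a sketch, a shared filesystem gateway of that era would be enabled roughly like this, using the same system-property style as the original question (the setting names are an assumption based on 0.x-era gateway configuration, and the path is a placeholder):

```shell
# Shared (filesystem) gateway: index state is periodically snapshotted
# to this location, so even an in-memory index can be recovered from it
# after a full cluster restart.
bin/elasticsearch -Des.gateway.type=fs \
    -Des.gateway.fs.location=/mnt/shared/es-gateway
```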
but also when doing both shared gateway persistency (snapshot)
But when doing the local gateway (for an in-memory index) there is no
snapshot thing, right?
Right, the local gateway does not need to snapshot, since it can recover
from the actual state of the indices.
and when doing peer shard recovery
What do you mean here?
When you increase a shard's replicas, or a shard migrates from one node to
another, they recover from one another. That's what I mean by peer
recovery.
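For illustration, peer recovery would kick in after a settings change like this (a sketch using the update-settings API of that era; the index name is hypothetical):

```shell
# Raising the replica count triggers peer recovery: the new replica
# shards are populated directly from the existing shards on other nodes.
curl -XPUT 'http://localhost:9200/myindex/_settings' -d '{
  "index": { "number_of_replicas": 2 }
}'
```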