I'm using the S3 Gateway, but I think my use case is the same regardless of
my gateway type. Suppose I shut down my running cluster, and make some
rather complicated updates to my mapping files.
When I start up my cluster again, what happens to my indices?
Thanks. So what I am wondering is whether a feature might be added to
recover by:
1. Accessing the _source property of documents already indexed and in the
   gateway.
2. Reindexing those documents using the new mappings.
A lot of us are using ES more as a database, and I'm having to write my
'_source' out to another form of permanent storage and then recover in
this manner when mapping files change or when a recovery fails for some
reason.
On Monday, May 7, 2012 1:53:07 PM UTC-4, Rafał Kuć wrote:
In theory that would work, but consider the following: you change the type of the field you facet on from string to integer. Some of the documents in the index then have an integer field, while others have a string. This may cause problems for your queries. Maybe something like this would be an option: create an alias on the original index and use it for querying. Then, when you need to change your mappings completely, create a new index with those mappings, index the data again, and switch the alias to point to the new index instead of the old one. Of course, whether this approach is practical depends on the volume of your data.
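A minimal sketch of the alias switch described above, assuming the `_aliases` endpoint with its `add` and `remove` actions. The alias and index names are hypothetical, and the code only builds the request body; sending it to the cluster is left out.

```python
import json


def alias_swap_body(alias, old_index, new_index):
    """Build the body for POST /_aliases that moves `alias` to `new_index`.

    Both actions are sent in a single request, so the switch from the
    old index to the new one happens in one step.
    """
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Hypothetical names: "products" is the alias the application queries,
# "products_v1" holds the old mappings, "products_v2" the new ones.
body = alias_swap_body("products", "products_v1", "products_v2")
print(json.dumps(body, indent=2))
```

Because the application only ever queries the alias, the new index can be built and filled in the background before the switch.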
Support for schema migration would be great.
I'm curious, how often have people here experienced recovery failures
and have had to rebuild the data from an external source?
In the workflow I was suggesting, no documents would have a field with
both string and integer types. The "old" index would have it as a string,
but the "new" index would have it as an integer. It wouldn't cause a
problem with querying data.
That isn't to say that a migration of previously indexed documents wouldn't
involve transformation. Any robust migration capability does.
-- jim
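The transformation step mentioned above could look roughly like the sketch below. It assumes documents are read back with their `_id` and `_source` (the shape of a search hit) and written to the new index in the newline-delimited bulk format. The field name `price` and the index name are hypothetical, and older releases may also expect a `_type` in the action line.

```python
import json


def to_bulk_lines(hits, new_index, field="price"):
    """Convert search hits into bulk-index lines for `new_index`.

    `hits` is a list of dicts with "_id" and "_source" keys. `field`
    (hypothetical) is a string in the old index and must become an
    integer in the new one.
    """
    lines = []
    for hit in hits:
        source = dict(hit["_source"])  # copy so the input is not mutated
        if field in source:
            source[field] = int(source[field])  # the transformation step
        action = {"index": {"_index": new_index, "_id": hit["_id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The bulk format is newline-delimited; every line ends with "\n".
    return "".join(line + "\n" for line in lines)


# Hypothetical documents as they would come back from the old index.
old_hits = [
    {"_id": "1", "_source": {"name": "widget", "price": "42"}},
    {"_id": "2", "_source": {"name": "gadget", "price": "7"}},
]
print(to_bulk_lines(old_hits, "products_v2"))
```

Each document keeps its original `_id`, so rerunning the migration overwrites documents rather than duplicating them.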
This isn't the most accurate list of people with these types of failures,
but it has happened before. Shay has plugged many of the issues you might
find on the old mailing list over time.
The gateways (s3 and local) actually store Lucene indices. Note that there
is a big difference between the shared gateway (s3) and the local gateway:
the shared gateway always has a "single" copy of the data (on s3) and
treats it as the master copy, while the local gateway uses each node's
storage to recover (so the more replicas you have, the better, since it can
recover from them as well).
To have a better story around proper backups, which you can do with the
local gateway by backing up the data directory, I previously discussed a
backup/restore API option, so that you can back up a point-in-time state of
an index (or the cluster) and then recover from it into a new index or an
existing index. This would work while using the local gateway and, for
example, doing point-in-time backups to s3.
On AWS, I recommend using the local gateway and not the s3 gateway, because
it puts less resource overhead on the system and moves away from the
"single" master copy of the data (on s3).