Mapping changes and their effect on indexed data


(James Cook-3) #1

I'm using the S3 Gateway, but I think my use case is the same regardless of
my gateway type. Suppose I shut down my running cluster, and make some
rather complicated updates to my mapping files.

When I start up my cluster again, what happens to my indices?


(Rafał Kuć) #2

Hello!

Your index structure won't be changed, if that's what you're asking about. Newly created indices will use the new mappings. You can always try to update the mappings of your existing indices with the put mapping API (http://www.elasticsearch.org/guide/reference/api/admin-indices-put-mapping.html); some changes, like adding new fields, can be done that way.
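One rough way to tell which of your mapping updates fall into the "can be done in place" category is to compare the old and new field definitions before calling the put mapping API. The sketch below is a hypothetical helper, not part of Elasticsearch. It assumes both mappings are plain Python dicts shaped like a mapping's "properties" section (using the 2012-era "string" type name). Fields that only appear in the new mapping count as additions. Any other change to an existing field (type, analyzer, and so on) is conservatively flagged as needing a new index.

```python
from typing import Any, Dict, List

Props = Dict[str, Dict[str, Any]]


def _settings(field_def: Dict[str, Any]) -> Dict[str, Any]:
    # Everything about a field except its sub-fields: type, analyzer, etc.
    return {k: v for k, v in field_def.items() if k != "properties"}


def conflicting_fields(old: Props, new: Props, prefix: str = "") -> List[str]:
    """Return dotted paths of existing fields whose definition changes.

    Hypothetical helper: fields present only in `new` are additions and are
    not reported; fields missing from `new` are skipped.
    """
    conflicts: List[str] = []
    for name, old_def in old.items():
        if name not in new:
            continue
        new_def = new[name]
        path = prefix + name
        old_is_object = "properties" in old_def
        new_is_object = "properties" in new_def
        if _settings(old_def) != _settings(new_def) or old_is_object != new_is_object:
            # Any change to an existing field is treated as needing a reindex.
            conflicts.append(path)
        elif old_is_object:
            conflicts.extend(
                conflicting_fields(old_def["properties"], new_def["properties"], path + ".")
            )
    return conflicts


old_mapping = {
    "title": {"type": "string"},
    "price": {"type": "string"},
    "author": {"properties": {"name": {"type": "string"}}},
}
new_mapping = {
    "title": {"type": "string"},
    "price": {"type": "integer"},  # type change: needs a new index
    "tags": {"type": "string"},  # new field: fine to add in place
    "author": {"properties": {"name": {"type": "string"}, "email": {"type": "string"}}},
}
print(conflicting_fields(old_mapping, new_mapping))  # ['price']
```

An empty result suggests the change is purely additive. Anything listed is a candidate for the new-index approach discussed later in this thread.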

--

Regards,

Rafał Kuć

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch


(James Cook-3) #3

Thanks. So what I'm wondering is whether a feature could be added to
recover by:

  1. Accessing the _source property of documents that are already indexed
    and stored in the gateway.
  2. Reindexing those documents using the new mappings.

A lot of us use ES more as a database, and I'm having to write my
'_source' out to another form of permanent storage so that I can recover
this way when the mapping files change or when a recovery fails for some
reason.
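The recovery James describes (read each document's _source, then index it again under the new mappings) mostly comes down to turning stored documents back into bulk index requests. A minimal sketch, assuming the documents have already been fetched (for example with a scroll search) as dicts with "_id" and "_source" keys. The names bulk_reindex_body and products_v2 are illustrative only, and older Elasticsearch releases also expect a "_type" in each action line.

```python
import json
from typing import Any, Callable, Dict, Iterable, Optional

Doc = Dict[str, Any]


def bulk_reindex_body(
    hits: Iterable[Doc],
    target_index: str,
    transform: Optional[Callable[[Doc], Doc]] = None,
) -> str:
    """Build a newline-delimited _bulk body that re-indexes each hit's _source.

    Hypothetical helper: `hits` are dicts with "_id" and "_source" keys.
    """
    lines = []
    for hit in hits:
        source = hit["_source"]
        if transform is not None:
            source = transform(source)  # e.g. convert a field for the new mapping
        # Action/metadata line, followed by the document itself.
        lines.append(json.dumps({"index": {"_index": target_index, "_id": hit["_id"]}}))
        lines.append(json.dumps(source))
    # The bulk format requires every line, including the last, to end in "\n".
    return "".join(line + "\n" for line in lines)


hits = [{"_id": "1", "_source": {"title": "first"}}, {"_id": "2", "_source": {"title": "second"}}]
print(bulk_reindex_body(hits, "products_v2"), end="")
```

The resulting string would be sent to the _bulk endpoint of the cluster that holds the index with the new mappings; the optional transform hook is where any per-document conversion would go.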

(In reply to Rafał Kuć, Monday, May 7, 2012 1:53:07 PM UTC-4)


(Rafał Kuć) #4

Hello!

In theory that would work, but consider the following: you change the type of a field you facet on from string to integer. Some of the documents in the index now have that field as an integer, while others have it as a string, which may cause problems for your queries. Maybe something like this would be an option: create an alias that you use for querying, pointing at the original index. Then, when you need to change your mappings completely, you create a new index with those mappings, index the data again, and just switch the alias to point to the new index instead of the old one. Of course, whether such a scenario is practical depends on the volume of your data and similar factors.
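The alias switch at the end of that workflow can be done in a single request to the _aliases endpoint, which applies its list of actions atomically, so queries against the alias never see a moment where it points at no index. A minimal sketch that only builds the request body; the products/products_v1/products_v2 names and the "_vN" versioning scheme are hypothetical.

```python
import json
import re
from typing import Any, Dict


def next_index_name(index_name: str) -> str:
    """Hypothetical naming scheme: products_v1 -> products_v2, products -> products_v2."""
    match = re.fullmatch(r"(.+)_v(\d+)", index_name)
    if match is None:
        return index_name + "_v2"  # treat an unversioned name as version 1
    base, version = match.groups()
    return f"{base}_v{int(version) + 1}"


def alias_swap_body(alias: str, old_index: str, new_index: str) -> Dict[str, Any]:
    """Body for POST /_aliases: move `alias` from old_index to new_index in one call."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


old_index = "products_v1"
new_index = next_index_name(old_index)  # create this index with the new mappings, then reindex into it
print(json.dumps(alias_swap_body("products", old_index, new_index), indent=2))
```

Until the swap request is sent, the application keeps querying the alias, which still points at the old index, so the reindex can run in the background.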



(Eric Jain) #5

(In reply to James Cook, May 7, 10:59 am)

Support for schema migration would be great.

I'm curious: how often have people here experienced recovery failures
and had to rebuild their data from an external source?


(James Cook-3) #6

In the workflow I was suggesting, no index would ever hold the same field
as both a string and an integer. The "old" index would have it as a string,
and the "new" index would have it as an integer, so it wouldn't cause
problems when querying the data.

That isn't to say that migrating previously indexed documents wouldn't
involve some transformation. Any robust migration capability has to
support that.
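The kind of transformation James mentions can be sketched as a pass over the old documents before they are written to the new index. In this illustration a hypothetical "price" field moves from a string to an integer. Documents whose value cannot be converted are collected rather than silently dropped, so they can be fixed by hand. The function name and the document shape (dicts with "_id" and "_source") are assumptions.

```python
from typing import Any, Dict, List, Tuple

Doc = Dict[str, Any]


def migrate_field_to_int(docs: List[Doc], field: str) -> Tuple[List[Doc], List[Any]]:
    """Return (migrated docs, ids that could not be converted).

    Hypothetical helper: the input documents are left unchanged.
    """
    migrated: List[Doc] = []
    rejected: List[Any] = []
    for doc in docs:
        source = dict(doc["_source"])  # shallow copy, so the old doc is untouched
        value = source.get(field)
        if value is not None:
            try:
                source[field] = int(str(value).strip())
            except ValueError:
                rejected.append(doc["_id"])  # e.g. "n/a" or "4.5"
                continue
        migrated.append({"_id": doc["_id"], "_source": source})
    return migrated, rejected


old_docs = [
    {"_id": "1", "_source": {"price": "10"}},
    {"_id": "2", "_source": {"price": "n/a"}},
    {"_id": "3", "_source": {"title": "no price"}},
]
migrated, rejected = migrate_field_to_int(old_docs, "price")
print([d["_source"] for d in migrated], rejected)
```

Because only the new index ever receives the converted documents, the old index keeps the field as a string, which matches the "old"/"new" split described above.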

-- jim

(In reply to Rafał Kuć, Monday, May 7, 2012 2:16:12 PM UTC-4)


(James Cook-3) #7

This isn't a precise count of people who have hit these kinds of failures,
but it has happened before. Over time, Shay has fixed many of the issues
you'll find reported on the old mailing list.

http://goo.gl/Gwo05

(In reply to Eric Jain, Monday, May 7, 2012 3:00:00 PM UTC-4)


(Shay Banon) #8

The gateways (s3 and local) actually store Lucene indices... Note that there
is a big difference between the shared gateway (s3) and the local gateway:
the shared gateway always keeps a "single" copy of the data (on s3) and
treats it as the master copy, while the local gateway uses each node's
storage to recover (so the more replicas you have, the better, since it
can recover from them as well).

To have a better story around proper backups (which you can already do
with the local gateway by backing up the data directory), I previously
discussed a backup/restore API option, so you could back up a
point-in-time state of an index (or of the whole cluster) and then restore
it into a new index or an existing one. This would work while using the
local gateway, for example by doing point-in-time backups to s3.

On AWS, I recommend using the local gateway rather than the s3 gateway,
because it has less resource overhead on the system and moves away from
relying on a "single" master copy of the data (on s3).

(In reply to James Cook, Mon, May 7, 2012 at 10:56 PM)

