Manually attaching an index

If a full disk backup was made of an index directory (with ES
stopped), is there a way to manually attach or import it?

A related question: is there a way to merge two indexes with the
same name from two separate nodes/clusters? I could write a custom
script to read data from cluster 1 and index it into cluster 2, but it
would be nice if there were a way to simply copy a few TB of data and
"import" or "attach" it.

There isn't a way to "import" a specific index, since the fact that it's
"there" in the cluster is also stored in the cluster's metadata state.
I agree it would be nice to have some notion of import / merge, but it is
tricky to get the API right (in a distributed environment, for example,
where is the data you plan to import from?).

On Mon, Aug 22, 2011 at 7:57 AM, Tom Le dottom@gmail.com wrote:

If a full disk backup was made of an index directory (with ES
stopped), is there a way to manually attach or import it?

A related question: is there a way to merge two indexes with the
same name from two separate nodes/clusters? I could write a custom
script to read data from cluster 1 and index it into cluster 2, but it
would be nice if there were a way to simply copy a few TB of data and
"import" or "attach" it.

We are looking at more traditional IT scenarios, such as performing a disk
backup using SAN shadow copy (shut down ES, create snapshot, start ES; we
can copy hundreds of TB very quickly).

We also want to move a lot of data from one standalone node to another
for testing purposes. For example, I have 10 TB of indexed data that I
maintain as the control data set, and I want multiple researchers to have
their own independent standalone node for testing. Each researcher needs a
different index, where each index is 1 TB in size. With traditional
databases, we'd export and import. For ES, I have to maintain 10 separate
instances of 1 TB each so I can create a large tarball and ship it to a
researcher. I'd like to be able to "export" just specific indexes (making
sure mappings and all other settings are the same).

There was also a scenario where a client was pointing to the wrong ES
cluster and we indexed 300 GB of data (using a unique index name, so all the
data resided in one index). In this case, I did not have access to the raw
data and would have to write a script to query the cluster and then
re-index to the correct cluster. An "export" feature here, where I could
move an entire index by simply copying files, would be nice.
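
The "query the cluster, then re-index" script mentioned above could look
roughly like the following. This is only a minimal sketch, assuming the
scroll and bulk HTTP endpoints in the shape used by recent Elasticsearch
releases; older versions take the scroll id differently and also need a
document type in the bulk action line. The function names, the 5m scroll
timeout, and the batch size are illustrative choices, and the sketch does not
copy mappings or settings, which would have to be created on the target
cluster first.

```python
import json
import urllib.request


def to_bulk_body(hits, target_index):
    """Turn search hits into a newline-delimited bulk request body."""
    lines = []
    for hit in hits:
        # One action line, then one source line, per document.
        action = {"index": {"_index": target_index, "_id": hit["_id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(hit["_source"]))
    # The bulk endpoint expects a trailing newline after the last line.
    return "\n".join(lines) + "\n" if lines else ""


def post(url, body, content_type="application/json"):
    """POST a string body and decode the JSON response."""
    req = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def copy_index(source_url, target_url, index, batch_size=500):
    """Scroll through `index` on the source and bulk-index into the target.

    Assumption: request and response shapes follow recent Elasticsearch
    versions; adjust for the version actually in use.
    """
    query = {"size": batch_size, "query": {"match_all": {}}}
    page = post(f"{source_url}/{index}/_search?scroll=5m", json.dumps(query))
    copied = 0
    while page["hits"]["hits"]:
        hits = page["hits"]["hits"]
        post(f"{target_url}/_bulk", to_bulk_body(hits, index),
             content_type="application/x-ndjson")
        copied += len(hits)
        # Ask for the next batch using the scroll id from the last response.
        scroll = {"scroll": "5m", "scroll_id": page["_scroll_id"]}
        page = post(f"{source_url}/_search/scroll", json.dumps(scroll))
    return copied
```

For a few hundred GB this is workable but slow compared with copying files,
since every document is re-analyzed and re-indexed on the target.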

On Mon, Aug 22, 2011 at 11:43 AM, Shay Banon kimchy@gmail.com wrote:

There isn't a way to "import" a specific index, since the fact that it's
"there" in the cluster is also stored in the cluster's metadata state.
I agree it would be nice to have some notion of import / merge, but it is
tricky to get the API right (in a distributed environment, for example,
where is the data you plan to import from?).

When you picture such an export feature, how do you see it working? Let's
say you have a cluster of tens of nodes, with indices holding tens of TB of
data, and you want to export an index of several TB. What would the best
process be for you?

The thing to keep in mind is that those indices reside on several
different machines, so that needs to be taken into account...

On Tue, Aug 23, 2011 at 12:55 AM, Tom Le dottom@gmail.com wrote:

We are looking at more traditional IT scenarios, such as performing a disk
backup using SAN shadow copy (shut down ES, create snapshot, start ES; we
can copy hundreds of TB very quickly).

We also want to move a lot of data from one standalone node to another
for testing purposes. For example, I have 10 TB of indexed data that I
maintain as the control data set, and I want multiple researchers to have
their own independent standalone node for testing. Each researcher needs a
different index, where each index is 1 TB in size. With traditional
databases, we'd export and import. For ES, I have to maintain 10 separate
instances of 1 TB each so I can create a large tarball and ship it to a
researcher. I'd like to be able to "export" just specific indexes (making
sure mappings and all other settings are the same).

There was also a scenario where a client was pointing to the wrong ES
cluster and we indexed 300 GB of data (using a unique index name, so all the
data resided in one index). In this case, I did not have access to the raw
data and would have to write a script to query the cluster and then
re-index to the correct cluster. An "export" feature here, where I could
move an entire index by simply copying files, would be nice.

In our case, we'd like to be able to

  1. lock writes to the index, assuming that is necessary for consistency
  2. copy all the shards into a single location (export) along with metadata
  3. copy them to the new server if the location is not shared
  4. import the exported index into the new cluster

I realize it may not be as easy as it sounds due to the distributed nature
of ES, but I think this is what's needed.
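
Steps 2 and 3 of that proposal amount to bundling the shard files and the
index metadata into one portable file. The following is a hypothetical sketch
of that packaging step only, not an existing ES feature: `export_index` and
`import_index` are made-up names, the archive layout (`shards/` plus
`metadata.json`) is an assumption, and the directory passed in stands for
wherever the shard files have already been gathered. Locking writes (step 1)
and making the target cluster aware of the index (step 4) are not covered,
and the latter is exactly the cluster-state problem raised earlier in the
thread.

```python
import io
import json
import tarfile
from pathlib import Path


def export_index(index_dir, metadata, archive_path):
    """Bundle an index directory and its metadata into one tar.gz file.

    `metadata` is any JSON-serializable dict (for example mappings and
    settings fetched from the source cluster beforehand).
    """
    index_dir = Path(index_dir)
    meta_bytes = json.dumps(metadata, indent=2).encode("utf-8")
    with tarfile.open(archive_path, "w:gz") as tar:
        # Shard files go under shards/<index name>/ inside the archive.
        tar.add(index_dir, arcname=f"shards/{index_dir.name}")
        # Metadata is written from memory as a regular file entry.
        info = tarfile.TarInfo("metadata.json")
        info.size = len(meta_bytes)
        tar.addfile(info, io.BytesIO(meta_bytes))
    return Path(archive_path)


def import_index(archive_path, target_dir):
    """Unpack an archive made by export_index and return its metadata.

    This only restores files on disk; it does not register the index
    with any cluster.
    """
    target_dir = Path(target_dir)
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(target_dir)
    return json.loads((target_dir / "metadata.json").read_text())
```

Keeping mappings and settings in the same archive as the shard files is what
would let the receiving side check that the two match before attaching
anything.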

On Monday, August 29, 2011, Shay Banon kimchy@gmail.com wrote:

When you picture such an export feature, how do you see it working? Let's
say you have a cluster of tens of nodes, with indices holding tens of TB of
data, and you want to export an index of several TB. What would the best
process be for you?

The thing to keep in mind is that those indices reside on several
different machines, so that needs to be taken into account...

--
Regards,
Berkay Mollamustafaoglu
Ph: +1 (571) 766-6292
mberkay on yahoo, google and skype