If a full disk backup was made of an index directory (with ES
stopped), is there a way to manually attach or import it?
A related question: is there a way to merge two indexes with the
same name from two separate nodes/clusters? I could write a custom
script to read data from cluster 1 and index it onto cluster 2, but it
would be nice if there were a way to simply copy a few TB of data and
"import" or "attach" it.
There isn't a way to "import" a specific index, since the fact that it's
"there" in the cluster is also stored in the cluster's metadata state.
I agree, it would be nice to have some notion of import/merge, but it's
tricky to get the API right in a distributed environment (i.e., where
does the data you plan to import come from?).
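The "custom script" workaround mentioned above can be sketched as a scroll-and-bulk loop. This is an illustrative sketch only, not a supported import path: the host names, index name, and page size are made up, and the request shapes follow the general Elasticsearch REST API rather than any particular version.

```python
# Hypothetical sketch: scroll documents out of cluster 1 and
# bulk-index them into cluster 2. Hosts/index/sizes are illustrative.
import json
import urllib.request

def hits_to_bulk(hits, index):
    """Turn a page of search hits into a bulk-index request body (NDJSON)."""
    lines = []
    for hit in hits:
        lines.append(json.dumps({"index": {"_index": index, "_id": hit["_id"]}}))
        lines.append(json.dumps(hit["_source"]))
    return "\n".join(lines) + "\n"

def reindex(src="http://cluster1:9200", dst="http://cluster2:9200", index="logs"):
    """Scroll through `index` on src and replay each page onto dst."""
    body = json.dumps({"query": {"match_all": {}}, "size": 500}).encode()
    req = urllib.request.Request(
        f"{src}/{index}/_search?scroll=5m", data=body,
        headers={"Content-Type": "application/json"})
    page = json.load(urllib.request.urlopen(req))
    while page["hits"]["hits"]:
        bulk = hits_to_bulk(page["hits"]["hits"], index).encode()
        urllib.request.urlopen(urllib.request.Request(
            f"{dst}/_bulk", data=bulk,
            headers={"Content-Type": "application/x-ndjson"}))
        req = urllib.request.Request(
            f"{src}/_search/scroll",
            data=json.dumps({"scroll": "5m",
                             "scroll_id": page["_scroll_id"]}).encode(),
            headers={"Content-Type": "application/json"})
        page = json.load(urllib.request.urlopen(req))
```

At a few TB this loop is slow precisely because every document makes the round trip through the indexing pipeline again, which is the motivation for a file-level "attach".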
We are looking at more traditional IT scenarios, such as performing a disk
backup using SAN shadow copy (shut down ES, create the snapshot, start ES -
we can copy hundreds of TBs very quickly).
We also want to move a lot of data from one standalone node to another
for testing purposes. For example, I have 10 TB of indexed data that I
maintain as the control data set, and I want multiple researchers to have
their own independent standalone node for testing. Each researcher needs a
different index, where each index is 1 TB in size. With traditional DBs,
we'd export and import. With ES, I have to maintain 10 separate instances of
1 TB each so I can create a large tarball and ship it to a researcher. I'd
like to be able to "export" just specific indexes (making sure mappings and
all other settings are the same).
There was also a scenario where a client was pointing at the wrong ES
cluster and we indexed 300 GB of data (using a unique index name, so all the
data resided in one index). In this case, I did not have access to the raw
data and would have had to write a script to query the cluster and then
re-index into the correct cluster. An "export" feature here, where I could
move an entire index by simply copying files, would be nice.
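For the "making sure mappings and all other settings are the same" part, the index shape (as opposed to the documents) can be copied via the REST API. A hedged sketch, assuming a modern GET-index response shape; the settings keys that a cluster generates itself (uuid, creation date, version) are hypothetical-but-typical examples of what must be dropped before re-use:

```python
# Sketch: copy an index's mappings and settings to a target cluster
# before re-indexing documents, so both sides stay identical.
import json
import urllib.request

# Settings under "index." that clusters generate themselves and
# will reject on create (illustrative list, not exhaustive).
GENERATED = {"creation_date", "uuid", "version", "provided_name"}

def creation_body(mappings, settings):
    """Build a create-index body, dropping cluster-generated settings."""
    index_settings = {k: v for k, v in settings.get("index", {}).items()
                      if k not in GENERATED}
    return {"settings": {"index": index_settings}, "mappings": mappings}

def copy_index_shape(src, dst, index):
    """Read mappings/settings from src and create the index on dst."""
    meta = json.load(urllib.request.urlopen(f"{src}/{index}"))[index]
    body = json.dumps(creation_body(meta["mappings"], meta["settings"])).encode()
    req = urllib.request.Request(f"{dst}/{index}", data=body, method="PUT",
                                 headers={"Content-Type": "application/json"})
    urllib.request.urlopen(req)
```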
On Mon, Aug 22, 2011 at 11:43 AM, Shay Banon kimchy@gmail.com wrote:
When you envision such an export feature, how do you see it working? Let's
say you have a cluster of tens of nodes, with indices holding tens of TBs of
data, and you want to export an index of several TBs. What would the best
process be for you?
The thing to keep in mind is that those indices reside on several
different machines, so that needs to be taken into account...
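The distribution problem is visible even in a toy model: before exporting anything, a tool first has to work out which machines hold which copies of the index's shards. A minimal sketch, with the routing-table shape simplified for illustration (the real cluster-state response is more deeply nested):

```python
# Sketch: group an index's shard copies by the node holding them,
# the first step any distributed export would need. The routing-table
# dict shape here is simplified for illustration.
def shards_by_node(routing_table):
    """Map node name -> list of (shard_id, is_primary) pairs."""
    placement = {}
    for shard_id, copies in routing_table.items():
        for copy in copies:
            placement.setdefault(copy["node"], []).append(
                (shard_id, copy["primary"]))
    return placement
```

An export API would then either pull each shard's files from its node or ask each node to write its local shards to a shared location, which is why the question of "where does the data go" has no single obvious answer.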
On Tue, Aug 23, 2011 at 12:55 AM, Tom Le dottom@gmail.com wrote: