New operations on indices

Hello.

What do you think about adding some new operations on Elasticsearch
indices:

  1. Copying (cloning) an index: making a copy of an existing index under
    a different name.
    Both the old and the new index should be independent after cloning, and
    changes to one of them shouldn't be propagated to the other (that's the
    difference from aliasing).
    Currently, to achieve this one has to create a new index, iterate over
    the types in the old index and over the documents in each index/type,
    and index all of these documents into the new index. For large indices
    it's a very inefficient operation.
  2. Merging indices based on a query or type: I'd like to index all
    documents from one index that match some query into a second index.

The first operation is very important for me and probably not so hard to
implement. The second one would be a nice addition (for example, for
joining two indices or moving a type from one index to another).
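As a rough illustration of the copy step described in item 1, the sketch below turns already-fetched search hits into a bulk-API request body that indexes them into a new index. It is a minimal Python sketch, assuming hits shaped like search-response hits with `_index`, `_type`, `_id` and `_source` (the type-based layout of the time); `build_bulk_body`, `only_type` and the index names are hypothetical, and the optional type filter only hints at the "move a type" case from item 2. Actually sending the body (a POST to the `_bulk` endpoint) is left out.

```python
import json
from typing import Iterable, Optional


def build_bulk_body(hits: Iterable[dict], target_index: str,
                    only_type: Optional[str] = None) -> str:
    """Turn search hits into a bulk-API body that indexes them into target_index.

    Hypothetical helper: each hit is assumed to look like
    {"_index": ..., "_type": ..., "_id": ..., "_source": {...}}.
    """
    lines = []
    for hit in hits:
        # Optional filter: copy only one type (e.g. to move a type between indices).
        if only_type is not None and hit.get("_type") != only_type:
            continue
        # Action line: same type and id, but pointing at the target index.
        action = {"index": {"_index": target_index,
                            "_type": hit["_type"],
                            "_id": hit["_id"]}}
        lines.append(json.dumps(action))
        # Source line: the original document, unchanged.
        lines.append(json.dumps(hit["_source"]))
    # The bulk API expects newline-delimited JSON ending with a final newline.
    return "\n".join(lines) + "\n" if lines else ""
```

With `only_type="doc"`, only hits of type `doc` are emitted, which is roughly what moving a single type into another index would need; without it, every hit is copied as-is.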

I've easily implemented the functionality with the help of the scan
search type, plus indexing the full JSON into the source.

There is already a ticket for an optimized reindexing
functionality, though:

Issue #492: Reindex from _source by document ID or Query (elastic/elasticsearch on GitHub)

Issue #514: Add support to reindex into an http endpoint (elastic/elasticsearch on GitHub)

For large indices it's a very inefficient operation.

But if you don't do it the inefficient way, you can't use the
functionality for reindexing.

Or why else would you need this clone feature?
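The scan-based approach mentioned here boils down to a loop that keeps following scroll ids until no more hits come back. A minimal Python sketch, assuming a hypothetical `fetch_page` callable that performs one scroll request and returns the next scroll id together with that page's hits; the HTTP call itself is deliberately left out so the loop can be exercised with a stub.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical page fetcher: given a scroll id, returns (next_scroll_id, hits).
# In a real client this would call the scroll endpoint over HTTP.
FetchPage = Callable[[str], Tuple[Optional[str], List[dict]]]


def scan_all(initial_scroll_id: str, fetch_page: FetchPage) -> Iterator[dict]:
    """Yield every hit by following scroll ids until a page comes back empty."""
    scroll_id: Optional[str] = initial_scroll_id
    while scroll_id is not None:
        next_id, hits = fetch_page(scroll_id)
        if not hits:
            break  # an empty page means the scan is exhausted
        yield from hits
        scroll_id = next_id
```

For example, with a dictionary of canned pages, `list(scan_all("s0", lambda sid: pages[sid]))` returns the hits of all non-empty pages in order. Keeping the transport behind a callable makes it easy to swap a stub for a real client.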

On May 16, 12:12 pm, Wojciech Durczyński
wojciech.durczyn...@comarch.com wrote:

Hello.

What do you think about adding some new operations on Elasticsearch
indices:

  1. Copying (cloning) an index: making a copy of an existing index under
    a different name.
    Both the old and the new index should be independent after cloning, and
    changes to one of them shouldn't be propagated to the other (that's the
    difference from aliasing).
    Currently, to achieve this one has to create a new index, iterate over
    the types in the old index and over the documents in each index/type,
    and index all of these documents into the new index. For large indices
    it's a very inefficient operation.
  2. Merging indices based on a query or type: I'd like to index all
    documents from one index that match some query into a second index.

The first operation is very important for me and probably not so hard to
implement. The second one would be a nice addition (for example, for
joining two indices or moving a type from one index to another).

I'd like to have one index, the trunk, and some other indices, the
branches. Branches are initially copies of the trunk, but they may be
changed separately and then either "committed" (merged) into the trunk or
deleted.

On May 16, 1:18 pm, Karussell tableyourt...@googlemail.com wrote:

I've easily implemented the functionality with the help of the scan
search type, plus indexing the full JSON into the source.

There is already a ticket for an optimized reindexing
functionality, though:

Issue #492: Reindex from _source by document ID or Query (elastic/elasticsearch on GitHub)

Issue #514: Add support to reindex into an http endpoint (elastic/elasticsearch on GitHub)

For large indices it's a very inefficient operation.

But if you don't do it the inefficient way, you can't use the
functionality for reindexing.

Or why else would you need this clone feature?


Yes, reindexing the full index, or part of it, into another index is planned. It gets tricky since sometimes (or most times, but not in your case) one might want to munge the data before reindexing.

That's why the first priority was to expose the scan search type, so people can implement it themselves. Note that the scan search type supports providing a custom query (not just match_all), so you can scan only a subset of the data and index it into another index.
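To illustrate the point about scanning only a subset, the following sketch builds the URL and body for starting a scan with a custom query instead of match_all. It is a minimal Python sketch assuming the HTTP interface of that era (`search_type=scan` with a `scroll` timeout on the `_search` endpoint); `scan_request`, the host, the `trunk` index and the `status` field are hypothetical. With the scan search type, `size` was applied per shard, and the first response carried only a scroll id, not hits.

```python
import json
from typing import Tuple
from urllib.parse import urlencode


def scan_request(base_url: str, index: str, query: dict,
                 scroll: str = "5m", size: int = 100) -> Tuple[str, str]:
    """Build (url, body) for starting a scan over the documents matching `query`."""
    # search_type=scan requests an unsorted, scroll-based scan; `scroll` keeps the
    # scroll context alive between requests; `size` is hits per shard per page.
    params = urlencode({"search_type": "scan", "scroll": scroll, "size": size})
    url = f"{base_url.rstrip('/')}/{index}/_search?{params}"
    # Any query can go here, not only match_all, so only a subset is scanned.
    body = json.dumps({"query": query})
    return url, body


url, body = scan_request("http://localhost:9200", "trunk",
                         {"term": {"status": "draft"}})
print(url)  # http://localhost:9200/trunk/_search?search_type=scan&scroll=5m&size=100
```

The scroll id from the first response would then be fed into a loop that follows scroll ids until an empty page, and each page indexed into the target index.
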
On Monday, May 16, 2011 at 2:29 PM, Wojciech Durczyński wrote:

I'd like to have one index, the trunk, and some other indices, the
branches. Branches are initially copies of the trunk, but they may be
changed separately and then either "committed" (merged) into the trunk or
deleted.


one might want to munge the data before reindexing

Yeah, I'm actually using the scan type plus fetching to, e.g., clear a
certain field of the docs (or do any 'merge' operation), and finally doing
bulk indexing into a new copy of the index. Of course, this could be tuned
to, e.g., run without the HTTP overhead (completely on the Lucene side),
but it works reasonably fast (~3000 docs/sec) even without any
profiling/optimization of my client code.
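The clear-a-field-then-bulk-index pipeline described above might look roughly like this. A minimal Python sketch, assuming hits with a `_source` dictionary as returned by the scan; `clear_field`, `batches` and the batch size are hypothetical choices, and the actual bulk request and any tuning are left out.

```python
import copy
from typing import Iterable, Iterator, List


def clear_field(hits: Iterable[dict], field: str) -> Iterator[dict]:
    """Yield copies of hits with one top-level _source field removed."""
    for hit in hits:
        cleaned = copy.deepcopy(hit)  # leave the caller's data untouched
        cleaned["_source"].pop(field, None)
        yield cleaned


def batches(items: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Group items into lists of at most `size`, one list per bulk request."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[dict] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # the last, possibly shorter, batch
```

Copying each hit before modifying it keeps the munging step free of side effects, and batching bounds the size of each bulk request sent to the new copy of the index.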