[ANN] Elasticsearch reindex plugin

Karussell_2 · November 27, 2012, 2:35pm

Hi,

this is a plugin which wraps some 'reindex' functionality and executes it
on the server side. This could be useful

* if you want to change some index settings which are not updatable (like
  shard count etc. => reindexing into a new index)
* or if you want to change some type settings (reindexing into the same
  index)
* or if you want to copy/update only specific data into another index =>
  for that you can specify a query (default is match_all)

https://github.com/karussell/elasticsearch-reindex

Let me know if you have problems or suggestions!

Regards,
Peter.
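
For readers wondering what such a reindex amounts to on the wire, here is a minimal sketch (not the plugin's actual code) of turning search hits, which carry `_source`, into a `_bulk` request body aimed at a new index. The function name `bulk_body` and the index name `tweets_v2` are made up for illustration; the pairing of one action line with one source line follows the Elasticsearch bulk API.

```python
import json


def bulk_body(hits, target_index):
    """Build a newline-delimited _bulk payload that re-indexes `hits` into `target_index`.

    Each hit is expected to look like a search hit: it must contain "_id" and
    "_source". "_type" is copied when present (hypothetical helper, for illustration).
    """
    lines = []
    for hit in hits:
        meta = {"_index": target_index, "_id": hit["_id"]}
        if "_type" in hit:
            meta["_type"] = hit["_type"]
        # Action line first, then the document itself taken from _source.
        lines.append(json.dumps({"index": meta}))
        lines.append(json.dumps(hit["_source"]))
    # The bulk API expects every line, including the last, to end with a newline.
    return "\n".join(lines) + "\n" if lines else ""


if __name__ == "__main__":
    sample = [{"_type": "tweet", "_id": "1", "_source": {"user": "peter"}}]
    print(bulk_body(sample, "tweets_v2"), end="")
```

A hit without `_source` raises a `KeyError` here, since there is nothing to copy from.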

--

dadoonet · November 27, 2012, 2:58pm

Hi Peter,

Cool plugin!
I think it's also related to this issue:

[Feature Request] Add a river to ElasticSearch instance · Issue #1077 · elastic/elasticsearch
https://github.com/elasticsearch/elasticsearch/issues/1077
(opened 08:41AM - 30 Jun 11 UTC by dadoonet, closed 02:23PM - 05 Apr 13 UTC)

As discussed in the mailing list: http://elasticsearch-users.115913.n3.nabble.c…om/How-to-reindex-an-ES-index-tp3089964p3089964.html

It would be nice to be able to reindex data from an ES instance using the `_source` field of previously stored documents. With it, we could:

* Modify the mapping and ask for reindexing (even in the same cluster) documents stored in oldindex to a newindex index. The new mapping will be defined in newindex.
* Migrate easily from an ES version to another if needed
* Do many cool things that I can't imagine right now ;-)

Thanks

I suppose that it only works within the same cluster and that _source must not be
disabled, is that right?

Cheers
David.

--
David Pilato
http://www.scrutmydocs.org/
http://dev.david.pilato.fr/
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

--

Karussell_2 · November 27, 2012, 3:10pm

Hey David,

yes, _source must not be disabled, and it only works within the same cluster.
But since one could use the code in a pure Java application (as I was doing
before) or in a river (as you propose in the issue), one can then
reindex into a different cluster too.

Regards,
Peter.

--

dadoonet · November 27, 2012, 3:21pm

Oh yes ! I will fork it

My only concern with the river is that nodes could be incompatible from one
cluster to another.
That's one of the reasons I did not dig into it before.
But now there are some pure REST interfaces, and I can probably use JEST [1], for
example, to fetch content from another cluster (I did not check whether the scan & scroll
API is available from JEST).

Also, perhaps it makes no sense to treat it as a river rather than as an
administrative tool (as you said: in a pure Java application).

Regards

[1] GitHub - searchbox-io/Jest: Elasticsearch Java Rest Client.

--

Karussell_2 · November 27, 2012, 3:43pm

> I will fork it ;-)

please :) !

> My only concern with the river is that nodes could be incompatible from a
> cluster to another one.

hmmh, indeed a valid concern. but how would you add Jest to the instance
which hosts the plugin?

Jest uses elasticsearch under the hood (why?)! See this discussion:
http://elasticsearch-users.115913.n3.nabble.com/ANN-Jest-ElasticSearch-Java-Rest-Client-td4023119.html

Regards,
Peter.

--

dadoonet · November 27, 2012, 3:54pm

Oh, thanks, I was not aware of that.

So I assume that I have to use my own pure REST implementation (based on the SPORE
specification [1]), but scan & scroll is not written yet.
So I have to wait for... What? For myself? WTF

This way I won't depend on the elasticsearch jar.

[1] GitHub - dadoonet/spore-elasticsearch: SPORE specifications for elasticsearch

--

Karussell_2 · November 27, 2012, 4:45pm

Is there a Java implementation of SPORE?

Also, the GET request(s) for scroll should be very easy to 'hack'
together via a simple JSONObject + Apache client ...
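
To give an idea of how little is needed for scrolling over plain HTTP, here is a rough sketch in Python (standard library only) rather than JSONObject + Apache client. It is not code from the plugin or from Jest. The helper names `make_http_fetch` and `scroll_hits` are invented for this example, and the request shapes (`search_type=scan`, the scroll id sent as the request body) follow the scan & scroll API as documented for Elasticsearch releases of that period, so check them against the version you run.

```python
import json
import urllib.request


def make_http_fetch(base_url):
    """Return fetch(path, body): POST `body` to base_url + path and decode the JSON reply."""
    def fetch(path, body):
        # Dicts are sent as JSON; bytes (the raw scroll id) are sent unchanged.
        data = body if isinstance(body, bytes) else json.dumps(body).encode("utf-8")
        req = urllib.request.Request(base_url.rstrip("/") + path, data=data)
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read().decode("utf-8"))
    return fetch


def scroll_hits(fetch, index, query=None, scroll="5m", size=100):
    """Yield every hit matching `query` in `index`, one scroll page at a time."""
    query = query or {"match_all": {}}
    # Opening request: in scan mode it returns a scroll id and normally no hits.
    page = fetch("/%s/_search?search_type=scan&scroll=%s&size=%d" % (index, scroll, size),
                 {"query": query})
    first = True
    while True:
        hits = page.get("hits", {}).get("hits", [])
        if not hits and not first:
            return  # an empty page after the opening one means the scroll is exhausted
        for hit in hits:
            yield hit
        first = False
        # Follow-up request: hand back the latest scroll id to get the next page.
        page = fetch("/_search/scroll?scroll=%s" % scroll,
                     page["_scroll_id"].encode("utf-8"))
```

Usage would look like `for hit in scroll_hits(make_http_fetch("http://localhost:9200"), "tweets"): ...`, where host and index name are placeholders. Passing `fetch` in as a parameter keeps the paging logic independent of the HTTP client.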

But do you know if it is easy to add those dependencies when writing a
plugin? Or is it some Maven magic where I use the full
"dependencies-jar"?

Regards,
Peter.

--

dadoonet · November 27, 2012, 5:13pm

Yes. Not released yet but it will be: GitHub - nicoo/jspore: java implementation of SPORE
It's the one I use for my JUnit tests.
https://github.com/dadoonet/spore-elasticsearch/blob/master/pom.xml#L9

For the RSS River and the other rivers I wrote, it was quite easy to add dependencies
to the plugin ZIP file.
Is that what you were asking?

See: https://github.com/dadoonet/rssriver/blob/master/pom.xml#L140
and
https://github.com/dadoonet/rssriver/blob/master/src/main/assemblies/esplugin.xml

David.

--

Karussell_2 · November 28, 2012, 11:03am

Hi David,

I've implemented the external cluster thing (for simplicity just with
JSONObject and HttpClient; I'm not sure whether that is OK regarding performance/IO).
So if you specify searchHost, this more expensive variant will be
used.

The cool thing is that I can now grab data from production servers onto my
local box (when making the port public for this short time). I also
introduced a waitInSeconds parameter to avoid high load. Warning: the call
is not yet async or stoppable (unless you shut down the server) ...
probably I should move to the river stuff ... or I'll leave this task to
the reader
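
The throttling idea behind a parameter like waitInSeconds can be sketched in a few lines. This is an illustration only, not the plugin's implementation: `copy_batches`, `write_batch` and the injectable `sleep` argument are names chosen for the example.

```python
import time


def copy_batches(batches, write_batch, wait_in_seconds=0, sleep=time.sleep):
    """Write each batch of documents, pausing between batches to keep load low.

    `batches` is any iterable of lists of documents; `write_batch` is the
    caller-supplied function that indexes one batch. Returns the number of
    documents written.
    """
    total = 0
    for batch in batches:
        write_batch(batch)
        total += len(batch)
        if wait_in_seconds > 0:
            # Give the source and target servers a breather before the next page.
            sleep(wait_in_seconds)
    return total
```

Because the loop is synchronous, the caller blocks until every batch is copied, which matches the warning above that the call is not yet async.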

Regards,
Peter.

--

ferhatsb · November 28, 2012, 11:06am

Hi,

@peter We were about to discuss implementing something similar. Thanks for
it, I will start playing with it as well.

@both Jest uses ES for its query builder; I just opened an issue to make it
optional.

The scroll API is not yet available. Please open issues for missing parts, and we
can prioritize according to requirements.

Best,
Ferhat
www.searchbox.io

--
