[ANN] interruption-free reindex

Hi all,

I've implemented an interruption-free reindex system
(https://github.com/lgueye/elasticsearch-continuity).
It really works, but it's not elasticsearch-ready.
It involves aliases, configuration conventions and a factory with a
drop-create indices strategy.
Extracting the relevant parts to create a packaged solution is no issue for
me.
I strongly encourage anyone to point out deficiencies, and suggestions would
be appreciated.

Cheers,

Louis

--

What does 'interruption-free reindexing' mean (you should add some more
docs :))? Is it a combination of karussell/elasticsearch-rollindex (an
ElasticSearch plugin for rolling indices) and karussell/elasticsearch-reindex
(simple re-indexing: to back up, apply index settings changes and more
ElasticMagic)?

(And why do you need spring, OMG ;)?)

Regards,
Peter.

--

Hi,

This seems to be the key part?

When the system stops consuming messages
And I create a valid classified:
| title | description |
| whatever title | whatever description |
When I search for classifieds which "title" matches "whatever"
Then I should get no results

Can you elaborate a bit please? How is this unusual?

Thanks,
Otis

--

Hi Peter,

I was not aware of these plugins, my mistake.
The solution I propose seems to be a combination of roll and reindex.

My purpose is to be able to reindex and drop the obsolete index without
breaking an application.
An application can always read from an alias but has to know the physical
index name when it comes to writing (index, drop, etc.).
So at some point I need a solution to roll/switch the index without breaking
the app and without losing the incoming write orders.

This is how I saw it:

  • stop consuming write orders (they enqueue in a queue)
  • create a new index
  • feed the new index from the database repository (it could be from an
    index store with a match_all query)
  • add new index to alias
  • remove old index from alias
  • drop old index
  • resume consuming orders from queue (which resumes the "normal" write
    cycle)
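
The steps above can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions, not the code from elasticsearch-continuity: `FakeCluster`, `reindex_without_interruption`, and every index or alias name used with them are hypothetical, and a real implementation would call the Elasticsearch client and a real message queue instead of Python dicts and a deque.

```python
from collections import deque


class FakeCluster:
    """Hypothetical in-memory stand-in for an Elasticsearch cluster."""

    def __init__(self):
        self.indices = {}   # physical index name -> {doc_id: doc}
        self.aliases = {}   # alias name -> set of physical index names

    def search(self, alias):
        """Read through the alias: merge the documents of every index behind it."""
        docs = {}
        for index in sorted(self.aliases.get(alias, ())):
            docs.update(self.indices[index])
        return docs


def reindex_without_interruption(cluster, alias, old_index, new_index,
                                 database, pending_writes):
    """Follow the workflow from the list above, step by step.

    `pending_writes` is a deque of (doc_id, doc) write orders that pile up
    while consumption is stopped.
    """
    # 1. Stop consuming write orders: they simply stay in `pending_writes`.
    # 2. Create the new index.
    cluster.indices[new_index] = {}
    # 3. Feed the new index from the database repository.
    cluster.indices[new_index].update(database)
    # 4. Add the new index to the alias.
    cluster.aliases.setdefault(alias, set()).add(new_index)
    # 5. Remove the old index from the alias.
    cluster.aliases[alias].discard(old_index)
    # 6. Drop the old index.
    cluster.indices.pop(old_index, None)
    # 7. Resume consuming: replay the queued orders against the new index.
    while pending_writes:
        doc_id, doc = pending_writes.popleft()
        cluster.indices[new_index][doc_id] = doc
```

Readers only ever go through the alias, so in this model the physical index can change underneath them; the write orders received during the switch are replayed once the new index is in place.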

It may seem complex to some people. I'm really interested in any improvement
to that workflow, as I am not aware of all ES capabilities.

PS: why not Spring? :)

--
Cordialement/Regards,

Louis GUEYE
linkedin http://fr.linkedin.com/in/louisgueye |
blog http://deepintojee.wordpress.com/ |
twitter http://twitter.com/#!/lgueye

--

Hi Otis,

Stopping message consumption is not unusual at all.
I just don't know how to do it natively with Elasticsearch. A river, maybe?

Cordialement/Regards,

Louis GUEYE

--

> Stopping message consumption is not unusual at all.

Still not sure I understand :)
Do you mean that in your case Elasticsearch 'listens' to messages? Then yes, a
river is for that, but pushing to Elasticsearch is correct as well.

> An application can always read from an alias but has to know the physical
> index name when it comes to writing (index, drop, etc.).

You can just attach exactly one feeding alias (like I'm doing in the
reindex plugin) to the index and feed to it.
No need to interrupt anything.
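
A minimal sketch of that idea, assuming hypothetical names throughout (both helper functions, and the alias, index and type names passed to them). It only builds the requests an application might send. The `_aliases` endpoint with `add` actions is Elasticsearch's alias API, and the `/{index}/{type}/{id}` path follows the typed URLs of Elasticsearch versions from that period; how the reindex plugin handles its feeding alias internally is not shown in this thread.

```python
import json


def attach_feeding_alias(alias, index):
    """Build the body for POST /_aliases that points `alias` at `index`."""
    return {"actions": [{"add": {"index": index, "alias": alias}}]}


def index_request(alias, doc_type, doc_id, document):
    """Build an index request addressed to the alias, not the physical index."""
    path = "/{}/{}/{}".format(alias, doc_type, doc_id)
    return "PUT", path, json.dumps(document, sort_keys=True)
```

Because the application only knows the alias name, the physical index behind it can be replaced without touching the writing code.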

> PS: why not Spring? :)

Just saying that I don't like it, don't need it, and don't see the sense in
it :) ... that's why I was asking which tasks you need it for.

Regards,
Peter.

--

Hi Peter,

I now see that the full re-index can be avoided with a river if
real-time availability is not an issue.
I use Spring to instantiate the Elasticsearch client based on
cluster/indices/mapping settings. It nicely encapsulates the client
creation and makes it available to the whole application.

Anyway, thank you for the insight.

--
Cordialement/Regards,

Louis GUEYE

--

Hi Louis,

> I now see that the full re-index can be avoided with a river if the
> real-time availability is not an issue.

Moving the (feeding/search) alias from one index to another is 'real-time'
(did you mean that?), as it is an atomic operation.
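
To illustrate the atomic move: a single `_aliases` request can carry both the `remove` and the `add` action, so the alias switches from the old index to the new one in one step. The sketch below only builds that request body; the function name and its arguments are hypothetical.

```python
def move_alias(alias, old_index, new_index):
    """Build one POST /_aliases body that moves `alias` in a single request."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }
```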

> I use Spring to instantiate the Elasticsearch client based on
> cluster/indices/mapping settings. It encapsulates nicely the client
> creation

... that is not hard without Spring ...

> and makes it available to the whole application.

... for that I prefer good 'young' Guice ;) ...
But seriously: I just think those dependencies will bloat the normally
small plugins (with no benefit in this case, IMO).

Peter.

--