I've implemented an interruption-free reindex system
(https://github.com/lgueye/elasticsearch-continuity).
It works, but it is not Elasticsearch-ready.
It involves aliases, configuration conventions and a factory with a
drop-create index strategy.
Extracting the relevant parts to build a packaged solution would be no
problem for me.
I strongly encourage anyone to point out deficiencies; suggestions would
be appreciated.
When the system stops consuming messages
And I create a valid classified:
| title | description |
| whatever title | whatever description |
When I search for classifieds which "title" matches "whatever"
Then I should get no results
Can you elaborate a bit please? How is this unusual?
I was not aware of these plugins, my mistake.
The solution I propose seems to be a combination of roll and reindex.
My purpose is to be able to reindex and drop the obsolete index without
breaking an application.
An application can always read from an alias, but it has to know the
physical index name when it comes to writing (index, drop, etc.).
So at some point I need a way to roll/switch the index without breaking
the app and without losing the incoming write orders.
This is how I saw it:
1. stop consuming write orders (they pile up in a queue)
2. create a new index
3. feed the new index from the database repository (it could also come
   from an index store with a match_all query)
4. add the new index to the alias
5. remove the old index from the alias
6. drop the old index
7. resume consuming orders from the queue (which resumes the "normal"
   write cycle)
It may seem complex to some people. I'm really interested in any
improvement to that workflow, as I am not aware of all ES capabilities.
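To make the workflow above concrete, here is a minimal Python sketch that runs the steps against an in-memory model. `FakeCluster`, `WriteConsumer` and `reindex` are hypothetical names, not the Elasticsearch client API; the database repository is assumed to be a plain dict keyed by document id, and steps 4 and 5 are collapsed into one alias update.

```python
import queue


class FakeCluster:
    """Hypothetical in-memory stand-in for an Elasticsearch cluster (not a real client)."""

    def __init__(self):
        self.indices = {}  # physical index name -> {doc_id: doc}
        self.aliases = {}  # alias name -> set of physical index names

    def create_index(self, name):
        self.indices[name] = {}

    def delete_index(self, name):
        del self.indices[name]
        for members in self.aliases.values():
            members.discard(name)

    def index(self, name, doc_id, doc):
        self.indices[name][doc_id] = doc

    def update_aliases(self, alias, add=None, remove=None):
        # Models one alias update in which both changes apply together.
        members = self.aliases.setdefault(alias, set())
        if add:
            members.add(add)
        if remove:
            members.discard(remove)

    def search(self, alias):
        # Reads always go through the alias, never through a physical index name.
        return [doc
                for name in self.aliases.get(alias, ())
                for doc in self.indices[name].values()]


class WriteConsumer:
    """Applies queued write orders to the index currently used for writes."""

    def __init__(self, cluster, target_index):
        self.cluster = cluster
        self.target_index = target_index
        self.orders = queue.Queue()
        self.paused = False

    def submit(self, doc_id, doc):
        self.orders.put((doc_id, doc))
        if not self.paused:
            self.drain()

    def drain(self):
        while not self.orders.empty():
            doc_id, doc = self.orders.get()
            self.cluster.index(self.target_index, doc_id, doc)


def reindex(cluster, consumer, alias, old_index, new_index, repository):
    consumer.paused = True                                # 1. stop consuming; orders queue up
    cluster.create_index(new_index)                       # 2. create the new index
    for doc_id, doc in repository.items():                # 3. feed it from the repository
        cluster.index(new_index, doc_id, doc)
    cluster.update_aliases(alias, add=new_index, remove=old_index)  # 4 + 5. swap the alias
    cluster.delete_index(old_index)                       # 6. drop the old index
    consumer.target_index = new_index
    consumer.paused = False                               # 7. resume: queued orders are applied
    consumer.drain()
```

With this ordering, a write submitted while the consumer is paused only becomes searchable after step 7, which is what the scenario quoted earlier expects ("Then I should get no results").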
Stopping message consumption is not unusual at all.
Still not sure I understand.
Do you mean that in your case Elasticsearch 'listens' to messages? Then yes,
a river is meant for that, but pushing to Elasticsearch is correct as well.
An application can always read from an alias, but it has to know the
physical index name when it comes to writing (index, drop, etc.).
You can just attach exactly one feeding alias (like I'm doing in the
reindex plugin) to the index and feed to it.
No need to interrupt anything.
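One way to read this suggestion, sketched in Python under stated assumptions: the application only ever writes through a feeding alias and reads through a search alias, so reindexing becomes a matter of re-pointing aliases while writes keep flowing. `AliasRegistry` and the `classifieds*` names are hypothetical, and the rule that a write alias must resolve to exactly one index is enforced by this model, not quoted from the Elasticsearch API.

```python
class AliasRegistry:
    """Hypothetical in-memory model of index aliases (not the Elasticsearch client API)."""

    def __init__(self):
        self.aliases = {}  # alias name -> set of physical index names

    def point(self, alias, *indices):
        # Re-point an alias; the application code that uses the alias never changes.
        self.aliases[alias] = set(indices)

    def write_target(self, alias):
        # A write through an alias must resolve to exactly one physical index,
        # otherwise it is ambiguous which index should receive the document.
        members = self.aliases.get(alias, set())
        if len(members) != 1:
            raise ValueError(
                f"alias {alias!r} must point to exactly one index, found {len(members)}")
        return next(iter(members))

    def read_targets(self, alias):
        # Reads may fan out over several indices.
        return sorted(self.aliases.get(alias, set()))
```

During a reindex, the feeding alias can be moved to the new index first while searches still hit the old one; once the copy is done, the search alias follows. No write order has to wait in a queue in this model.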
PS: why not Spring?
Just saying that I don't like it, don't need it and don't see the sense in
it... that's why I was asking which tasks you need it for.
Regards,
Peter.
On Sunday, December 2, 2012 11:56:13 PM UTC+1, louis.gueye wrote:
Hi Peter,
I now see that the full reindex can be avoided with a river if
real-time availability is not an issue.
I use Spring to instantiate the Elasticsearch client based on
cluster/indices/mapping settings. It nicely encapsulates client
creation and makes the client available to the whole application.
I now see that the full reindex can be avoided with a river if
real-time availability is not an issue.
Moving the (feeding/search) alias from one index to another is 'real-time'
(did you mean that?), as it is an atomic operation.
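For reference, the atomic move is a single call to the `_aliases` endpoint carrying both a `remove` and an `add` action. The small Python helper below only builds that request body; the function name and the index/alias names are made up, and actually sending it (as a `POST` to `/_aliases`) is left out.

```python
import json


def alias_swap_body(alias, old_index, new_index):
    """Build the body of a POST /_aliases request that moves `alias` from
    `old_index` to `new_index`. Both actions travel in one request, so
    Elasticsearch applies them atomically: searches on the alias see either
    the old index or the new one, never both and never neither."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


print(json.dumps(alias_swap_body("classifieds", "classifieds_v1", "classifieds_v2"), indent=2))
```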
I use Spring to instantiate the Elasticsearch client based on
cluster/indices/mapping settings. It nicely encapsulates client
creation
... that is not hard without Spring ...
and makes the client available to the whole application.
... for that I prefer good 'young' Guice ...
But seriously: I just think those dependencies would bloat the normally
small plugins (with no benefit in this case, IMO).