Advice on a Master/Slave setup


(anirban2004) #1

I have been using Elasticsearch and it works great for everything I have needed so far, including searching across multiple indexes, custom scoring, etc.

Now I am faced with a situation where things need to work within the constraints of a Master/Slave setup with the following characteristics:

  1. indexing needs to be done on a master
  2. incremental changes (i.e. Lucene index segment files) need to be pushed regularly to a number of slaves, using a transport mechanism that can distribute files from a master to many slaves
  3. slaves need to start answering queries that reflect the new changes

This is probably somewhat similar to the Solr Master/Slave mechanism (old-style replication, a.k.a. snapshooter/snappuller/snapinstaller) and possibly not in line with the more advanced replication that Elasticsearch is designed for.

However, since ES offers so many goodies, I would really love to be able to use it within this setup as well.

Is there any chance that something like the following could work?

  • index on a master node
  • run ES on the slaves in embedded mode, i.e. within the same JVM as the web application
  • sync the data folder of the master with the remote slaves after every indexing run (add new segments, remove old ones, etc.)
  • somehow make each slave reopen its Reader(s) (maybe via a refresh request) so that it can serve the new data
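The file-sync step in the list above could be sketched roughly as follows. This is a hypothetical helper (`sync_index_dir` and the other names are made up for illustration, not part of Elasticsearch or Lucene), and it assumes each call is pointed at a single Lucene index directory on a reachable filesystem, not at the whole ES data folder. It leans on two Lucene properties: segment data files are written once and never modified, and a commit only becomes visible through its `segments_N` file. So data files are copied first, then `segments_N`, then the `segments.gen` hint, and obsolete files are removed only afterwards.

```python
import os
import shutil

LOCK_FILE = "write.lock"  # owned by the local IndexWriter; never synced


def _copy_rank(name):
    """Order data files first, then segments_N, then segments.gen."""
    if name == "segments.gen":
        return 2
    if name.startswith("segments_"):
        return 1
    return 0


def _listing(directory):
    """Map each regular file name (except the lock) to its size in bytes."""
    result = {}
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if name != LOCK_FILE and os.path.isfile(path):
            result[name] = os.path.getsize(path)
    return result


def sync_index_dir(master_dir, slave_dir):
    """Hypothetical one-way sync of a Lucene index directory, master -> slave.

    Returns (copied, deleted) file names in the order they were processed.
    """
    master, slave = _listing(master_dir), _listing(slave_dir)
    # Data files are write-once, so name + size is enough to spot new ones;
    # commit files (segments*) are always re-copied because segments.gen
    # is rewritten in place with the same length.
    copied = sorted(
        (n for n, size in master.items()
         if n.startswith("segments") or slave.get(n) != size),
        key=lambda n: (_copy_rank(n), n),
    )
    for name in copied:
        tmp = os.path.join(slave_dir, name + ".tmp")
        shutil.copy2(os.path.join(master_dir, name), tmp)
        # Rename into place so a reader never opens a half-written file.
        os.replace(tmp, os.path.join(slave_dir, name))
    # Remove obsolete files only after the new commit point is in place.
    deleted = sorted(n for n in slave if n not in master)
    for name in deleted:
        os.remove(os.path.join(slave_dir, name))
    return copied, deleted
```

This only covers moving the files. Whether an embedded ES node on the slave notices segments that appear underneath it on a refresh is a separate question, since that node's own engine is likely managing the same directory.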

I have tried something along these lines, but the slaves do not seem to serve the new data. Sometimes flush warnings appear in the logs ("failed to read latest segment infos on flush"), and after that no data at all is returned for queries.

I would really appreciate your insights and/or any helpful advice on this. Is this path worth pursuing at all?

Anirban

