Index content on one machine, send index to other machine?

stephan_nordnes_erik · July 22, 2015, 11:58pm

Hi, I am pretty new to elasticsearch, so I'm sorry if this is a trivial question. I am trying to figure out if it is possible to have a setup like this:

Machine A has a large corpus of documents. Machine B is running elasticsearch.

Machine A scans its corpus of documents and runs the elasticsearch full-text indexing procedure on that content. Machine A sends the compact indexed version of the documents to Machine B. Machine B inserts the index provided by Machine A into its elasticsearch instance. Machine B is now able to search for documents on Machine A and return IDs representing files from Machine A. Machine B has never seen the full content of Machine A's files.

By doing it this way I do not need to move the data around as much and I would probably save a lot of bandwidth and time.

Is this at all possible, or does Elasticsearch require the entire content to be on the same cluster/machine that is running the elasticsearch instance? I have not been able to find any information about this anywhere, possibly due to my lack of understanding.

If anyone could enlighten me on this subject that would be great!

warkolm · July 23, 2015, 12:19am

This is not how ES works; you need to send the complete data to it to be able to search.

stephan_nordnes_erik · July 23, 2015, 12:41am

Thanks for the quick reply! Is it possible to elaborate a bit?

Is this not possible because there are no APIs for it? Or are there some inherent attributes of the way the indexing is performed that require all the content to be at the same location as where the search will be performed?

I have found that you can set "store" to "no" and "index" to "yes", so, as far as I understand, the content does not actually stick around (related stackoverflow: http://stackoverflow.com/questions/17103047/why-do-i-need-storeyes-in-elasticsearch). This leads me to think that the approach I am describing could possibly be done.
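To make the `store`/`index` distinction concrete, here is a toy inverted index in Python. It is a deliberately simplified illustration, not how Lucene actually lays out its data (real segments also hold term frequencies, positions, norms and more), and `analyze` is a hypothetical stand-in for an Elasticsearch analyzer. It shows that the original text need not be kept to answer queries, but that building the index still means reading and tokenizing every document, and the result still records every distinct term of the corpus.

```python
import re
from collections import defaultdict


def analyze(text):
    """Hypothetical, very rough analyzer: lowercase and split on non-word characters."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs that contain it.

    Only terms and IDs are kept; the original text is discarded,
    which is roughly what "store": "no" means for a field.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, query):
    """Return IDs of documents containing every query term (AND semantics)."""
    terms = analyze(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


docs = {
    "a1": "The quick brown fox",
    "a2": "A quick search engine",
    "a3": "Brown bears eat fish",
}
index = build_inverted_index(docs)
print(search(index, "quick"))      # ['a1', 'a2']
print(search(index, "brown fox"))  # ['a1']
```

Note that every word of every document had to pass through `analyze` to produce `index`, so the analysis step has to run wherever the full text is.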

But again, I am an elasticsearch noob. It is quite likely that this isn't possible to do, but I would really like to understand why. If it is as simple as "there is currently no API for it", I could maybe look into making something myself.

warkolm · July 23, 2015, 12:43am

You still need to ship all the data over to ES for it to be indexed, irrespective of it being stored or not.
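As an illustration of what shipping the data means in practice, the sketch below builds a request body for the `_bulk` endpoint, which takes newline-delimited JSON: one action line followed by one source line per document, with a trailing newline. The index name `corpus`, type `doc` and field `content` are hypothetical, and the `_type` metadata reflects the 1.x-era API of this thread (later versions deprecated and then removed mapping types). Whatever the mapping says about `store`, the full `content` text is part of the request that travels to the indexing node.

```python
import json


def bulk_body(index_name, doc_type, docs):
    """Build an NDJSON body for the Elasticsearch _bulk endpoint.

    Each document becomes two lines: an action/metadata line and the
    document source. The body must end with a newline.
    """
    lines = []
    for doc_id, source in docs.items():
        # Action line: which index/type/id this document goes to.
        lines.append(json.dumps({"index": {"_index": index_name, "_type": doc_type, "_id": doc_id}}))
        # Source line: the full document, including all of its text.
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"


body = bulk_body("corpus", "doc", {"a1": {"content": "The quick brown fox"}})
print(body)
```

The resulting string would be POSTed to `/_bulk` on the target node (for example with curl's `--data-binary` so the newlines survive).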

The only way to index the data on machine A would be to install ES on it. If you then want to move that data to machine B, you can. That may save you bandwidth, but it seems like a lot of work for little return.

stephan_nordnes_erik · July 23, 2015, 12:53am

So, it would be possible to install ES on machine A, have that index a document, and then transfer the index of just that document to machine B? Or are you saying that this approach would have to transfer the whole ES database from machine A to machine B?

warkolm · July 23, 2015, 6:30am

ES isn't a database. However conceptually you are right.

If you install ES on A and then index that data, you need to copy that data to B if that is where you want to query it.
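For the "copy that data to B" step, Elasticsearch's snapshot and restore API is generally the documented way to move an index between clusters. The sketch below only assembles the sequence of REST calls as (method, path, body) tuples instead of sending them, so it runs without a cluster; the repository name `backup`, snapshot name `snap1`, index name `corpus` and the path `/mnt/es_backups` are hypothetical. It assumes a shared-filesystem (`fs`) repository whose files are copied or mounted from machine A to machine B between the two halves; depending on the version, that location may also have to be listed under `path.repo` in `elasticsearch.yml`.

```python
def snapshot_restore_plan(repo, location, snapshot, index_name):
    """Return the REST calls to snapshot an index on A and restore it on B.

    Each call is a (method, path, body) tuple. All names and paths are
    hypothetical placeholders; nothing is sent over the network here.
    """
    # Both clusters register an "fs" repository pointing at the directory
    # where the snapshot files live (copied or mounted between machines).
    register = ("PUT", f"/_snapshot/{repo}", {"type": "fs", "settings": {"location": location}})
    on_machine_a = [
        register,
        # Snapshot just this index and block until the snapshot finishes.
        ("PUT", f"/_snapshot/{repo}/{snapshot}?wait_for_completion=true", {"indices": index_name}),
    ]
    on_machine_b = [
        register,
        # Restore the index from that snapshot into machine B's cluster.
        ("POST", f"/_snapshot/{repo}/{snapshot}/_restore", {"indices": index_name}),
    ]
    return {"machine_a": on_machine_a, "machine_b": on_machine_b}


plan = snapshot_restore_plan("backup", "/mnt/es_backups", "snap1", "corpus")
for machine, calls in plan.items():
    for method, path, body in calls:
        print(machine, method, path, body)
```

Snapshots operate on whole indices (and are incremental between snapshots of the same repository), so this moves the index as a unit rather than the index entries of a single document.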
