I have a requirement where, for some data, setting the '_source' in an
IndexRequest is straightforward, but for a large volume of data I need to
run a long-running job to generate the data that needs to be indexed. So I
plan to periodically take a snapshot from ES to Hadoop, add new Lucene
documents in Hadoop by running a batch job (using the Lucene 4.7.x library,
not the Elasticsearch library), and finally restore this modified index
repo/snapshot to ES.
Is it possible to update snapshot data and restore it? If so, how do I get
a handle on the Lucene index (org.apache.lucene.store.Directory) stored in
HDFS and call addDocument using an IndexWriter?
Is there a better alternative to achieve the above requirement?
No, I don't believe so. The snapshot data is not really a "valid" Lucene
index, per se. It does contain segment files, but they are named and
packaged in a specific way, so it would be best not to mess with them.
Could you please suggest the best option for merging index data stored in
HDFS with the index data stored on an ES node?
On Saturday, May 31, 2014 1:45:13 AM UTC+5:30, Binh Ly wrote:
No I don't believe so. The snapshot data is not really a "valid" Lucene
index, per se. It does contain segment files, but they are named and
packaged in a specific manner that it would be best not to mess with them.
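
Since the snapshot files should not be edited, one possible alternative is
to have the batch job emit its generated documents and send them to the
live cluster through the bulk API, rather than writing Lucene segments
directly. The following is only a minimal sketch of building a bulk request
body, not a tested pipeline. The function name build_bulk_body, the index
name "myindex", the type name "doc", and the sample documents are all
hypothetical; the "_type" field assumes an Elasticsearch version that still
uses mapping types.

```python
import json


def build_bulk_body(docs, index, doc_type):
    """Serialize (doc_id, source) pairs into a bulk API request body.

    Each document becomes two lines: an action line naming the target
    index, type, and id, followed by the document source itself.
    """
    lines = []
    for doc_id, source in docs:
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action, sort_keys=True))
        lines.append(json.dumps(source, sort_keys=True))
    # Every line, including the last one, must end with a newline.
    return "".join(line + "\n" for line in lines)


# Hypothetical output of the Hadoop batch job.
generated = [("1", {"title": "first"}), ("2", {"title": "second"})]
body = build_bulk_body(generated, index="myindex", doc_type="doc")
```

The resulting string could then be posted to the cluster's _bulk endpoint in
batches. Documents indexed with an existing id replace the stored version,
so the batch output ends up merged with the data already in ES.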