While Enterprise Search uses Elasticsearch under the hood, you can't simply import indices created by other tools; you will need to ingest that data via the Enterprise Search APIs.
OK. During and after the reindexing process, is there a location that can be backed up to allow efficient transfer of the data, or its reconstruction in case of corruption?
If snapshots are primarily insurance when performing unfamiliar actions, yet it can take days, weeks, or even months for fscrawler to index large corpora, then, if data is lost or corrupted, does Elastic.co suggest or offer no solution other than re-indexing?
Scrilling wrote "I recommend checking out the Elasticsearch Snapshot/Restore APIs to back up data WHEN PERFORMING UNFAMILIAR ACTIONS."
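For anyone following along, a minimal sketch of that workflow in Kibana Dev Tools syntax (the repository name `my_backup` and the filesystem path are placeholders; the path must also be listed under `path.repo` in `elasticsearch.yml`):

```
# Register a shared-filesystem snapshot repository
PUT _snapshot/my_backup
{
  "type": "fs",
  "settings": {
    "location": "/mnt/backups/my_backup"
  }
}

# Take a snapshot of the whole cluster and wait for it to finish
PUT _snapshot/my_backup/snapshot_1?wait_for_completion=true
```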
OMG . . I just spotted this at the bottom of the Snapshot/Restore page recommended by Scrilling and Warkolm.
"WARNING - The only reliable and supported way to back up a cluster is by taking a snapshot. You cannot back up an Elasticsearch cluster by making copies of the data directories of its nodes. There are no supported methods to restore any data from a filesystem-level backup. If you try to restore a cluster from such a backup, it may fail with reports of corruption or missing files or other data inconsistencies, or it may appear to have succeeded having silently lost some of your data."
Would snapshots be practicable with a multi-terabyte index?
How long would it take to back up a multi-terabyte index?
How much data would potentially be lost before the next backup could start?
One could use many small, concurrent indices to increase the speed of a backup, but this wouldn't reduce the total required backup capacity!
Or, would staggered, concurrent backups be the solution?
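Staggered, recurring snapshots can be expressed with Elasticsearch's snapshot lifecycle management (SLM). A sketch of one nightly policy (policy name, schedule, repository name, and retention values are placeholders; you could register several such policies on offset schedules):

```
# Nightly snapshot at 01:30, retained for 30 days
PUT _slm/policy/nightly-snapshots
{
  "schedule": "0 30 1 * * ?",
  "name": "<nightly-snap-{now/d}>",
  "repository": "my_backup",
  "config": {
    "indices": ["*"]
  },
  "retention": {
    "expire_after": "30d",
    "min_count": 5,
    "max_count": 50
  }
}
```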
What prompted the warning to be on the page?
A new index is required for every Lucene upgrade.
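When a new index is needed, the usual path is the Reindex API, which copies documents from the old index into a freshly created one (index names here are placeholders):

```
POST _reindex
{
  "source": { "index": "old-index" },
  "dest":   { "index": "new-index" }
}
```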
As this appears to be a Lucene issue, does it also affect Solr?
It's actually not so bad when one is aware of the issue.
Users with small indices can upgrade fairly painlessly, possibly using parallel systems which can be switched.
Users with large indices need to be more circumspect and upgrade only when the benefits of the upgrade outweigh the cost and disruption.
From your business perspective, it may be worth putting this information - together with the considerations and solutions - front and centre, rather than as frightening warnings on a peripheral page.
I will now stop being mean to Mr. Pilato and I'll stick to using English.
FYI. It took eleven weeks (24/7) to generate my 5TB indices. They are now useless. Time to start again.
I'm unsure what you are expecting from me. BTW, if you want to speak in French with me, you can do it in Discussions en français.
Or if you just want to clarify privately, you can DM me.
So M. Pilato's answer would be, non.
I don't get it.