I am trying to index an initial workload of ~1 billion documents into ES 5.2. The documents are pulled from a MySQL database and already contain a unique ID, which I want to reuse as the document ID in ES for quick reference.
I was reading about how to tune indexing and tried disabling index refresh and replicas, which sped up the process. Now I am wondering whether there is a way to disable the ID lookup, as that might speed things up further.
It's not slow per se; however, as I move forward with the actual process I will have to shut down the system to prevent new data from being inserted into the DB until the process finishes, so I would like to minimize the downtime as much as possible.
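
For reference, here is a minimal sketch of the setup described above, not the exact code in use. It shows the settings payloads for turning refresh and replicas off during the load, and a helper that builds a `_bulk` body reusing each row's database ID as `_id`. The helper name `build_bulk_body`, the `id_field` parameter, and the index/type names in the usage note are hypothetical; the restore values assume the index previously used the Elasticsearch defaults.

```python
import json

# Sent with PUT /<index>/_settings before the load:
# no periodic refresh and no replicas while bulk indexing.
BULK_LOAD_SETTINGS = {"index": {"refresh_interval": "-1", "number_of_replicas": 0}}

# Sent with PUT /<index>/_settings after the load.
# Assumption: the index used the defaults (1s refresh, 1 replica) before.
RESTORE_SETTINGS = {"index": {"refresh_interval": "1s", "number_of_replicas": 1}}


def build_bulk_body(rows, index, doc_type, id_field="id"):
    """Build an NDJSON body for POST /_bulk that reuses each row's ID as _id.

    `rows` is an iterable of dicts, e.g. rows fetched from MySQL.
    `id_field` (hypothetical name) is the column holding the unique ID.
    """
    lines = []
    for row in rows:
        # Action line: an explicit _id means ES does not auto-generate one.
        action = {"index": {"_index": index, "_type": doc_type, "_id": str(row[id_field])}}
        # Source line: every column except the ID itself.
        source = {key: value for key, value in row.items() if key != id_field}
        lines.append(json.dumps(action, sort_keys=True))
        lines.append(json.dumps(source, sort_keys=True))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"
```

Calling `build_bulk_body(rows, "docs", "doc")` on a batch of rows gives the string to POST to `/_bulk`, with one action line and one source line per document.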