Indexing can be CPU intensive, and even more so if you are using ingest node. Given the size of your documents and the speed at which you want to index them, the cluster sounds quite small. How many CPU cores do you have? What type of storage?
What level of throughput are you seeing with the current setup? What is limiting performance?
Then run tests and try to identify which system resource is limiting performance, e.g. CPU and/or disk I/O. I generally index much smaller documents, so I am not sure how best to tune for your particular use case.
If I am calculating correctly, that is about 1.33TB of raw data. If that is the case, you will most likely need a much larger cluster to be able to ingest that in 15 minutes...
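To put that figure in perspective, here is a rough sketch of the sustained rate the cluster would need to absorb. It assumes 1.33TB means decimal terabytes and that the data arrives evenly over the 15 minutes; the function name is just for illustration, and the result covers raw data only, not replicas or indexing overhead.

```python
def required_throughput(total_bytes: float, window_seconds: float) -> float:
    """Return the sustained ingest rate in bytes per second."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return total_bytes / window_seconds


# Assumption: 1.33 TB taken as decimal terabytes (1 TB = 1e12 bytes).
TOTAL_BYTES = 1.33e12
# The 15 minute ingest window from the question, in seconds.
WINDOW_SECONDS = 15 * 60

rate = required_throughput(TOTAL_BYTES, WINDOW_SECONDS)
print(f"{rate / 1e9:.2f} GB/s sustained")  # prints "1.48 GB/s sustained"
```

Dividing that rate by the number of data nodes gives a per-node target you can compare against what you measure in your tests.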