We are planning to bulk insert about 10 GB of data; however, we are being
forced to do this from a remote host.
Is this a good practice? And are there any potential issues I should watch
out for?
The best way to approach this is to restrict the size of each bulk request
and/or the number of documents per request.
I tend to do both: the best sizes seem to be in the 5 to 10 MiB range, and
I also cap the number of documents per request (e.g. 5,000), although that
isn't strictly necessary.
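For what it's worth, here is a minimal sketch of that kind of batching in
Python, assuming the data is an NDJSON bulk file of index actions (each
action line followed by its source line) and a node reachable at
http://localhost:9200; the file name documents.ndjson, the send_bulk
helper, and the exact limits are illustrative, not something from this
thread.

import requests  # plain HTTP client; no Elasticsearch client library assumed

# Illustrative settings, not from this thread.
BULK_URL = "http://localhost:9200/_bulk"
MAX_BYTES = 8 * 1024 * 1024   # stay inside the 5-10 MiB range per request
MAX_DOCS = 5000               # optional cap on documents per request

def send_bulk(lines):
    """POST one batch of NDJSON lines to the _bulk endpoint."""
    body = "".join(lines)
    if not body.endswith("\n"):
        body += "\n"  # the bulk body must end with a newline
    resp = requests.post(
        BULK_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
    )
    resp.raise_for_status()
    if resp.json().get("errors"):
        # Individual items can fail even when the HTTP call succeeds.
        raise RuntimeError("bulk request reported item-level errors")

batch, batch_bytes, batch_docs = [], 0, 0
with open("documents.ndjson", encoding="utf-8") as src:
    while True:
        action = src.readline()
        if not action:
            break
        source = src.readline()  # assumes every action line has a source line
        pair = action + source
        pair_bytes = len(pair.encode("utf-8"))
        # Flush the current batch before it exceeds either limit.
        if batch and (batch_bytes + pair_bytes > MAX_BYTES or batch_docs >= MAX_DOCS):
            send_bulk(batch)
            batch, batch_bytes, batch_docs = [], 0, 0
        batch.append(pair)
        batch_bytes += pair_bytes
        batch_docs += 1
if batch:
    send_bulk(batch)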
You should check out this link from the documentation:
Chris
David and Christopher, thanks for your advice. I did split the files into
12 MB chunks, which turned out to be optimal after testing various sizes.
I wanted to draw on your experience of potential issues with bulk indexing
from a local host versus bulk indexing from a remote host.
I did choose to bulk index locally.
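For reference, here is a rough sketch of how such a split might be
scripted, assuming the same kind of NDJSON bulk file with index actions;
the split_bulk_file helper, the chunk size, and the file names are only
illustrative.

# Split a large NDJSON bulk file into ~12 MB chunk files, keeping each
# action line together with its source line.
CHUNK_BYTES = 12 * 1024 * 1024

def split_bulk_file(path, prefix="chunk"):
    chunk_index, written, out = 0, 0, None
    with open(path, encoding="utf-8") as src:
        while True:
            action = src.readline()
            if not action:
                break
            source = src.readline()  # assumes index actions with a source line
            pair = action + source
            pair_bytes = len(pair.encode("utf-8"))
            # Start a new chunk file before the size limit is exceeded.
            if out is None or written + pair_bytes > CHUNK_BYTES:
                if out is not None:
                    out.close()
                chunk_index += 1
                out = open(f"{prefix}_{chunk_index:04d}.ndjson", "w", encoding="utf-8")
                written = 0
            out.write(pair)
            written += pair_bytes
    if out is not None:
        out.close()

split_bulk_file("documents.ndjson")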