I've got a problem, an urgent one.
I'm trying to bulk insert 500,000 records into Elasticsearch, but it keeps loading forever and then crashes.
My mapping has 20 fields and my SQL query returns 500k rows. I then loop over all the rows to build the documents.
Send a reasonable number of documents per bulk request, e.g. 1000, rather than all of them at once. Try to keep each bulk request at or below 5MB.
How do I send just 1k documents per request? As I am looping over my query, how can I loop over its first 1k results, and then loop again over its next 1k results? Is my question clear?
But even if I loop over my SQL query to add only the first 1k results, I'll have to put all of that in a loop to go over the next 1k results, and the next, etc. Wouldn't that cause the exact same problem? They're all in the same loop anyway, so won't ES crash again because it's adding 1k after 1k in the same loop?
And maybe you're right that it isn't directly related to ES, but thanks for trying to help nonetheless!
You can still loop over all the results from your SQL query, but send a request for every 1000 documents. These smaller requests to Elasticsearch will be more efficient and likely faster.
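To make the "one request per 1000 documents" idea concrete, here is a minimal sketch of a buffer-and-flush loop. It assumes PHP 7.1 or later. `bulkInChunks` and its `$send` callable are hypothetical names for illustration, not part of Elastica. The 1000-document and 5MB limits come from the advice above, and the byte count is only an estimate based on `json_encode`.

```php
<?php
// Hypothetical helper: buffers rows and hands them to $send in bounded batches.
// $send is any callable that performs ONE bulk request for the batch it receives.
function bulkInChunks(iterable $rows, callable $send, int $maxDocs = 1000, int $maxBytes = 5242880): int
{
    $batch = [];      // documents waiting to be sent
    $bytes = 0;       // rough JSON size of the current batch
    $requests = 0;    // number of bulk requests issued

    foreach ($rows as $row) {
        $size = strlen(json_encode($row));

        // Flush before adding this row if either limit would be exceeded.
        if ($batch !== [] && (count($batch) >= $maxDocs || $bytes + $size > $maxBytes)) {
            $send($batch);
            $requests++;
            $batch = [];  // start the next batch empty
            $bytes = 0;
        }

        $batch[] = $row;
        $bytes += $size;
    }

    // Send whatever is left over after the loop.
    if ($batch !== []) {
        $send($batch);
        $requests++;
    }

    return $requests;
}
```

In the setup described in this thread, `$send` could be a closure that turns each row into a document and calls `addDocuments` once per batch. The point of the sketch is that the buffer is emptied after every request, so no request ever carries more than one batch.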
It still crashes even after this. Here is my algorithm:
// loop until every row has been processed
while (condition) {
    $query = tep_db_query("
        // get first/next 1000
    ");
    // build the documents for this batch of 1000 rows
    while ($array_collections = tep_db_fetch_array($query)) {
        $achatsDocs[] = new \Elastica\Document('', \Glam\HttpUtils::jsonEncode(
            array(
                // 20 fields
            )
        ));
    }
    $achatsReportType->addDocuments($achatsDocs);
    $achatsReportType->getIndex()->refresh();
    // move on to the next 1000
    $limit_start = $limit_start + 1000;
    $limit_end = $limit_end + 1000;
}
This does end up adding 70k results before crashing with this error:
Fatal error: Uncaught exception 'Elastica\Exception\Connection\HttpException' with message 'Unknown error:52' in /var/www/vendor/ruflin/elastica/lib/Elastica/Transport/Http.php:167
Stack trace:
#0 /var/www/vendor/ruflin/elastica/lib/Elastica/Request.php(171): Elastica\Transport\Http->exec(Object(Elastica\Request), Array)
#1 /var/www/vendor/ruflin/elastica/lib/Elastica/Client.php(621): Elastica\Request->send()
#2 /var/www/vendor/ruflin/elastica/lib/Elastica/Bulk.php(360): Elastica\Client->request('_bulk', 'PUT', '{"index":{"_ind...', Array)
#3 /var/www/vendor/ruflin/elastica/lib/Elastica/Client.php(314): Elastica\Bulk->send()
#4 /var/www/vendor/ruflin/elastica/lib/Elastica/Index.php(150): Elastica\Client->addDocuments(Array)
#5 /var/www/vendor/ruflin/elastica/lib/Elastica/Type.php(196): Elastica\Index->addDocuments(Array)
#6 /var/www/htdocs/adm54140/achatsReport_map.php(280): Elastica\Type->addDocuments(Array)
#7 {main}
thrown in /var/www/vendor/ruflin/elastica/lib/Elastica/Transport/Http.php on line 167
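One thing that stands out in the loop as posted: `$achatsDocs` is never emptied between batches, so each call to `addDocuments` would resend every document collected so far plus the next 1000. cURL error 52 means the server closed the connection without sending a reply, which would be consistent with requests that keep growing, though whether that is the actual cause here is an assumption. The sketch below only simulates the loop shape in plain PHP (no Elasticsearch involved); `requestSizes` is a hypothetical name used for illustration.

```php
<?php
// Simulates a paged loop and returns how many documents each bulk request would carry.
// $reset decides whether the buffer is emptied at the start of every page.
function requestSizes(int $totalRows, int $pageSize, bool $reset): array
{
    $docs = [];   // stands in for $achatsDocs
    $sizes = [];  // documents sent per request

    for ($offset = 0; $offset < $totalRows; $offset += $pageSize) {
        if ($reset) {
            $docs = [];  // empty the buffer before filling it with the next page
        }

        $rowsInPage = min($pageSize, $totalRows - $offset);
        for ($i = 0; $i < $rowsInPage; $i++) {
            $docs[] = $offset + $i;  // stands in for building one document
        }

        $sizes[] = count($docs);     // stands in for one addDocuments() call
    }

    return $sizes;
}

print_r(requestSizes(3000, 1000, false)); // sizes grow: 1000, 2000, 3000
print_r(requestSizes(3000, 1000, true));  // sizes stay at 1000
```

If the real script behaves like the first call, resetting the array at the top of the outer loop (for example `$achatsDocs = array();`) would keep every request at one batch. Calling `refresh()` once after the whole import instead of after every batch may also reduce load on the cluster.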