Bulk indexing crashes

Hello all,

I've got a problem, an urgent one.
I'm trying to bulk insert 500,000 documents into Elasticsearch, but it keeps loading forever and then crashes; it clearly doesn't like it.

My mapping contains 20 fields and my query returns 500k rows. I loop over all of them to build the documents:

$achatsDocs[] = new \Elastica\Document('', \Glam\HttpUtils::jsonEncode(
		array(
			'x' => 'y',
			// ... 20 fields in total
		)
	));
}

$achatsReportType->addDocuments($achatsDocs);
$achatsReportType->getIndex()->refresh();

I can't figure out why it doesn't work. I'm very new to Elasticsearch. Is my configuration incorrect?

Send a reasonable number of documents per bulk request, e.g. 1000, not all of them at once. Try to keep each bulk request at or below 5MB.
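A minimal sketch of that advice, assuming the documents are already JSON-encoded strings (as the `\Glam\HttpUtils::jsonEncode()` call in the code above presumably returns). It splits them into batches capped at 1000 documents and roughly 5MB each. The function name `chunkBulk` is hypothetical, and the byte count covers only the document bodies, not the bulk action lines the client adds, so treat the limit as approximate.

```php
<?php
// Hypothetical helper: split pre-encoded JSON documents into batches that
// respect both a document-count limit and an approximate byte limit.
function chunkBulk(array $jsonDocs, int $maxDocs = 1000, int $maxBytes = 5 * 1024 * 1024): array
{
    $batches = [];
    $current = [];
    $currentBytes = 0;

    foreach ($jsonDocs as $json) {
        $size = strlen($json); // bytes of the document body only (action lines not counted)
        // Close the current batch if adding this document would exceed either limit.
        if ($current !== [] && (count($current) >= $maxDocs || $currentBytes + $size > $maxBytes)) {
            $batches[] = $current;
            $current = [];
            $currentBytes = 0;
        }
        $current[] = $json;
        $currentBytes += $size;
    }
    if ($current !== []) {
        $batches[] = $current; // the final, partial batch
    }
    return $batches;
}

// Usage: 2500 small documents with the default limits give batches of 1000, 1000 and 500.
$docs = array_fill(0, 2500, json_encode(['x' => 'y']));
foreach (chunkBulk($docs) as $batch) {
    echo count($batch), PHP_EOL; // each batch would become one bulk request
}
```

Each batch would then be sent as its own bulk request. This version still holds every document in memory, so it only bounds the request size, not the script's memory use.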

How do I send just 1k documents per request? Since I'm looping over my query, how can I loop over its first 1k results, and then loop again over the next 1k results? Is my question clear? :frowning:

$query = tep_db_query(" 
// Query giving 500k results
");

$achatsDocs = array();

while ($array_collections = tep_db_fetch_array($query)) {
	// looping over the query results
	$achatsDocs[] = new \Elastica\Document('', \Glam\HttpUtils::jsonEncode(
		array(
			// document fields
		)
	));
}

$achatsReportType->addDocuments($achatsDocs);
$achatsReportType->getIndex()->refresh();

That is probably more of a generic programming question, and not directly related to Elasticsearch.

Keep track of how many documents you have added in the loop and then periodically send a request to Elasticsearch before starting on a new batch.


But even if I loop over my SQL query to add only the first 1k results, I'll have to wrap all of that in another loop to go over the next 1k results, and the next, and so on. Wouldn't that cause the exact same problem? Since it's all in the same loop anyway, won't ES crash again because it's adding 1k after 1k in the same loop?

Maybe you're right that it isn't directly related to ES, but thanks for trying to help nonetheless! :slight_smile:

You can still loop over all the results from your SQL query, but send a request for every 1000 documents. These smaller requests to Elasticsearch will be more efficient and likely faster.
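A sketch of that single-loop approach, assuming PHP 7.1+, with the database and Elasticsearch calls replaced by stand-ins so it runs on its own: `$rows` plays the role of the `tep_db_fetch_array()` loop, and the `$send` callable plays the role of `$achatsReportType->addDocuments($batch)`. The name `indexInBatches` is hypothetical. Each `$send` call is a separate, bounded bulk request, so one pass over all 500k rows never builds one huge request.

```php
<?php
// Sketch: one pass over all rows, but one bulk request every $batchSize documents.
// $rows stands in for the tep_db_fetch_array() loop; $send stands in for
// $achatsReportType->addDocuments($batch). Both are assumptions for illustration.
function indexInBatches(iterable $rows, callable $send, int $batchSize = 1000): int
{
    $batch = [];
    $requests = 0;
    foreach ($rows as $row) {
        $batch[] = json_encode($row); // build one document per row
        if (count($batch) === $batchSize) {
            $send($batch);  // one bulk request with exactly $batchSize documents
            $batch = [];    // empty the buffer so the next request starts fresh
            $requests++;
        }
    }
    if ($batch !== []) {
        $send($batch);      // the remaining documents (fewer than $batchSize)
        $requests++;
    }
    return $requests;
}

// Usage with a generator standing in for the SQL result set:
$rows = (function () {
    for ($i = 0; $i < 2500; $i++) {
        yield ['id' => $i];
    }
})();
$sizes = [];
$n = indexInBatches($rows, function (array $batch) use (&$sizes) {
    $sizes[] = count($batch);
});
```

Here `$n` is 3 and `$sizes` is `[1000, 1000, 500]`. The key detail is emptying `$batch` after every send; without it, each request would carry everything collected so far.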


It still crashes even after this. Here is my algorithm:

// while we didn't loop through every data
while(condition) {

	$query = tep_db_query("
	    // get first/next 1000
	");

	// put the next 1000 rows into documents
	while ($array_collections = tep_db_fetch_array($query)) {
		$achatsDocs[] = new \Elastica\Document('', \Glam\HttpUtils::jsonEncode(
			array(
				// 20 fields
			)
		));
	}

	$achatsReportType->addDocuments($achatsDocs);
	$achatsReportType->getIndex()->refresh();

	// go over the next 1000
	$limit_start = $limit_start + 1000;
	$limit_end = $limit_end + 1000;
}

This ends up adding about 70k documents and then crashes with this error:

Fatal error: Uncaught exception 'Elastica\Exception\Connection\HttpException' with message 'Unknown error:52' in /var/www/vendor/ruflin/elastica/lib/Elastica/Transport/Http.php:167
Stack trace:
#0 /var/www/vendor/ruflin/elastica/lib/Elastica/Request.php(171): Elastica\Transport\Http->exec(Object(Elastica\Request), Array)
#1 /var/www/vendor/ruflin/elastica/lib/Elastica/Client.php(621): Elastica\Request->send()
#2 /var/www/vendor/ruflin/elastica/lib/Elastica/Bulk.php(360): Elastica\Client->request('_bulk', 'PUT', '{"index":{"_ind...', Array)
#3 /var/www/vendor/ruflin/elastica/lib/Elastica/Client.php(314): Elastica\Bulk->send()
#4 /var/www/vendor/ruflin/elastica/lib/Elastica/Index.php(150): Elastica\Client->addDocuments(Array)
#5 /var/www/vendor/ruflin/elastica/lib/Elastica/Type.php(196): Elastica\Index->addDocuments(Array)
#6 /var/www/htdocs/adm54140/achatsReport_map.php(280): Elastica\Type->addDocuments(Array)
#7 {main}
  thrown in /var/www/vendor/ruflin/elastica/lib/Elastica/Transport/Http.php on line 167
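One thing worth checking in the loop above: as posted, `$achatsDocs` is filled inside the outer `while` but never emptied between batches, so each `addDocuments()` call appears to resend every document collected so far. The simulation below (no Elasticsearch involved; `requestSizes` is a hypothetical name) counts how many documents each request would carry with and without a reset such as `$achatsDocs = array();` at the top of each batch.

```php
<?php
// Simulates the batching loop without Elasticsearch: how many documents each
// addDocuments() call would carry when the buffer is (or is not) reset per batch.
function requestSizes(int $totalDocs, int $batchSize, bool $resetBuffer): array
{
    $buffer = 0;   // number of documents currently held in the buffer ($achatsDocs)
    $sizes = [];
    for ($done = 0; $done < $totalDocs; $done += $batchSize) {
        $buffer += min($batchSize, $totalDocs - $done); // inner loop adds one batch
        $sizes[] = $buffer;                             // addDocuments($achatsDocs)
        if ($resetBuffer) {
            $buffer = 0;                                // $achatsDocs = array();
        }
    }
    return $sizes;
}

$noReset = requestSizes(70000, 1000, false);
echo end($noReset), PHP_EOL;       // 70000 documents in the 70th request
echo array_sum($noReset), PHP_EOL; // 2485000 documents sent over 70 requests
$withReset = requestSizes(70000, 1000, true);
echo max($withReset), PHP_EOL;     // 1000: every request stays at one batch
```

Without the reset, the 70th request alone would carry 70,000 documents, far above the suggested 1000 per request, which would fit a crash around the 70k mark. "Unknown error:52" matches curl's error 52 ("empty reply from server"), i.e. the connection was dropped without a response. Separately, calling `refresh()` once after the last batch, rather than after each of the 500 batches, avoids forcing a refresh every time.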
