Elasticsearch 2 scroll in PHP - memory issue

Hi, can somebody please help me solve a problem with the Elasticsearch 2 scroll API? I need to migrate data from Elasticsearch 2 to MongoDB. The PHP script below throws the error "VirtualAlloc() failed: [0x000005af] The paging file is too small for this operation to complete" after a few million items. I wrote it following this guide for ES2, and the script is quite simple:


<?php

error_reporting( E_ALL );
ini_set( 'memory_limit', -1 );
ini_set( 'max_execution_time', -1 );

/** @var \Nette\DI\Container $container */
$container = require( __DIR__ . '/../app/bootstrap.php' );

/** @var MongoConnect $mongo */
$mongo = $container->getService( 'mongo' );
/** @var \MongoDB\Collection $eventsCollection */
$eventsCollection = $mongo->selectCollection( 'WebApp', 'Events2' );

/** @var Elastica\Client $elastic */
$elastic = new Elastica\Client();
/** @var Elastica\Index $elasticIndex */
$elasticScrollData = $elastic->getIndex( 'event' )->request( '_search?scroll=3s', 'POST', ['size' => 250, 'sort' => '_doc'] )->getData();
$countAll = $elasticScrollData['hits']['total'];

$offset = 0;
saveToMongo( $elasticScrollData, $countAll, $offset, $elastic, $eventsCollection );


function saveToMongo( $scrollData, $countAll, $offset, \Elastica\Client $elastic, \MongoDB\Collection $mongoCollection )
{
    $documents = [];
    foreach ( $scrollData['hits']['hits'] as $item )
    {
	    //if( $item['_type'] == 'kataster' ) continue;  // Throws an error after 1,200000

	    $doc = [];
	    foreach ( $item['_source'] as $key => $val )
	    {
		    if( in_array( $key, ['publishDate', 'generateDate', 'eventDate'] ) )
		    {
			    $date = stringToDate( $val );
			    if( $date instanceof DateTime ) $doc[$key] = new \MongoDB\BSON\UTCDateTime( $date->format('U') * 1000 );
			    else \Tracy\Debugger::log( 'Invalid ' . $key . ' - ' . $val, 'events-import-error' ); // log the raw value; $date is FALSE here
		    }
		    else $doc[$key] = $val;
	    }

	    if( isset( $item['_type'] ) ) $doc['type'] = $item['_type'];
	    $doc['oldEsId'] = $item['_id'];

	    $documents[] = $doc;

	    $offset++;
    }

    try
    {
	    $mongoCollection->insertMany( $documents, ['ordered' => FALSE] );
	    echo '--- offset ' . ( $offset ) . ' OK' . "\n";
    }
    catch( \Exception $e )
    {
	    echo '+++ insert exception: ' . $e->getMessage() . "\n";
    }


    if( $offset < $countAll )
    {
	    $scrollData = $elastic->request( '_search/scroll', 'POST', ['scroll' => '3s', 'scroll_id' => $scrollData['_scroll_id']] )->getData();
	    saveToMongo( $scrollData, $countAll, $offset, $elastic, $mongoCollection );
    }
}


function stringToDate( $string )
{
    if( preg_match( '/^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+\+[\d:]+$/', $string ) ) $format = 'Y-m-d\TH:i:s.uT';
    elseif( preg_match( '/^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+$/', $string ) ) $format = 'Y-m-d\TH:i:s.u';
    elseif ( preg_match( '/^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\+[\d:]+$/', $string ) ) $format = 'Y-m-d\TH:i:sT';
    elseif ( preg_match( '/^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}$/', $string ) ) $format = 'Y-m-d\TH:i:s';
    elseif ( preg_match( '/^\d{4}-\d{2}-\d{2}\+[\d:]+$/', $string ) ) $format = 'Y-m-dT';
    elseif ( preg_match( '/^\d{4}-\d{2}-\d{2}$/', $string ) ) $format = 'Y-m-d';
    else return FALSE; // no known pattern matched; the caller treats this as an invalid date

    return DateTime::createFromFormat( $format, $string );
}

I am not sure whether I am using options like scroll or size the right way. I really don't understand where the problem is. Thanks a lot.

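So the problem was the recursive call of the saveToMongo() function. Every call stays on the stack until the whole migration finishes, so the local variables of each level (its $documents and $scrollData) are never freed, and together they eventually exceed the available memory.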
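
For anyone who hits the same thing, here is a minimal sketch of the same scroll done with a loop instead of recursion. It is not the exact code I ended up with: drainScroll(), $fetchPage and $saveBatch are hypothetical names, and the demo uses fake in-memory pages instead of a live cluster, so treat it as an illustration only.

```php
<?php
// Sketch: drain a scroll cursor with a loop instead of recursion.
// $fetchPage and $saveBatch are hypothetical callables, not Elastica or MongoDB APIs.

/**
 * @param array    $firstPage response of the initial scroll search request
 * @param callable $fetchPage function ( $scrollId ): array, returns the next page
 * @param callable $saveBatch function ( array $hits ): void, persists one page of hits
 * @return int number of hits processed
 */
function drainScroll( array $firstPage, callable $fetchPage, callable $saveBatch )
{
    $processed = 0;
    $page = $firstPage;

    // An empty hits array marks the end of the scroll.
    while ( !empty( $page['hits']['hits'] ) )
    {
        $saveBatch( $page['hits']['hits'] );
        $processed += count( $page['hits']['hits'] );

        // Overwriting $page releases the previous batch, so only one page is held at a time.
        $page = $fetchPage( $page['_scroll_id'] );
    }

    return $processed;
}

// Demo with fake pages keyed by scroll id.
$pages = [
    'a' => ['_scroll_id' => 'b', 'hits' => ['hits' => [['_id' => '3']]]],
    'b' => ['_scroll_id' => 'c', 'hits' => ['hits' => []]],
];
$first = ['_scroll_id' => 'a', 'hits' => ['hits' => [['_id' => '1'], ['_id' => '2']]]];

$saved = [];
$total = drainScroll(
    $first,
    function ( $scrollId ) use ( $pages ) { return $pages[$scrollId]; },
    function ( array $hits ) use ( &$saved ) { foreach ( $hits as $hit ) $saved[] = $hit['_id']; }
);

echo $total . ' hits processed' . "\n";
```

In the script from the question, $fetchPage would wrap the $elastic->request( '_search/scroll', ... )->getData() call and $saveBatch would hold the field conversion plus insertMany(). Stopping on an empty hits array also avoids relying on the $offset < $countAll comparison.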
