Elasticsearch 2 scroll in PHP - memory issue


(Vladimír Čamaj) #1

Hi, can somebody please help me solve a problem with the Elasticsearch 2 scroll API? I need to migrate data from Elasticsearch 2 to MongoDB. The PHP script below throws the error "VirtualAlloc() failed: [0x000005af] The paging file is too small for this operation to complete" after a few million items. I followed this guide for ES2, and the script is quite simple:

<?php

error_reporting( E_ALL );
ini_set( 'memory_limit', -1 );
ini_set( 'max_execution_time', -1 );

/** @var \Nette\DI\Container $container */
$container = require( __DIR__ . '/../app/bootstrap.php' );

/** @var MongoConnect $mongo */
$mongo = $container->getService( 'mongo' );
/** @var \MongoDB\Collection $eventsCollection */
$eventsCollection = $mongo->selectCollection( 'WebApp', 'Events2' );

/** @var Elastica\Client $elastic */
$elastic = new Elastica\Client();
/** @var array $elasticScrollData first page of the scroll, decoded from the response */
$elasticScrollData = $elastic->getIndex( 'event' )->request( '_search?scroll=3s', 'POST', ['size' => 250, 'sort' => '_doc'] )->getData();
$countAll = $elasticScrollData['hits']['total'];

$offset = 0;
saveToMongo( $elasticScrollData, $countAll, $offset, $elastic, $eventsCollection );


function saveToMongo( $scrollData, $countAll, $offset, \Elastica\Client $elastic, \MongoDB\Collection $mongoCollection )
{
    $documents = [];
    foreach ( $scrollData['hits']['hits'] as $item )
    {
	    //if( $item['_type'] == 'kataster' ) continue;  // Throws an error after 1,200000

	    $doc = [];
	    foreach ( $item['_source'] as $key => $val )
	    {
		    if( in_array( $key, ['publishDate', 'generateDate', 'eventDate'] ) )
		    {
			    $date = stringToDate( $val );
			    if( $date instanceof DateTime ) $doc[$key] = new \MongoDB\BSON\UTCDateTime( $date->format('U') * 1000 );
			    else \Tracy\Debugger::log( 'Invalid ' . $key . ' - ' . $val, 'events-import-error' ); // log the raw value; $date is FALSE here
		    }
		    else $doc[$key] = $val;
	    }

	    if( isset( $item['_type'] ) ) $doc['type'] = $item['_type'];
	    $doc['oldEsId'] = $item['_id'];

	    $documents[] = $doc;

	    $offset++;
    }

    try
    {
	    $mongoCollection->insertMany( $documents, ['ordered' => FALSE] );
	    echo '--- offset ' . ( $offset ) . ' OK' . "\n";
    }
    catch( \Exception $e )
    {
	    echo '+++ insert exception: ' . $e->getMessage() . "\n";
    }


    if( $offset < $countAll )
    {
	    $scrollData = $elastic->request( '_search/scroll', 'POST', ['scroll' => '3s', 'scroll_id' => $scrollData['_scroll_id']] )->getData();
	    saveToMongo( $scrollData, $countAll, $offset, $elastic, $mongoCollection );
    }
}


function stringToDate( $string )
{
    if( preg_match( '/^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+\+[\d:]+$/', $string ) ) $format = 'Y-m-d\TH:i:s.uT';
    elseif( preg_match( '/^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+$/', $string ) ) $format = 'Y-m-d\TH:i:s.u';
    elseif ( preg_match( '/^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\+[\d:]+$/', $string ) ) $format = 'Y-m-d\TH:i:sT';
    elseif ( preg_match( '/^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}$/', $string ) ) $format = 'Y-m-d\TH:i:s';
    elseif ( preg_match( '/^\d{4}-\d{2}-\d{2}\+[\d:]+$/', $string ) ) $format = 'Y-m-dT';
    elseif ( preg_match( '/^\d{4}-\d{2}-\d{2}$/', $string ) ) $format = 'Y-m-d';
    else return FALSE; // no pattern matched, so $format would be undefined below

    return DateTime::createFromFormat( $format, $string );
}

I am not sure whether I am using options such as scroll or size the right way. I really don't understand where the problem is. Thanks a lot.


(Vladimír Čamaj) #2

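So the problem was the recursive call of the saveToMongo() function. Each nested call keeps its local variables ($documents and $scrollData) alive until the whole chain returns, so memory use grows with every batch until the process runs out of memory. Rewriting the recursion as a loop avoids this.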
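
For anyone who hits the same thing, here is a minimal sketch of the loop-based shape, not the exact code I ended up with. The names drainScroll, $fetchFirst, $fetchNext and $saveBatch are hypothetical, and the fake in-memory pages only stand in for the Elastica requests and the MongoDB insert so that the example runs on its own. It stops when a page comes back with no hits, which is how the scroll API signals the end of the result set.

```php
<?php

/**
 * Drain a scroll-style cursor with a loop instead of recursion.
 * Only one page is held in memory at a time.
 *
 * @param callable $fetchFirst returns the first page (array with '_scroll_id' and 'hits')
 * @param callable $fetchNext  takes a scroll id and returns the next page
 * @param callable $saveBatch  takes the hits of one page and persists them
 * @return int number of hits processed
 */
function drainScroll( callable $fetchFirst, callable $fetchNext, callable $saveBatch )
{
    $processed = 0;
    $page = $fetchFirst();

    // An empty hits array means the scroll is exhausted.
    while ( !empty( $page['hits']['hits'] ) )
    {
        $saveBatch( $page['hits']['hits'] );
        $processed += count( $page['hits']['hits'] );

        // Keep only the scroll id, then drop the old page before fetching the next one.
        $scrollId = $page['_scroll_id'];
        unset( $page );
        $page = $fetchNext( $scrollId );
    }

    return $processed;
}

// --- Demo with fake pages (assumption: stand-ins for Elasticsearch and MongoDB) ---
$pages = [
    ['_scroll_id' => 'a', 'hits' => ['hits' => [['_id' => '1'], ['_id' => '2']]]],
    ['_scroll_id' => 'b', 'hits' => ['hits' => [['_id' => '3'], ['_id' => '4']]]],
    ['_scroll_id' => 'c', 'hits' => ['hits' => [['_id' => '5']]]],
    ['_scroll_id' => 'd', 'hits' => ['hits' => []]],
];
$cursor = 0;
$saved = [];

$total = drainScroll(
    function () use ( $pages, &$cursor ) { return $pages[$cursor]; },
    function ( $scrollId ) use ( $pages, &$cursor ) { return $pages[++$cursor]; },
    function ( array $hits ) use ( &$saved ) { foreach ( $hits as $hit ) $saved[] = $hit['_id']; }
);

echo 'processed ' . $total . "\n";
```

In the script from the first post, $fetchFirst would wrap the '_search?scroll=3s' request, $fetchNext the '_search/scroll' request, and $saveBatch the field conversion plus insertMany(). Because nothing is nested, $documents and the previous page can be freed after every batch.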

