We have an application that allows users to export records based on search/filter criteria, and I am looking for a way to implement an "export all" feature that writes the results to a CSV file.
I ran a test using the _scroll API. The initial test was successful: we were able to export more than 80,000 records and write them to a CSV file.
However, when I attempted to export more than 500,000 (5 lakh) records, the export failed. I am also wondering whether this could affect our pricing, or whether we need to consider increasing memory resources.
Here is my code for review. Please let me know if any refactoring is needed.
// Function to export data to a CSV file in batches
function exportToCsvInBatches($data, $csvFilename, $batchSize)
{
    $handle = fopen($csvFilename, 'w');

    // Write header (guard against an empty result set)
    if (!empty($data)) {
        fputcsv($handle, array_keys($data[0]));
    }

    // Write data in batches
    $batchCount = ceil(count($data) / $batchSize);
    for ($i = 0; $i < $batchCount; $i++) {
        $batch = array_slice($data, $i * $batchSize, $batchSize);
        foreach ($batch as $row) {
            fputcsv($handle, $row);
        }
    }

    fclose($handle);
}
// Elasticsearch index and query
$index = 'list';
$query = [
    'bool' => [
        'must' => [
            ['term' => ['user_id' => 2431]],
            ['match' => ['archive' => 'N']],
            ['match' => ['list_source' => 'LO Portal']],
        ],
    ],
];

// Parameters for the Scroll API
$pageSize = 1000;    // Number of documents per page
$scrollTime = '10m'; // How long each scroll context is kept alive

// Batch size for exporting to CSV
$batchSize = 10000;  // Adjust based on your requirements
try {
    // Initial search request using the Scroll API
    $response = executeScrollSearch($index, $query, $pageSize, $scrollTime, $client);

    // Array to store all data
    $allData = [];
    $scrollId = $response['_scroll_id'] ?? null;

    // Process the initial search results
    foreach ($response['hits']['hits'] as $hit) {
        $allData[] = $hit['_source'];
    }

    // Continue scrolling until there are no more hits
    while (!empty($response['hits']['hits'])) {
        $scrollId = $response['_scroll_id'];

        // Perform the scroll request to get the next page
        $response = $client->scroll([
            'scroll_id' => $scrollId,
            'scroll' => $scrollTime,
        ]);

        // Process the next page of results
        foreach ($response['hits']['hits'] as $hit) {
            $allData[] = $hit['_source'];
        }
    }

    // Clear the scroll to release server-side resources
    // (guard: $scrollId is null if the initial search returned nothing)
    if ($scrollId !== null) {
        $client->clearScroll(['scroll_id' => $scrollId]);
    }

    // Define the CSV filename
    $csvFilename = 'exported_data.csv';

    // Export all data to the CSV file in batches
    exportToCsvInBatches($allData, $csvFilename, $batchSize);
    echo 'Data exported to ' . $csvFilename . PHP_EOL;
} catch (BadRequest400Exception $e) {
    // Handle Elasticsearch-specific errors
    echo 'Elasticsearch Error: ' . $e->getMessage() . PHP_EOL;
} catch (Exception $e) {
    // Handle general PHP exceptions
    echo 'Error: ' . $e->getMessage() . PHP_EOL;
}
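For context, one refactoring direction I have been considering is to stream each scroll page straight to the CSV file instead of accumulating everything in $allData first, so memory usage stays roughly one page rather than the full result set. Below is a minimal sketch, assuming the same $client, $index, $query, $pageSize, and $scrollTime as above and the official elasticsearch-php client; it is an illustration, not the code we currently run:

```php
// Sketch: write each scroll page directly to the CSV as it arrives,
// so only one page of hits is held in memory at a time.
$handle = fopen('exported_data.csv', 'w');
$headerWritten = false;

// Initial search request that opens the scroll context
$response = $client->search([
    'index'  => $index,
    'scroll' => $scrollTime,
    'size'   => $pageSize,
    'body'   => ['query' => $query],
]);

while (!empty($response['hits']['hits'])) {
    foreach ($response['hits']['hits'] as $hit) {
        // Write the header once, from the first document's keys
        if (!$headerWritten) {
            fputcsv($handle, array_keys($hit['_source']));
            $headerWritten = true;
        }
        fputcsv($handle, $hit['_source']);
    }

    // Fetch the next page using the scroll id from the last response
    $response = $client->scroll([
        'scroll_id' => $response['_scroll_id'],
        'scroll'    => $scrollTime,
    ]);
}

// Release the scroll context on the server
if (isset($response['_scroll_id'])) {
    $client->clearScroll(['scroll_id' => $response['_scroll_id']]);
}
fclose($handle);
```

Would this kind of streaming approach be the recommended way to handle 500,000+ records, or is there a better option (e.g. search_after or point-in-time) for exports of this size?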