Bulk Export More Than 10k Events

I occasionally need to pull events out of Elasticsearch, so I built a PowerShell script that uses the scroll API to pull all the records. The problem is that I can only pull 10k events per request, and I typically get roughly 5-8 million events per day. As you can imagine, that means a large number of requests hitting the server (roughly 500-800 per day), and at about 60-90 seconds per request it takes a very long time to pull all the records.
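In case it helps, at the API level the script boils down to the standard two-call scroll pattern, something like this (a minimal sketch with a placeholder host; $cred and $query are defined in the full script below):

#Minimal sketch of the scroll pattern (placeholder host/index; $cred and $query come from the full script below)
#1) The initial search opens a scroll context and returns the first page of hits plus a _scroll_id
$first = Invoke-RestMethod -Method POST -ContentType "application/json" -Credential $cred -Uri "https://eshost:9200/myindex-*/_search?scroll=10m&size=10000" -Body $query
#2) Each follow-up call sends that _scroll_id back to fetch the next page, until no hits remain
$next = Invoke-RestMethod -Method POST -ContentType "application/json" -Credential $cred -Uri "https://eshost:9200/_search/scroll" -Body ('{ "scroll": "5m", "scroll_id": "' + $first._scroll_id + '" }')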

Is it possible to get more than 10k at a time? Below is the script I'm using:

#Force PowerShell to use TLS 1.2
[Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12

#Get username/password for kibana user
$cred = Get-Credential

#Specify the index pattern to query
$is = "myindex-*"

#Set base elasticsearch URL
$esbase = "https://ELASTICSEARCHFQDN:9200"

#Set folder to save files to
$path = "PATH_TO_SAVE_TO"

#Set headers
$Options = @{
  headers = @{
    'kbn-version' = "7.5.0"
  }
  ContentType = "application/json"
  Method = "POST"
}
#Example query: a range query on the field edge.timestamp_start, whose values are epoch seconds.
$query = '{ "sort": ["_doc"], "query": { "range": { "edge.timestamp_start": { "gte": "1577772000", "lte": "1577858399" } } } }'

#Initial search request: opens the scroll context and returns the first batch of up to 10k hits from the index pattern
$isr = Invoke-RestMethod @Options -Uri "$esbase/$is/_search?scroll=10m&size=10000" -Body $query -Credential $cred -Verbose

#Verify destination folder exists, create if it does not.
if ((Test-Path $path) -eq $false) {
  New-Item -Type Directory $path | Out-Null
}

#Start loop to process the initial 10k results and subsequent results.
do {

  #Grab the current time (24-hour clock) to be used in the filename
  $date = Get-Date -Format MMddyyyy_HHmmss

  #Start loop to dump results into a text file
  foreach ($hit in $isr.hits.hits) {

    #Narrow file output data to only the raw message field.
    $out = $hit._source.message

    #Output the message field to a text file, appending to the existing file so as not to create 10k tiny files.
    $out | Out-File $path\June$Date.txt -Encoding utf8 -Append
  }

  #Compress the resulting txt file. In my use case, each txt file was 16MB and the compressed file was 1.5MB.
  #Compress-Archive -Path $path\June$Date.txt -Update -DestinationPath $path\June$Date.zip -CompressionLevel Fastest

  #Remove original txt file after compression is complete.
  #Remove-Item $path\June$Date.txt -Confirm:$false

  #Grab the scroll id from the previous successful query to feed into the next query.
  $sid = $isr._scroll_id

  #Set the scroll keep-alive to 5 minutes and insert the scroll id for the next request
  $query = '{ "scroll": "5m", "scroll_id": "'+$sid+'" }'
  
  #Gather the next 10k results
  $isr = Invoke-RestMethod @Options -Uri "$esbase/_search/scroll" -Body $query -Credential $cred -Verbose

#Continue loop until the last query returns 0 results.
} until ($isr.hits.hits.count -eq 0)
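As an aside, once the loop finishes I could release the scroll context explicitly instead of waiting for it to expire. A minimal sketch using the clear scroll API, reusing $esbase, $sid, and $cred from above:

#Release the scroll context once the export is done (clear scroll API)
$clear = '{ "scroll_id": "' + $sid + '" }'
Invoke-RestMethod -Method DELETE -ContentType "application/json" -Uri "$esbase/_search/scroll" -Body $clear -Credential $cred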
