I occasionally need to pull events out of Elasticsearch, so I built a PowerShell script that uses the scroll API to page through all the records. The problem I have is that I can only pull 10k events per request, and I typically ingest roughly 5-8 million events per day. At 10k per request that works out to 500-800 requests, and with each request taking about 60-90 seconds, pulling a single day's records takes somewhere between 8 and 20 hours.
Is it possible to get more than 10k at a time? Below is the script I'm using.
#Force PowerShell to use TLS 1.2
[Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12
#Get username/password for the Kibana user
$cred = Get-Credential
#Specify the index pattern to query (trailing wildcard included)
$is = "myindex-*"
#Set base elasticsearch URL
$esbase = "https://ELASTICSEARCHFQDN:9200"
#Set folder to save files to
$path = "PATH_TO_SAVE_TO"
#Set headers
$Options = @{
    Headers = @{
        'kbn-version' = "7.5.0"
    }
    ContentType = "application/json"
    Method = "POST"
}
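#Note: the hashtable above is splatted onto each Invoke-RestMethod call below as @Options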
#Example query: a range query on the field edge.timestamp_start, whose values are epoch seconds.
$query = '{ "sort": ["_doc"], "query": { "range": { "edge.timestamp_start": { "gte": "1577772000", "lte": "1577858399" } } } }'
#Initial search specifying the scroll parameters and index pattern to search
$isr = Invoke-RestMethod @Options -Uri "$esbase/$is/_search?scroll=10m&size=10000" -Body $query -Credential $cred -Verbose
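#The response holds the first 10k hits plus a _scroll_id used to request each subsequent batch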
#Verify destination folder exists, create if it does not.
if (-not (Test-Path $path)) {
    New-Item -Type Directory $path | Out-Null
}
#Start loop to process the initial 10k results and subsequent results.
do {
    #Grab the current time (24-hour clock) to be used in the filename
    $date = Get-Date -Format MMddyyyy_HHmmss
    #Loop to dump the current batch of results into a text file
    foreach ($hit in $isr.hits.hits) {
        #Narrow the file output data to only the raw message field.
        $out = $hit._source.message
        #Output the message field to a text file, appending the existing file so as not to create 10k tiny files.
        $out | Out-File $path\June$date.txt -Encoding utf8 -Append
    }
    #Compress the resulting txt file. In my use case, each txt file was 16MB; compressed it was 1.5MB.
    #Compress-Archive -Path $path\June$date.txt -Update -DestinationPath $path\June$date.zip -CompressionLevel Fastest
    #Remove the original txt file after compression is complete.
    #Remove-Item $path\June$date.txt -Confirm:$false
    #Grab the scroll id from the previous successful response to feed into the next request.
    $sid = $isr._scroll_id
    #Reset the scroll timeout to 5 minutes and insert the scroll id
    $query = '{ "scroll": "5m", "scroll_id": "' + $sid + '" }'
    #Gather the next 10k results
    $isr = Invoke-RestMethod @Options -Uri "$esbase/_search/scroll" -Body $query -Credential $cred -Verbose
#Continue the loop until a request returns 0 results.
} until ($isr.hits.hits.Count -eq 0)
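One thing the script doesn't do is clean up after itself: the scroll context stays open on the server until its timeout expires. A minimal sketch of releasing it once the loop finishes, reusing the same $Options values (the clear-scroll endpoint takes a DELETE with the scroll id in the body):

#Release the scroll context once the loop has drained all results
$clear = $Options.Clone()
$clear.Method = "DELETE"
Invoke-RestMethod @clear -Uri "$esbase/_search/scroll" -Body ('{ "scroll_id": "' + $isr._scroll_id + '" }') -Credential $cred

For what it's worth, I believe the 10k cap comes from the index.max_result_window setting, and I assume it could be raised per index along these lines, but I haven't tested it and I'm unsure of the memory impact at 5-8 million events per day:

#Untested sketch: raise the per-request cap on the index pattern
#Invoke-RestMethod -Method PUT -Uri "$esbase/$is/_settings" -Body '{ "index": { "max_result_window": 100000 } }' -ContentType "application/json" -Credential $cred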