I'm using Spark to build text files for the Elasticsearch Bulk API and have staged all of them in S3: 45 GB of raw data, about 80 GB once formatted for bulk indexing. I just realized I can't use curl to POST a file that is sitting in S3!
A) I can write a bash script to download each file and POST it to ES. This is not optimal, and I can't ssh into one of the hosted ES nodes and run it from there.
B) I could stream the data with Lambda, but the data is already in bulk format and I don't want to stream it at this time.
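For concreteness, here's roughly what option (A) would look like. The bucket, prefix, and endpoint below are placeholders, not my real values:

```shell
# Sketch of option (A): download each staged bulk file, POST it to _bulk.
BUCKET="my-bucket"                              # placeholder bucket
PREFIX="bulk-files/"                            # placeholder key prefix
ES_HOST="https://search-mydomain.example.com"   # placeholder hosted ES endpoint

# The 4th column of `aws s3 ls` output is the key name
# (assumes no spaces in key names).
for key in $(aws s3 ls "s3://${BUCKET}/${PREFIX}" | awk '{print $4}'); do
  aws s3 cp "s3://${BUCKET}/${PREFIX}${key}" /tmp/chunk.ndjson
  curl -s -XPOST "${ES_HOST}/_bulk" \
       -H 'Content-Type: application/x-ndjson' \
       --data-binary "@/tmp/chunk.ndjson"
  rm -f /tmp/chunk.ndjson
done
```

curl's `--data-binary @-` would even let me pipe `aws s3 cp ... -` straight in and skip the temp file, but either way every byte has to transit the machine running the script instead of going from S3 to ES directly, which is what I'd like to avoid.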
I want to bulk load my data set and play with different ES configurations to find the correct cluster size.
I know there are many alternatives, but I'd like to see whether it's possible to use the Bulk API straight from S3.