Elasticsearch ingest attachment cannot index multiple PDFs

I have a MySQL table with multiple PDFs and an associated item_name for each:

    title           pdf_s3_link
    ------------    ------------------
    Harry Potter    linktopdfons3
    Batman          s3_link_to_pdfons3

I am trying to ingest this data into my Elasticsearch index so that, when a search matches a PDF's content, I can display the title. I am trying to use the ingest API, but I don't know how to run it automatically. (This is a POC, so it is not on Elastic Cloud yet, but in the future the whole system, including ingestion, should be serverless and on Elastic Cloud.)
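For reference, here is a minimal sketch of what the ingest side could look like. It only builds the request bodies and sends nothing. The `attachment` and `remove` processors are standard ingest processors, but the index name `books`, the field name `data`, and the helpers `build_doc` and `bulk_body` are assumptions made up for this illustration. The pipeline body would go to `PUT _ingest/pipeline/<name>`, and the NDJSON to `POST _bulk?pipeline=<name>`.

```python
import base64
import json

# Body for PUT _ingest/pipeline/<name>. The attachment processor extracts
# text from the base64-encoded "data" field (field name is an assumption),
# and the remove processor drops the raw payload afterwards.
ATTACHMENT_PIPELINE = {
    "description": "Extract text from base64-encoded PDFs",
    "processors": [
        {"attachment": {"field": "data"}},
        {"remove": {"field": "data"}},
    ],
}


def build_doc(title, pdf_bytes):
    """Hypothetical helper: one source document per table row."""
    return {
        "title": title,
        "data": base64.b64encode(pdf_bytes).decode("ascii"),
    }


def bulk_body(index, rows):
    """Hypothetical helper: NDJSON for POST _bulk.

    rows is an iterable of (title, pdf_bytes) pairs.
    """
    lines = []
    for title, pdf_bytes in rows:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(build_doc(title, pdf_bytes)))
    # The bulk API requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(json.dumps(ATTACHMENT_PIPELINE, indent=2))
    print(bulk_body("books", [("Harry Potter", b"%PDF-1.4 ...")]))
```

Because the processor does the text extraction inside Elasticsearch, the script only has to fetch the bytes and base64-encode them.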

I came across FSCrawler, but even with that I cannot download all those PDFs to a local directory. What's the best way to handle this?

Are the PDFs on S3 or in the database as blobs?

They are freely accessible PDF links.

I wrote a Python script using PyPDF2 that I could run as a cron job, but it's slow, and I am not sure whether there is a better way to do this. FSCrawler does not accept links.
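If the slow part is fetching the PDFs one at a time (an assumption, since the script itself isn't shown), downloading them concurrently may help. Below is a rough sketch using only the standard library. The names `fetch_pdf` and `fetch_all` and the worker count are made up for illustration, and `rows` stands in for the (title, pdf_s3_link) pairs read from MySQL.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch_pdf(url, timeout=30):
    """Download one PDF and return its raw bytes."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()


def fetch_all(rows, fetch=fetch_pdf, max_workers=8):
    """Fetch many PDFs in parallel threads.

    rows is an iterable of (title, url) pairs. Returns two lists:
    (title, pdf_bytes) for successes and (title, exception) for failures,
    so a single bad link does not abort the whole run.
    """
    def safe_fetch(row):
        title, url = row
        try:
            return title, fetch(url), None
        except Exception as exc:  # record the failure and keep going
            return title, None, exc

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the input order of rows.
        outcomes = list(pool.map(safe_fetch, list(rows)))

    ok = [(title, data) for title, data, err in outcomes if err is None]
    failed = [(title, err) for title, _, err in outcomes if err is not None]
    return ok, failed
```

Threads suit this because the work is mostly waiting on the network. If the time is actually going into parsing PDFs with PyPDF2, this won't help much.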

There's nothing in the Elastic Stack that would do this for you, unfortunately.
