I don't read Python code, so could you provide a full recreation script as described in About the Elasticsearch category? It will help me better understand what you are doing. Please try to keep the example as simple as possible.
The other solution you mentioned, ingest-attachment: I am not familiar with how to do that!
Not really another solution but part of it. If you want to extract text from a PDF document, you can use:
- ingest-attachment: the Ingest Attachment plugin (Elasticsearch Plugins and Integrations, 8.11)
- FSCrawler: dadoonet/fscrawler on GitHub, the Elasticsearch File System Crawler (FS Crawler)
- Apache Tika directly in Java: https://tika.apache.org/
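
Since you are working in Python, here is a minimal sketch of the ingest-attachment route. It creates an ingest pipeline with the `attachment` processor, then indexes a base64-encoded file through that pipeline. The cluster URL, the pipeline name `attachment`, the index name `my-index`, the field name `data`, and the helper function names are all assumptions made for illustration; adapt them to your setup, and add authentication if your cluster is secured.

```python
import base64
import json
import urllib.request

# Assumption: a local, unsecured cluster. Change this for your environment.
ES_URL = "http://localhost:9200"


def build_pipeline_body(field="data"):
    """Pipeline definition that runs the attachment processor on `field`."""
    return {
        "description": "Extract text from base64-encoded files",
        "processors": [{"attachment": {"field": field}}],
    }


def build_document_body(file_bytes, field="data"):
    """The attachment processor expects the file content base64-encoded."""
    return {field: base64.b64encode(file_bytes).decode("ascii")}


def put_json(path, body):
    """Send a JSON body with PUT and return the decoded JSON response."""
    request = urllib.request.Request(
        ES_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def index_pdf(pdf_path, index="my-index", doc_id="1", pipeline="attachment"):
    """Create the pipeline, then index the file through it (hypothetical names)."""
    put_json(f"/_ingest/pipeline/{pipeline}", build_pipeline_body())
    with open(pdf_path, "rb") as handle:
        body = build_document_body(handle.read())
    return put_json(f"/{index}/_doc/{doc_id}?pipeline={pipeline}", body)
```

Calling `index_pdf("sample.pdf")` against a running cluster should leave the extracted text in the `attachment.content` field of the indexed document, which you can then search like any other text field.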