I have added some pages to App Search. According to the documentation, the Crawler does not parse documents (PDF/DOCX) linked from a page.
- Are the links to these documents stored anywhere? (I couldn't find them.)
- If so, can the "ingest-attachment" plugin be used to automate parsing the data from these files?
- If not, is there an existing solution for this, or do I have to write an external script that collects all the documents, converts them to base64, and passes them through the API for indexing?
- Can the previously mentioned plugin be configured to send the parsed data to a specific Elasticsearch index?
- Is the plugin able to handle duplicate files?
- If I index the file data under a separate index, how do I combine it with the Crawler data so that the resulting search engine uses both?
- Is there a way to indicate the source of a result (page content vs. file content), e.g. by configuring additional search fields for SearchUI?
I would like to solve this without using FSCrawler.
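To make the external-script idea concrete, here is a minimal sketch of what I have in mind. The index name `my-docs`, the pipeline name `attachment`, and the `source_url` field are my own placeholders, not existing App Search conventions; `data` is the default source field read by the ingest-attachment processor.

```python
import base64


def build_attachment_doc(path):
    """Read a file and wrap its base64 content in the shape the
    ingest-attachment processor expects. The processor reads the
    base64 string from the 'data' field by default; 'source_url'
    is a hypothetical extra field for linking back to the page."""
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return {"data": encoded, "source_url": path}


# Sending would then be an ordinary index request routed through the
# pipeline, e.g. (placeholder index/pipeline names, cluster assumed local):
#
#   PUT http://localhost:9200/my-docs/_doc/1?pipeline=attachment
#   { "data": "<base64>", "source_url": "https://example.com/report.pdf" }
```

Would an approach like this be the recommended way to fill the gap, or is there something built in that I have missed?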