How to index files?

I have added some pages to App Search. According to the documentation, the Crawler does not parse documents (PDF/DOCX) linked from a page.

  1. Are the links to these documents stored anywhere? (I couldn't find them.)
  2. If so, can parsing these files be automated with the ingest-attachment plugin? (See the sketches after this list.)
  3. If not, is there a solution that would help me here, or do I have to write an external script that collects all the documents, converts them to Base64, and passes them through the API for indexing?
  4. Can the data be sent to a specific Elasticsearch index when using the aforementioned plugin?
  5. Is the aforementioned plugin able to handle duplicate files?
  6. If I store the file data in a chosen index, how can I combine it with the crawler data so that the resulting search engine uses both?
  7. Is there a way to identify the source (the content of a page vs. a file), for example by configuring additional search fields for Search UI?
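For questions 2 and 4, this is roughly what I had in mind: a minimal sketch, assuming a local Elasticsearch at `localhost:9200` with security disabled; the pipeline id `attachments` is just a name I picked.

```python
# Minimal sketch, assuming a local unsecured Elasticsearch on localhost:9200.
# The pipeline id "attachments" is a placeholder.
import requests

ES = "http://localhost:9200"

pipeline = {
    "description": "Extract text from base64-encoded files",
    "processors": [
        # The attachment processor (ingest-attachment plugin) reads the
        # base64 payload from "data" and writes the extracted text and
        # metadata to the "attachment" object field.
        {"attachment": {"field": "data", "indexed_chars": -1}},
        # Drop the raw base64 once the text has been extracted.
        {"remove": {"field": "data"}},
    ],
}

resp = requests.put(f"{ES}/_ingest/pipeline/attachments", json=pipeline)
resp.raise_for_status()
print(resp.json())  # expect {"acknowledged": true}
```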
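And a sketch of the external-script fallback from question 3, again with made-up names (the `file-attachments` index and the `./downloads` folder). Using the SHA-256 of the file content as the `_id` is my own idea for the duplicate handling in question 5, since as far as I know the plugin does no deduplication by itself.

```python
# Hedged sketch: walk a local folder of downloaded documents, base64-encode
# each file, and index it through the "attachments" pipeline created above.
# Index name, folder, and the "source_url" field are placeholders.
import base64
import hashlib
import pathlib

import requests

ES = "http://localhost:9200"
INDEX = "file-attachments"    # placeholder: the target Elasticsearch index
PIPELINE = "attachments"      # the pipeline from the previous sketch
DOCS_DIR = pathlib.Path("./downloads")

for path in DOCS_DIR.glob("**/*"):
    if not path.is_file() or path.suffix.lower() not in {".pdf", ".docx"}:
        continue
    raw = path.read_bytes()
    # Same content -> same _id, so re-indexing a file updates instead of
    # duplicating it.
    doc_id = hashlib.sha256(raw).hexdigest()
    doc = {
        "data": base64.b64encode(raw).decode("ascii"),
        "source_url": str(path),  # replace with the URL the file came from
    }
    resp = requests.put(
        f"{ES}/{INDEX}/_doc/{doc_id}",
        params={"pipeline": PIPELINE},
        json=doc,
    )
    resp.raise_for_status()
    print(doc_id, path.name, resp.json()["result"])  # "created" or "updated"
```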

I would like to solve this without using FSCrawler.

Every URL found during a crawl has a corresponding event logged, together with a decision on whether it was allowed or denied. You can read more about those logs here.