Best way to use Ingest Attachment Plugin

I am trying to set up ES to index document content for full-text search. The document types to be indexed are .docx and .pdf.

It looks like the Ingest Attachment plugin is meant for this purpose, and I've read various articles, including this post: How to specify file to Ingest Attachment

From what I understand, the plugin only takes base64-encoded data for indexing and DOES NOT take an actual file "as-is"? I assumed that, because it uses Apache Tika, it would be able to extract content from these document types directly. So my question is: is there a way to send a file directly to this plugin or not?

thanks.

No, it's not possible.

But you can use the FSCrawler project for that instead. It can read from your filesystem, or you can start its REST interface and "upload" your document to FSCrawler.

Thank you for your reply.

So is Apache Tika used to extract content from the base64 encoded data which is sent to it? Just trying to figure out the use of Apache Tika by this plugin. Thank you.

That's correct.
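
For anyone landing here later, the flow confirmed above (the client base64-encodes the file, the attachment processor hands the decoded bytes to Tika) could look roughly like the sketch below. It is only an illustration under assumptions: the cluster URL, the pipeline id `attachment`, the index name `documents`, and the field name `data` are all made up for the example, and it uses only the Python standard library rather than an official client.

```python
import base64
import json
import urllib.request

# Assumptions for illustration only: adjust to your own cluster and naming.
ES_URL = "http://localhost:9200"   # hypothetical cluster address
PIPELINE_ID = "attachment"         # hypothetical pipeline id
INDEX = "documents"                # hypothetical index name


def pipeline_definition(field="data"):
    """Ingest pipeline with a single attachment processor reading `field`."""
    return {
        "description": "Extract attachment content",
        "processors": [{"attachment": {"field": field}}],
    }


def encode_file_bytes(raw):
    """Base64-encode raw file bytes; the processor expects this encoding."""
    return base64.b64encode(raw).decode("ascii")


def document_body(raw, field="data"):
    """Build the JSON document that carries the encoded file."""
    return {field: encode_file_bytes(raw)}


def put_json(url, payload):
    """Send a JSON payload with HTTP PUT and return the parsed response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def index_file(path, doc_id):
    """Create the pipeline, then index one file through it."""
    with open(path, "rb") as handle:
        raw = handle.read()
    put_json(f"{ES_URL}/_ingest/pipeline/{PIPELINE_ID}", pipeline_definition())
    return put_json(
        f"{ES_URL}/{INDEX}/_doc/{doc_id}?pipeline={PIPELINE_ID}",
        document_body(raw),
    )
```

Calling `index_file("report.pdf", "1")` (file name hypothetical) would send the encoded PDF through the pipeline. With the processor's default settings, the extracted text should then appear under `attachment.content` in the indexed document.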
