Indexing PDF and Word files


(Rony Armon) #1

Hello,
I know this has been asked before, but I still need your advice about indexing Word documents and PDFs. I read the tutorial and installed the Attachment Processor Plugin, but I can't figure out from the tutorial how to load documents (not attachments) from a specified folder.

Say I stored document.doc in the document_dir directory.
Can you please provide the PUT request that indexes this document into an index named index_name?

Thanks


(David Pilato) #2

You can't upload a file to Elasticsearch directly. Elasticsearch cannot read a folder that contains the files you want to index.

Have a look at the FSCrawler project. It does that.
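
For illustration only, here is a rough sketch of what a client has to do itself when it relies on the attachment processor instead of a crawler: read the file from disk, base64-encode it, and put the encoded string into a JSON field that an ingest pipeline then processes. The pipeline name "attachment", the field name "data", the extra "filename" field, the document id "1" and the `_doc` URL form are assumptions made for this example (the URL form depends on the Elasticsearch version). The helper functions are hypothetical and only build the requests; they do not send anything.

```python
import base64
import json
from pathlib import Path


def build_pipeline_body(field="data"):
    """Body for PUT _ingest/pipeline/<name>; "data" is an assumed field name."""
    return {
        "description": "Extract text from base64-encoded attachments",
        "processors": [{"attachment": {"field": field}}],
    }


def build_index_request(path, index="index_name", doc_id="1",
                        pipeline="attachment", field="data"):
    """Return (url_path, json_body) for indexing one file through the pipeline.

    The file is read on the client side, because Elasticsearch itself
    cannot read it from a folder.
    """
    raw = Path(path).read_bytes()
    encoded = base64.b64encode(raw).decode("ascii")
    # Assumed URL form; older versions use /<index>/<type>/<id> instead.
    url = f"/{index}/_doc/{doc_id}?pipeline={pipeline}"
    # "filename" is an extra illustrative field, not required by the processor.
    body = {field: encoded, "filename": Path(path).name}
    return url, json.dumps(body)
```

With the file from the question, `build_index_request("document_dir/document.doc")` would give the path and body for a PUT request, to be sent with any HTTP client after the pipeline has been created. Looping over a whole folder, tracking changes and deletions, and so on is the part that FSCrawler takes care of.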