Indexing Office/PDF documents from a directory

I am new to Elastic and am looking for a connector to ingest Office/PDF documents from the file system.
Older versions had a river connector for this, which (I assume) is no longer supported.
I tried using Logstash for this, but it is not working.
Any suggestions?
Here is my Logstash configuration:

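```
input {
  file {
    # Glob matching every file in the directory
    path => "D:/elastic/data/resumes/*.*"
    # Note: the file input emits one event per line, and this codec converts
    # the text from ISO-8859-1 to UTF-8, so binary PDF/Office content does
    # not arrive intact as a single "message".
    codec => plain { charset => "ISO-8859-1" }
    start_position => "beginning"
  }
}
filter {
  ruby {
    init => "require 'base64'"
    # Base64-encode the event text into "data" for the ingest pipeline
    code => "event.set('data', Base64.encode64(event.get('message')))"
  }
  mutate {
    remove_field => ["message"]
  }
}
output {
  elasticsearch {
    action => "index"
    hosts => ["192.168.1.2:9200"]
    index => "resumedata"
    # document_type is deprecated in newer versions of this output plugin
    document_type => "resumes"
    # The "attachment" ingest pipeline must already exist in Elasticsearch
    pipeline => "attachment"
  }
  stdout { codec => rubydebug }
}
```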
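The attachment processor works on a whole document that has been Base64-encoded into a single field, while the `file` input in this config reads text line by line. The sketch below, in Ruby (the language already used by the `ruby` filter), shows the whole-file approach under stated assumptions: the field name `data` and the pipeline name `attachment` are taken from the config, the helper names (`attachment_doc`, `attachment_pipeline_body`, `docs_for`) are hypothetical, and the ingest attachment processor is assumed to be available (it shipped as the `ingest-attachment` plugin in older Elasticsearch versions). It only builds the request bodies; sending them is left out.

```ruby
require "base64"
require "json"

# Hypothetical helper: read one file as raw bytes (no line splitting,
# no charset conversion) and wrap the whole content, Base64-encoded,
# in a document body. The "data" field name is assumed from the config.
def attachment_doc(path)
  bytes = File.binread(path)
  {
    "filename" => File.basename(path),
    # strict_encode64 emits no line breaks, unlike encode64
    "data" => Base64.strict_encode64(bytes)
  }
end

# Body for PUT _ingest/pipeline/attachment: one attachment processor
# that reads the Base64 content from the "data" field.
def attachment_pipeline_body
  {
    "description" => "Extract text from Base64-encoded documents",
    "processors" => [
      { "attachment" => { "field" => "data" } }
    ]
  }
end

# Build one document per regular file matching the glob.
def docs_for(glob)
  Dir.glob(glob).select { |p| File.file?(p) }.sort.map { |p| attachment_doc(p) }
end

if __FILE__ == $PROGRAM_NAME
  puts JSON.pretty_generate(attachment_pipeline_body)
  docs_for("D:/elastic/data/resumes/*.*").each do |doc|
    # Each body would be indexed via e.g. POST resumedata/_doc?pipeline=attachment
    puts JSON.generate(doc)[0, 120]
  end
end
```

Reading with `File.binread` keeps every byte, including newlines and bytes above 0x7F, so the Base64 string decodes back to exactly the original file, which is what the attachment processor needs in order to extract text.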

You might want to give the FSCrawler project a try.
