We would like to set up document synchronization from a file directory and from an e-mail IMAP folder to Elasticsearch.
For the e-mails, we are interested in the attachments. Here the imap input plugin looks promising.
For the files (e.g. PDFs), we would like to set up a directory that gets scanned, and whenever a new file is added, the whole file content (e.g. as a byte array) gets transferred.
Here the file input plugin looks like a candidate, but maybe not. From the documentation it is not clear to me whether we could configure it for the purpose I described.
Any opinions/suggestions? Maybe there are other plugins or totally different solutions out there that we are not aware of.
The file input plugin will most emphatically not be your friend, I think. Logstash is really mostly about... well, logs, and is not really kitted out for use as a generalised Elasticsearch ETL tool.
I would suggest something a bit more custom, perhaps. Considering you're doing things with PDFs, you might want to read up on Ingest Pipelines and what those might offer (sorry, I'm not familiar with non-logs use cases for Ingest Pipelines).
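
To make the "something a bit more custom" idea concrete, here is a rough sketch in Python, not a tested setup. It assumes the ingest attachment processor is available on the cluster, and it only builds the JSON bodies; actually sending them to Elasticsearch is left out. The names `scan_new_files`, `build_document`, the `data` field and the in-memory `seen` set are all made up for illustration.

```python
import base64
import json
import tempfile
from pathlib import Path

# Body for an ingest pipeline that runs the attachment processor on the
# base64 content. The field name "data" is an assumption of this sketch.
PIPELINE_BODY = {
    "description": "Extract text from base64-encoded files",
    "processors": [{"attachment": {"field": "data"}}],
}


def scan_new_files(directory, seen):
    """Return files in `directory` not seen before and remember them.

    `seen` is a plain in-memory set of file names (hypothetical bookkeeping;
    a real job would need to persist this between runs).
    """
    new_files = []
    for path in sorted(Path(directory).iterdir()):
        if path.is_file() and path.name not in seen:
            seen.add(path.name)
            new_files.append(path)
    return new_files


def build_document(path):
    """Read the whole file and wrap it as a JSON-serialisable document."""
    raw = path.read_bytes()
    return {
        "filename": path.name,
        "data": base64.b64encode(raw).decode("ascii"),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Placeholder bytes, not a valid PDF.
        (Path(tmp) / "example.pdf").write_bytes(b"%PDF-1.4 placeholder")
        seen = set()
        print(json.dumps(PIPELINE_BODY))
        for path in scan_new_files(tmp, seen):
            print(json.dumps(build_document(path)))
```

Each document would then be indexed with the pipeline applied, so that the extraction happens on the Elasticsearch side rather than in the scanning script.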
No experience with the imap plugin, but it's interesting to see that it can do that.
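
If the plugin turns out not to fit, pulling attachments out of a raw message yourself is not much code. The sketch below uses only Python's standard `email` package and is an alternative to the imap plugin, not a description of how the plugin works. It assumes you already have the raw message bytes (for example from an IMAP fetch, which is not shown); `extract_attachments` is a made-up helper name.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def extract_attachments(raw_message):
    """Return a list of (filename, content_bytes) for each attachment."""
    # policy.default makes the parser return EmailMessage objects,
    # which provide iter_attachments().
    msg = message_from_bytes(raw_message, policy=policy.default)
    attachments = []
    for part in msg.iter_attachments():
        # decode=True undoes the transfer encoding (e.g. base64).
        attachments.append((part.get_filename(), part.get_payload(decode=True)))
    return attachments


if __name__ == "__main__":
    # Build a small test message instead of talking to a mail server.
    demo = EmailMessage()
    demo["Subject"] = "Report"
    demo.set_content("See attached.")
    demo.add_attachment(
        b"%PDF-1.4 placeholder",  # placeholder bytes, not a valid PDF
        maintype="application",
        subtype="pdf",
        filename="report.pdf",
    )
    for name, content in extract_attachments(demo.as_bytes()):
        print(name, len(content))
```

The resulting bytes could be base64-encoded and sent through an ingest pipeline in the same way as files picked up from a directory.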