I have some attachments which need to be ingested into elastic.
They are of varying types, e.g. PDF, TXT, CSV, XLS, DOC, and MSG (Outlook emails).
As far as I could find out, I can use the Ingest Attachment Processor Plugin to extract the content of my files into Elasticsearch.
Multiple forum threads cover most of these file types. Outlook emails make up the majority of our documents. Would the ingest processor be able to handle them?
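For context, here is a minimal sketch of what the attachment processor approach involves, assuming a pipeline with a single attachment processor and the file sent as a base64 string. The field name `data`, the `fileName` field, and the helper function names are only illustrative. The first body would go to `PUT _ingest/pipeline/<name>` and the second would be indexed with `?pipeline=<name>`. It does not show whether MSG files are actually parsed.

```python
import base64
import json


def attachment_pipeline_body(field="data"):
    """Build the body for PUT _ingest/pipeline/<name>.

    The pipeline has one attachment processor that reads base64 content
    from `field` (an illustrative field name).
    """
    return {
        "description": "Extract text from base64-encoded attachments",
        "processors": [{"attachment": {"field": field}}],
    }


def attachment_document(file_name, raw_bytes, field="data"):
    """Build a document to index with ?pipeline=<name>.

    The raw file bytes travel as a base64 string in `field`.
    """
    return {
        "fileName": file_name,
        field: base64.b64encode(raw_bytes).decode("ascii"),
    }


if __name__ == "__main__":
    # Print both request bodies; sending them is left to the HTTP client in use.
    print(json.dumps(attachment_pipeline_body(), indent=2))
    print(json.dumps(attachment_document("notes.txt", b"hello elastic"), indent=2))
```

The extracted text would then normally appear under an `attachment` object in the indexed document, which is what a search would return.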
Also, just for my knowledge: the above approach is about reading the contents of a file and ingesting them into Elasticsearch, so a search through the API would just return the extracted contents.
Is there a way to load the file into Elasticsearch as is, without extracting data from it, so that I can have an index on the fileName and retrieve the whole file? If yes, how would the end user view the file?
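To make the second question concrete, one possible shape for this (a sketch only, not something I have confirmed as the recommended way) would be a mapping with a `keyword` field for the file name and a `binary` field holding the file as a base64 string. The field names `fileName` and `content` and the helper functions are hypothetical. The mapping format assumes a version without mapping types.

```python
import base64


def file_index_mapping():
    """Build the body for PUT <index>.

    fileName is an exact-match keyword; content is a binary field, which
    accepts a base64 string and is not searchable.
    """
    return {
        "mappings": {
            "properties": {
                "fileName": {"type": "keyword"},
                "content": {"type": "binary"},
            }
        }
    }


def encode_file(file_name, raw_bytes):
    """Turn a file into a document body with the bytes base64-encoded."""
    return {
        "fileName": file_name,
        "content": base64.b64encode(raw_bytes).decode("ascii"),
    }


def decode_file(source):
    """Recover (file name, original bytes) from a returned _source."""
    return source["fileName"], base64.b64decode(source["content"])
```

With something like this, the client application would look a document up by `fileName`, decode `content` back to bytes, and hand the file to the end user to open in its native application.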
This looks great, but our use case is much more than just accessing / retrieving documents.
We have an Oracle 11g instance with around 4-5 TB of data (some of which is documents). It holds historical data which we want to access.
The initial plan was to leverage the ELK stack and use Logstash to get the data into Elasticsearch. We have a Java application which will then connect to Elasticsearch via the API and retrieve the results.
With the above approach, we had some reservations around handling attachments.
Workplace Search looks great. We could move the documents to SharePoint and use Workplace Search, but then we would have to use Workplace Search for attachments and, for the other historical data, rely on the Java application retrieving results from ELK.
I'd just like to add the answer to the initial question:
No, not every type of document. The ingest attachment plugin does not support all file formats, and I'm unsure about Outlook email files. FSCrawler should support those files, though, and it has a work-in-progress branch which will connect local files to Workplace Search.