Elasticsearch attachment parsing use case

svb01 · April 4, 2017, 4:23pm

Hi,

I am exploring Elasticsearch as a solution for one of our use cases, and I would like to know whether its current capabilities can meet our requirements.
We want to index the Excel files in our repository.
The Excel files follow a standard template with the data in two columns: column A contains "questions" and column B contains the answers to those questions.
If a user searches for a question (say, cell A19), we want to show them the answer from column B (i.e. cell B19).

From what I have read, I would have to use the ingest-attachment plugin to index the files. However, once the files are indexed, would I be able to map their contents as "Key", "Value" pairs in the index to search on?

What would be the best way to go about this?

Thanks.

dadoonet · April 4, 2017, 4:58pm

No, you can't. You'd be better off converting your document to CSV and then importing it with Logstash, for example.

Here is an example: http://david.pilato.fr/blog/2015/04/28/exploring-capitaine-train-dataset/
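
To make the "convert to CSV, then import" idea concrete, here is a minimal Python sketch of the same structuring step. It is not the Logstash pipeline suggested above, just a plain-script alternative that shows what the resulting documents could look like. The index name `qa-sheets` and the field names `question` and `answer` are made up for illustration, and older Elasticsearch versions may also expect a `_type` in the action line.

```python
import csv
import io
import json


def rows_to_bulk(csv_text, index_name="qa-sheets"):
    """Turn two-column CSV text (question, answer) into a bulk request body.

    index_name and the field names are hypothetical; adjust to your setup.
    """
    lines = []
    for row in csv.reader(io.StringIO(csv_text)):
        # Skip blank rows and rows that lack a question or an answer column.
        if len(row) < 2 or not row[0].strip():
            continue
        question, answer = row[0].strip(), row[1].strip()
        # Each document is preceded by an action line naming the target index.
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps({"question": question, "answer": answer}))
    # The bulk API expects newline-delimited JSON ending with a newline.
    return "\n".join(lines) + "\n" if lines else ""


# Two rows as they might look after exporting columns A and B to CSV.
sample = (
    "What is the retention period?,90 days\n"
    '"Who owns the data, legally?",The customer\n'
)
print(rows_to_bulk(sample), end="")
```

The printed body could then be sent to the `_bulk` endpoint with whatever HTTP client you prefer. Each spreadsheet row becomes its own document, which is what makes the question/answer pairing searchable.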

svb01 · April 4, 2017, 9:04pm

Thanks for the quick reply.
A follow-up question: the initial Excel files are converted into HTML files using a tool we have. Would it be easier to index the HTML file directly in Elasticsearch via ingest-attachment?
Or is Logstash the preferred approach for transforming the data (Excel or HTML) before pushing it to Elasticsearch?

dadoonet · April 4, 2017, 9:16pm

I believe you need structured data, not unstructured.

So HTML won't help, in my opinion.
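
To illustrate why structure matters here: if each row is indexed as its own document with separate fields, the lookup becomes a simple query on one field that returns the other. A rough Python sketch of such a request body follows, assuming hypothetical field names `question` and `answer`. An attachment indexed as one blob of extracted text would not give you that pairing.

```python
import json


def build_question_search(user_text, size=5):
    """Build a search body that matches on the question field.

    The field names "question" and "answer" are assumptions for this sketch.
    """
    return {
        "size": size,
        # Full-text match against the text taken from column A.
        "query": {"match": {"question": user_text}},
        # Return only the fields needed to display the answer.
        "_source": ["question", "answer"],
    }


print(json.dumps(build_question_search("retention period"), indent=2))
```

Each hit would then carry the matching question together with its answer from the same row.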

svb01 · April 4, 2017, 9:22pm

I may not have explained this properly.
We already have a process in which the initial Excel files are created from an Excel template and converted to an HTML document.

I can index either the Excel document or the generated HTML document to get the output I described in my initial question.

Would it be easier to use Logstash on the Excel document or on the converted HTML document? Or is the effort the same with Logstash regardless of the file format?

dadoonet · April 4, 2017, 10:03pm

CSV is easier to parse IMO.

system · May 2, 2017, 10:03pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
