Elasticsearch attachment parsing use case


I am exploring Elasticsearch as a solution for one of our use cases, and I would like to know whether its current capabilities can meet our requirements.
We want to index the Excel files in our repository.
The Excel files follow a standard template. The data is in two columns: column A contains "questions" and column B contains the answers to those questions.
If a user searches for a question (say the one in cell A19), we want to be able to show them the answer from column B of the same row (i.e. cell B19).

From what I have read, I would have to use the ingest-attachment plugin to index the files. However, once the files are indexed, would I be able to map their contents as "Key", "Value" pairs in the index and search on them?

What would be the best way to go about this?


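No, you can't. ingest-attachment extracts a file's text into a single content field, not into per-row key/value pairs. You'd be better off converting your document to CSV and then importing it with Logstash, for example.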

Here is an example: http://david.pilato.fr/blog/2015/04/28/exploring-capitaine-train-dataset/
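
To make the CSV route concrete, here is a minimal sketch (not taken from the linked post) of turning two-column CSV text into one document per row, formatted as a newline-delimited body for the Elasticsearch _bulk API. The index name `qa`, the field names `question`, `answer` and `row`, and the helper `rows_to_bulk` are all assumptions chosen for illustration. It also assumes the sheet has already been exported to CSV with no header row.

```python
import csv
import io
import json


def rows_to_bulk(csv_text, index_name="qa"):
    """Build a _bulk request body from two-column CSV text (question, answer).

    The index name and the field names are hypothetical; adapt them to your mapping.
    """
    lines = []
    for row_number, row in enumerate(csv.reader(io.StringIO(csv_text)), start=1):
        if len(row) < 2 or not row[0].strip():
            continue  # skip blank rows and rows without a question
        # Action line followed by the document itself, as the bulk format requires.
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps({
            "question": row[0].strip(),
            "answer": row[1].strip(),
            "row": row_number,  # spreadsheet row, kept for traceability
        }))
    # The bulk body is newline-delimited JSON and must end with a newline.
    return "\n".join(lines) + "\n" if lines else ""


if __name__ == "__main__":
    sample = 'What is the project name?,Apollo\n"Who is the owner?","Jane, Ops"\n'
    print(rows_to_bulk(sample), end="")
```

The resulting text could be sent to the `_bulk` endpoint with any HTTP client. Logstash with a CSV filter reaches the same end state without custom code, which is the approach suggested above.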

Thanks for the quick reply.
A follow-up question: the initial Excel files are converted into HTML files using a tool we have. Would it be easier to index the HTML files directly in Elasticsearch via ingest-attachment?
Or is Logstash the preferred way to transform the data (Excel or HTML) before pushing it to Elasticsearch?

I believe you need structured data, not unstructured data.

So HTML won't help, in my opinion.

I may not have explained this properly.
We already have a process in which the initial Excel files are created from an Excel template and then converted to HTML documents.

I can index either the Excel file or the generated HTML file to get the result I described in my initial question.

Would it be easier to use Logstash on the Excel file or on the converted HTML file? Or is the effort the same with Logstash regardless of the file format?

CSV is easier to parse IMO.
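
For the lookup described in the original question, a rough sketch follows. It assumes each spreadsheet row has been indexed as its own document with hypothetical `question` and `answer` fields. It only builds a `match` query body and reads answers out of a search response; sending the request to the `_search` endpoint is left to whichever client you use. The function names are illustrative, not part of any Elasticsearch client.

```python
def question_search_body(question_text, size=1):
    """Search body that matches on the (assumed) `question` field."""
    return {
        "size": size,
        "query": {"match": {"question": question_text}},
        "_source": ["question", "answer"],  # return only the fields we need
    }


def answers_from_response(response):
    """Collect the `answer` field from each hit of a search response dict."""
    hits = response.get("hits", {}).get("hits", [])
    return [hit["_source"]["answer"] for hit in hits]


if __name__ == "__main__":
    print(question_search_body("Who is the owner?"))
```

A `match` query analyzes the search text, so a user does not have to type the question exactly as it appears in column A.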
