Converting a whole CSV file stored in one field into multiple documents, one per CSV row (using the headers as field names)

Hi,

I have an index with a field that contains a whole CSV file (headers included). I want to convert each CSV row into a separate document containing all fields of that row (ideally, the field names should correspond to the CSV header column names).

I found that with ingest I can use some processors (like csv or grok) to extract the CSV fields, but they only add those fields to the same document.
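For example, simulating a pipeline with a csv processor (the column names and the single-row value below are just made up for illustration) shows that the parsed fields simply get added to the same document:

    POST _ingest/pipeline/_simulate
    {
      "pipeline": {
        "processors": [
          {
            "csv": {
              "field": "attachment",
              "target_fields": ["name", "age", "city"]
            }
          }
        ]
      },
      "docs": [
        {
          "_source": {
            "attachment": "Tony,42,Berlin"
          }
        }
      ]
    }

The simulation still returns exactly one document, just with the extra fields added.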

But how can I create multiple documents from this single field/document?

Thanks, Tony

I don't think you can get automatic field naming, if that is what you are looking for.

Thanks for your response.

First thing would be to know how I can create multiple documents from one field. Do you know how to do this?

The automatic field naming is not that important.

Do you want one document per field, or one document per row?

I have one field called 'attachment' in a document, containing a whole CSV file (headers included).

Let's say the CSV file has n rows. I want n documents to be created from the attachment field.

What are you using to send the CSV to Elasticsearch?

I'd personally use Filebeat or Logstash.

I wrote an example with Logstash here: https://www.elastic.co/fr/blog/enriching-your-postal-addresses-with-the-elastic-stack-part-1

If you want to use Filebeat (I recommend it), here is a sample Filebeat configuration:
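Something along these lines, as a minimal sketch rather than the exact config from the blog post (the path, the header line and the pipeline name are placeholders):

    filebeat.inputs:
      - type: log
        paths:
          - /path/to/your/*.csv
        # skip the CSV header line (adjust the pattern to your actual header)
        exclude_lines: ['^name,age,city']

    output.elasticsearch:
      hosts: ["localhost:9200"]
      # ingest pipeline that parses each shipped line into fields
      pipeline: csv-pipeline

Each line Filebeat reads becomes its own event, with the raw line in the message field, and is run through the referenced ingest pipeline.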

And the associated ingest pipeline:
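Again as a sketch (the pipeline name and the column names are placeholders and need to match your CSV header):

    PUT _ingest/pipeline/csv-pipeline
    {
      "description": "Parse one CSV line per document into named fields",
      "processors": [
        {
          "csv": {
            "field": "message",
            "target_fields": ["name", "age", "city"]
          }
        },
        {
          "remove": {
            "field": "message"
          }
        }
      ]
    }

The csv processor parses the message field of each event into named fields, and the remove processor drops the raw line afterwards.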

HTH

Thanks for your response.

I am aware that Filebeat can read logs or CSV data into Elasticsearch, but I am receiving the CSV files from MongoDB / GridFS. I am using monstache to keep my MongoDB synchronized with Elasticsearch in real time. It's working, but the CSV files end up in my index as a single field containing the whole file, so I cannot use any Logstash preprocessing or Filebeat.

I was just looking for a solution using ingest processors, but it seems they only operate on a single document. More generally, I am looking for a way to create (index?) a number of documents from the CSV file stored in one field.

Maybe I should formulate this question more generally and open a new thread.

Ingest pipelines are applied per document. You can't split a document into multiple ones.

OK, so processors won't work. Is there another way to do this?

I think I need to solve this:

  • Basically, how can I get the attachment.context field, which contains a list structure, indexed on its own?
  • Or, more abstractly, how can I get an array field indexed as separate documents? (I could transform attachment.context into an array with ingest processors, see the sketch below.)
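For instance, a split processor could turn the raw CSV into an array of lines. A rough sketch, assuming the field really is attachment.context and rows are separated by newlines:

    PUT _ingest/pipeline/split-attachment
    {
      "description": "Split the raw CSV into an array of lines",
      "processors": [
        {
          "split": {
            "field": "attachment.context",
            "separator": "\n"
          }
        }
      ]
    }

But as far as I can tell, that still leaves everything inside a single document.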

IMO you need to solve that problem in the way you send data to Elasticsearch.

I'd probably do something like:

  • read the CSV file from mongo and store it locally
  • send every single line of data (except the header) to Elasticsearch using a bulk request, as sketched below
  • remove the local CSV file
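A rough sketch of what that could look like in Python (nothing from this thread; the connection strings, database name and index name are placeholders, and the file is read straight from GridFS into memory instead of going through a local file, but the idea is the same):

    import csv
    import io

    import gridfs
    from pymongo import MongoClient
    from elasticsearch import Elasticsearch
    from elasticsearch.helpers import bulk

    mongo = MongoClient("mongodb://localhost:27017")   # placeholder connection string
    fs = gridfs.GridFS(mongo["mydb"])                  # placeholder database name
    es = Elasticsearch("http://localhost:9200")        # placeholder Elasticsearch host

    def row_actions():
        # one bulk action per CSV row; DictReader uses the header row as field names
        for grid_file in fs.find():
            text = grid_file.read().decode("utf-8")
            for row in csv.DictReader(io.StringIO(text)):
                yield {"_index": "csv-rows", "_source": row}

    # the bulk helper batches the actions into a few requests
    bulk(es, row_actions())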

I do not want to implement additional components for this. I just thought there would be an Elasticsearch way to index the contents of a single field as new documents (maybe into another index).

Anyway, thanks for your time and effort.
