Converting a whole CSV file stored in one field into multiple documents, one per CSV row (using the headers as field names)

Hi,

I have an index with a field that contains a whole CSV file (headers included). I want to convert each CSV row into a separate document containing all fields of that row (ideally, the field names should correspond to the CSV header column names).

I found that with ingest I can use some processors (like csv or grok) to extract the CSV fields, but they only add those fields to the same document.
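For example, simulating a pipeline with a csv processor (the column names and the single-row value below are just made up for illustration) shows that the parsed fields simply get added to the same document:

    POST _ingest/pipeline/_simulate
    {
      "pipeline": {
        "processors": [
          {
            "csv": {
              "field": "attachment",
              "target_fields": ["name", "age", "city"]
            }
          }
        ]
      },
      "docs": [
        {
          "_source": {
            "attachment": "Tony,42,Berlin"
          }
        }
      ]
    }

The simulation still returns exactly one document, just with the extra fields added.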

But how can I create multiple documents from this single field/document?

Thanks, Tony

I don't think you can get automatic field naming, if that is what you are looking for.

Thanks for your response.

First thing would be to know how I can create multiple documents from one field. Do you know how to do this?

The automatic field naming is not that important.

Do you want one document per field, or one document per row?

I have one field called 'attachment' in a document, containing a whole CSV file (headers included).

Let's say the CSV file has n rows. I want n documents to be created from the attachment field.

What are you using to send the CSV to Elasticsearch?

I'd personally use Filebeat or Logstash.

I wrote an example with Logstash here: https://www.elastic.co/fr/blog/enriching-your-postal-addresses-with-the-elastic-stack-part-1

If you want to use Filebeat (I recommend it), here is a sample Filebeat configuration:
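Something along these lines, as a minimal sketch rather than the exact config from the blog post (the path, the header line and the pipeline name are placeholders):

    filebeat.inputs:
      - type: log
        paths:
          - /path/to/your/*.csv
        # skip the CSV header line (adjust the pattern to your actual header)
        exclude_lines: ['^name,age,city']

    output.elasticsearch:
      hosts: ["localhost:9200"]
      # ingest pipeline that parses each shipped line into fields
      pipeline: csv-pipeline

Each line Filebeat reads becomes its own event, with the raw line in the message field, and is run through the referenced ingest pipeline.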

And the associated ingest pipeline:
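Again as a sketch (the pipeline name and the column names are placeholders and need to match your CSV header):

    PUT _ingest/pipeline/csv-pipeline
    {
      "description": "Parse one CSV line per document into named fields",
      "processors": [
        {
          "csv": {
            "field": "message",
            "target_fields": ["name", "age", "city"]
          }
        },
        {
          "remove": {
            "field": "message"
          }
        }
      ]
    }

The csv processor parses the message field of each event into named fields, and the remove processor drops the raw line afterwards.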

HTH

Thanks for your response.

I am aware that Filebeat can read logs or CSV data into Elasticsearch, but I am receiving the CSV files from MongoDB / GridFS. I am using monstache to keep my MongoDB synchronized with Elasticsearch in real time. It's working, but the CSV files end up in my index as a single field containing the whole file, so I cannot use any Logstash preprocessing or Filebeat.

I was just looking for a solution using ingest processors, but it seems they only operate on a single document. More generally, I am looking for a way to create (index?) a number of documents from the CSV file stored in one field.

Maybe I should formulate this question more generally and open a new thread.

Ingest pipelines are applied per document. You can't split a document into multiple ones.

OK, so processors won't work. Is there another way to do this?

I think I need to solve this:

  • Basically, how can I get the attachment.context field, which contains a list structure, indexed on its own?
  • Or, more abstractly, how can I get an array field indexed as separate documents? (I could transform attachment.context into an array with ingest processors, see the sketch below.)
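For instance, a split processor could turn the raw CSV into an array of lines. A rough sketch, assuming the field really is attachment.context and rows are separated by newlines:

    PUT _ingest/pipeline/split-attachment
    {
      "description": "Split the raw CSV into an array of lines",
      "processors": [
        {
          "split": {
            "field": "attachment.context",
            "separator": "\n"
          }
        }
      ]
    }

But as far as I can tell, that still leaves everything inside a single document.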

IMO you need to solve that problem in the way you send data to Elasticsearch.

I'd probably do something like:

  • read the CSV file from mongo and store it locally
  • send every single line of data (except the header) to Elasticsearch using a bulk request, as sketched below
  • remove the local CSV file
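A rough sketch of what that could look like in Python (nothing from this thread; the connection strings, database name and index name are placeholders, and the file is read straight from GridFS into memory instead of going through a local file, but the idea is the same):

    import csv
    import io

    import gridfs
    from pymongo import MongoClient
    from elasticsearch import Elasticsearch
    from elasticsearch.helpers import bulk

    mongo = MongoClient("mongodb://localhost:27017")   # placeholder connection string
    fs = gridfs.GridFS(mongo["mydb"])                  # placeholder database name
    es = Elasticsearch("http://localhost:9200")        # placeholder Elasticsearch host

    def row_actions():
        # one bulk action per CSV row; DictReader uses the header row as field names
        for grid_file in fs.find():
            text = grid_file.read().decode("utf-8")
            for row in csv.DictReader(io.StringIO(text)):
                yield {"_index": "csv-rows", "_source": row}

    # the bulk helper batches the actions into a few requests
    bulk(es, row_actions())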

I do not want to implement additional components for this. I just thought there would be an Elasticsearch way to index the contents of a single field as new documents (maybe into another index).

Anyway, thanks for your time and effort.
