Upload a CSV file in the Kibana UI through a pipeline with custom mappings

I want to upload a single CSV file to Elasticsearch using just the Kibana UI. The file should go through an ingest pipeline, and the index needs custom settings. I am currently doing this with a complicated manual process and am wondering whether it can be streamlined.

I have this CSV data file on my local machine.

color,size,description
blue,1.5,the cat is happy
red,2.8,the dogs are sad
yellow,3.4,the 2 birds are sleepy

I want to upload it to an index with the following settings and mappings.

{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "stemmer",
            "stop"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "color": {
        "type": "keyword"
      },
      "size": {
        "type": "double"
      },
      "description": {
        "type": "text"
      }
    }
  }
}

Note that the "description" field is of type text and that I specify a custom default analyzer.
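As a sanity check, once the index exists I can verify the default analyzer from the Dev console (with no analyzer specified, the analyze API uses the index default). A minimal sketch:

GET test/_analyze
{
  "text": "the dogs are sad"
}

For that row this should come back with just the tokens dog and sad, since the stop words are removed and stemming is applied.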

I want to ingest the data through the following pipeline.

{
  "description": "Test Ingestion Pipeline",
  "processors": [
    {
      "trim": {
        "field": "color",
        "ignore_missing": true
      }
    },
    {
      "trim": {
        "field": "description",
        "ignore_missing": true
      }
    }
  ]
}
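For what it's worth, I can check the pipeline in the Dev console with the simulate API before any data goes through it. A sketch with made-up padding in the fields:

POST _ingest/pipeline/test-pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "color": "  blue  ",
        "size": 1.5,
        "description": "  the cat is happy  "
      }
    }
  ]
}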

I don't have Logstash or Beats set up, and I don't want to use the Python client. I'd like to do everything through the Kibana UI.

Here is what I currently do.

  1. Use the Dev console to create a test index with the above custom configuration.
  2. Use the Dev console to create a test-pipeline with the above configuration.
  3. Use the Upload File integration to upload the CSV file from my local machine to an index called test-input.
  4. Use the Dev console to reindex test-input into test using the test-pipeline pipeline.

I then delete test-input and work with test.
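In Dev console terms, steps 1, 2, and 4 look roughly like this (the index and pipeline bodies are the JSON shown above):

# Step 1: PUT test with the settings/mappings JSON above
# Step 2: PUT _ingest/pipeline/test-pipeline with the pipeline JSON above
# Step 3: Upload File integration -> index test-input
# Step 4: reindex through the pipeline
POST _reindex
{
  "source": { "index": "test-input" },
  "dest": { "index": "test", "pipeline": "test-pipeline" }
}

# Then drop the temporary index
DELETE test-input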

This works but it is complicated and error-prone. How can I make the process simpler?

Hi @wpm,

Welcome back! Have you tried using the Kibana file upload tool?

Let us know!

Is your ingest pipeline really so simple? Stripping some blanks is one line of sed/awk/perl/python/...

$ cat silly.csv | tr " " "="
color,size,description
blue,1.5,==the=cat=is=happy
red,2.8,==the=dogs=are=sad=====
yellow,3.4,the=2=birds=are=sleepy.
orange,2.3,no=rest=for=the=wicked====

$ cat silly.csv | awk -F, -v OFS=, '{gsub(/^[[:blank:]]*/, "", $3); gsub(/[[:blank:]]*$/, "", $3); print}'
color,size,description
blue,1.5,the cat is happy
red,2.8,the dogs are sad
yellow,3.4,the 2 birds are sleepy.
orange,2.3,no rest for the wicked

Yes. That is what I mean in step (3) above by the "Upload File integration".

The problem is that the Upload File tool does not allow me to specify custom settings, and it gives me an "index already exists" error if I try to ingest a CSV file into an empty index that I pre-created with custom settings.

(I believe the Upload File tool does allow me to specify a custom mapping, but not to add a custom analyzer.)

I'm sure I could preprocess things to my liking with Python scripts, but then I'd have a separate Python script lying around and would have to remember to use it and to tell other people to use it. That's another case of multiple error-prone manual steps.

I'm looking for the most turnkey way to ingest a single CSV. I want to use Elasticsearch's built-in ETL features instead of writing my own so the ETL will all be in one place.

My concern is not with whether it can be done, but whether it can be run by someone who is not an Elasticsearch expert. From the documentation, it's not clear to me whether this can be done without setting up a full-blown Beats/Logstash system.

You do not need to write your own; there are plenty of tools that help you send data into Elasticsearch. From Elastic itself you can use Filebeat or Logstash, for example.

The Upload File tool in Kibana is intended mostly to check what your data looks like, to validate it, or to ingest a few things quickly. If you need an ingestion flow that will be repeated multiple times, then you need other tools to send data to Elasticsearch.

What you want to do can be done using index templates, to create the settings and mappings of your index, and ingest pipelines, to parse your CSV data and create the required fields.
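For example, something like this (a sketch, assuming a recent Elasticsearch version with composable templates; it reuses the settings, mappings, and pipeline name from your posts, and the template name is made up):

PUT _index_template/test-template
{
  "index_patterns": ["test"],
  "template": {
    "settings": {
      "index.default_pipeline": "test-pipeline",
      "analysis": {
        "analyzer": {
          "default": {
            "tokenizer": "standard",
            "filter": ["lowercase", "stemmer", "stop"]
          }
        }
      }
    },
    "mappings": {
      "properties": {
        "color": { "type": "keyword" },
        "size": { "type": "double" },
        "description": { "type": "text" }
      }
    }
  }
}

With index.default_pipeline set in the template, any document indexed into a matching index goes through test-pipeline automatically, so the tool that sends the data does not need to know about the pipeline at all.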

But you would need another tool to send this data into Elasticsearch; it can even be a simple curl request.
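For a handful of rows it can even be done from the Dev console with a bulk request (roughly the same body works with curl against the _bulk endpoint). A sketch using your sample rows:

POST test/_bulk
{ "index": {} }
{ "color": "blue", "size": 1.5, "description": "  the cat is happy  " }
{ "index": {} }
{ "color": "red", "size": 2.8, "description": "  the dogs are sad  " }

If the index template above is in place, the index is created with your mappings and the documents are trimmed by the default pipeline on the way in.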

This is how it is normally done; trying to do some kind of ETL using the Upload File tool in Kibana, in my opinion, complicates a problem that has already been solved many times.

For example, if you want to ingest those CSV files into Elasticsearch, you can configure Filebeat or Logstash to watch a specific folder, and every time a file is created in that folder it will be processed, parsed, and ingested into Elasticsearch.

A lot of the steps here can be automated to the point that the user just needs to drop a file into a folder.


User-drops-a-file-in-a-directory is the kind of interface I want. Is index templates + Filebeat the easiest way to implement this?