Upload CSV File to Kibana Dashboard

Hi Team,

I need help with the two points below while uploading a CSV file through the Kibana dashboard.

  1. How do I upload a CSV file larger than 100MB through the Kibana dashboard?
  2. How do I upload multiple CSV files to the same index?

Thanks,
Debasis

that's not possible through the Kibana UI, but you can use various methods to get the data indexed into Elasticsearch.

  • beats: you can use Filebeat to monitor your folder for CSV files and automatically index them; here is a sample Filebeat config
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /path/to/your/csv/files/*.csv   # every CSV dropped into this folder gets picked up

output.elasticsearch:
  hosts: ["http://your-elasticsearch-host:9200"]
  index: "your_index_name"            # all the files end up in this one index

You can also use the Elasticsearch bulk API to index the documents, with a simple Python script to do the processing for you:

import csv

from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

es = Elasticsearch(["http://localhost:9200"])  # replace with your Elasticsearch URL

# Read the CSV file and build one action (document) per row.
# csv.DictReader uses the header line for the field names.
def generate_actions(path, index_name):
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            yield {"_index": index_name, "_source": row}

# Index the data into Elasticsearch with the bulk helper
# (doc_type is gone in recent Elasticsearch versions, so it is not set here).
bulk(es, generate_actions("your_file.csv", "your_index_name"))

Thanks @ppisljar for your response to my first question. Could you please help me with the second requirement? For example, I upload text.csv to the index "sample" and then want to load one more CSV file into the same index "sample". Is there any way to achieve this?

Thanks,
Debasis

Hi @ppisljar,

Do you have a reference blog that explains how Filebeat can be configured to upload CSV files to a particular index?

Thanks,
Debasis

Here is something I found on the web: Load CSV data to ElasticSearch using FileBeat

Thanks @ppisljar. I created the ingest pipeline, but where do I check whether it succeeded or failed? I cannot see the data in the corresponding index. Is there any way to check this?

Thanks,
Debasis

logs should be in /var/log/filebeat/filebeat
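If you installed Filebeat from the RPM/DEB package and it runs under systemd, the output may also land in the journal; a quick way to check, assuming systemd manages the service:

journalctl -u filebeat -f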

Below is my unit file /usr/lib/systemd/system/filebeat.service. When I check under /var/log there is no filebeat folder as mentioned above. Is there any way to check whether the ingest pipeline is working or in a failed state?

UMask=0027
Environment="GODEBUG='madvdontneed=1'"
Environment="BEAT_LOG_OPTS="
Environment="BEAT_CONFIG_OPTS=-c /etc/filebeat/filebeat.yml"
Environment="BEAT_PATH_OPTS=--path.home /usr/share/filebeat --path.config /etc/filebeat --path.data /var/lib/filebeat --path.logs /var/log/filebeat"
ExecStart=/usr/share/filebeat/bin/filebeat --environment systemd $BEAT_LOG_OPTS $BEAT_CONFIG_OPTS $BEAT_PATH_OPTS
Restart=always

Thanks,
Debasis

Can you confirm Filebeat is running?

Try following this tutorial to get it running: Filebeat quick start: installation and configuration | Filebeat Reference [8.10] | Elastic
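Filebeat also ships with built-in checks that are useful here; assuming the standard package paths, these verify the config syntax and the connection to Elasticsearch before you restart the service:

filebeat test config -c /etc/filebeat/filebeat.yml
filebeat test output -c /etc/filebeat/filebeat.yml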

Hi
Yes, my filebeat is up and running.

[root@cb-1 ~]# systemctl status filebeat
● filebeat.service - Filebeat sends log files to Logstash or directly to Elasticsearch.
   Loaded: loaded (/usr/lib/systemd/system/filebeat.service; disabled; vendor preset: disabled)
   Active: active (running) since Mon 2023-09-18 16:26:43 IST; 2 days ago
     Docs: https://www.elastic.co/beats/filebeat
 Main PID: 14193 (filebeat)

Thanks,
Debasis

Hi @ppisljar ,

I followed the same doc to install Filebeat and it is up and running, as shown by the command below. Is there any way to validate whether the ingest pipeline is working properly or not?

systemctl status filebeat

Thanks,
Debasis

Hi @ppisljar,

Could you please help here?

Thanks,
Debasis

You could use GET /_nodes/stats?metric=ingest&filter_path=nodes.*.ingest.pipelines to get statistics about your ingest pipeline and see the failed count. You can run it from Dev Tools in Kibana.

> Is there any way to validate the ingest pipeline if it is working properly or not?

You can use the simulate pipeline API to test the ingest pipeline.
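For example, something along these lines from Dev Tools; the pipeline name and CSV line here are made up, so substitute your own pipeline id and a real row from your file:

POST _ingest/pipeline/my_pipeline/_simulate
{
  "docs": [
    { "_source": { "message": "value1,value2,value3" } }
  ]
}

The response shows the document as it would look after the pipeline runs, including any processor errors.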

Hi @kcreddy, thanks for your response. Now I can see the ingest pipeline details from Dev Tools, as below. That means data was loaded into my index, but when I search under the Discover tool nothing shows up for the sales index. Am I missing anything here?

"parse_sales_data": {
  "count": 44462,
  "time_in_millis": 269,
  "current": 0,
  "failed": 1,
  "processors": [
    {
      "csv": {
        "type": "csv",
        "stats": {
          "count": 44462,
          "time_in_millis": 180,
          "current": 0,
          "failed": 0
        }
      }
    },

Thanks,
Debasis

Hi @kcreddy ,

In addition to the above issue, I just want to mention that I followed the link below to create the sales pipeline (testing the theoretical part, since I am new to the Elasticsearch world) before doing the actual data load, which is in CSV format.

Thanks,
Debasis

Can you provide both the ingest pipeline and the Filebeat configuration, along with a couple of CSV rows?

Can you query your sales index from Dev Tools and check whether you can find documents there?

GET sales/_search
{
  "query":{
    "match_all" : {}
  }
}

If so, the Data View you created might be wrong and not pointing to the index where the data was ingested. In that case the documents may have been ingested into the sales index, but your Data View doesn't include the sales index.
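A quick way to confirm the index exists and actually contains documents is the cat indices API from Dev Tools (the docs.count column should be non-zero):

GET _cat/indices/sales?v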

Also, you seem to have 1 failure in the pipeline. You could add an on_failure clause to your pipeline that sets an error.message field, to understand why the failure occurred. More info here.

"on_failure": [
    {
      "set": {
        "description": "Record error information",
        "field": "error.message",
        "value": "Processor {{ _ingest.on_failure_processor_type }} with tag {{ _ingest.on_failure_processor_tag }} in pipeline {{ _ingest.on_failure_pipeline }} failed with message {{ _ingest.on_failure_message }}"
      }
    }
  ]
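Once the on_failure handler is in place and the file is re-ingested, the documents that hit a failure can be found with an exists query on that field:

GET sales/_search
{
  "query": {
    "exists": { "field": "error.message" }
  }
}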

Hi @kcreddy ,
Please find the ingest pipeline details below.

PUT _ingest/pipeline/parse_sales_data
{
  "processors": [
    {
      "csv": {
        "description": "Parse sales data from scanner",
        "field": "message",
        "target_fields": ["sr","date","customer_id","transaction_id","sku_category","sku","quantity","sales_amount"],
        "separator": ",",
        "ignore_missing": true,
        "trim": true
      }
    },
    {
      "remove": {
        "field": ["sr"]
      }
    }
  ]
}

Below are some records from the scanner-data.csv file:

,Date,Customer_ID,Transaction_ID,SKU_Category,SKU,Quantity,Sales_Amount
1,02/01/2016,2547,1,X52,0EM7L,1,3.13
2,02/01/2016,822,2,2ML,68BRQ,1,5.46
3,02/01/2016,3686,3,0H2,CZUZX,1,6.35
4,02/01/2016,3719,4,0H2,549KK,1,5.59

I made the below changes in filebeat.yml

# ============================== Filebeat inputs ===============================

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /cbdata/elasticsearch/scanner-data.csv   # path to your CSV file
  exclude_lines: ['^""']                       # header line
  index: sales
  pipeline: parse_sales_data

# ---------------------------- Elasticsearch Output ----------------------------
output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["https://xx.xx.xx.xx:9200","https://xx.xx.xx.xx:9200","https://xx.xx.xx.xx:9200"]

  # Protocol - either http (default) or https.
  protocol: "https"

  # Authentication credentials - either API key or username/password.
  #api_key: "id:api_key"
  username: "elastic"
  password: "elastic"
  ssl:
    enabled: true
    certificate_authorities: ["/etc/filebeat/certs/cert.pem"]

Hi @kcreddy ,

As I mentioned earlier, I followed the steps in the link below.

Thanks,
Debasis

Hi @kcreddy ,

Did you find time to look into the above issue?

Thanks,
Debasis

Hey, I went through the tutorial and was able to ingest without any problem. The data you posted above is different from the data in the tutorial. For example, the date format is different. There might be WARN or ERROR messages in your Filebeat logs indicating that documents failed to index because they were parsed in the wrong format.

If there are other errors, I would check inside the filebeat logs.
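If the date format turns out to be the problem, one option is to add a date processor to the pipeline so the value is parsed explicitly; a minimal sketch, assuming the csv processor writes the value into a field named date and the values are day-first (dd/MM/yyyy):

{
  "date": {
    "field": "date",
    "formats": ["dd/MM/yyyy"],
    "target_field": "@timestamp"
  }
}

If your Data View uses a time field, having @timestamp populated also affects which documents Discover shows for the selected time range.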