Import a 21 GB CSV file into Elasticsearch

Please advise me on how to upload this huge file. A sample in Java or Python, or the name of a plugin to use, would help. I am working on a local machine with a single node.

Hi @examin,

Logstash has a csv filter.

You could start with a few lines and just output to STDOUT.

The Logstash config would be something like this (I have not used it, so it might not be 100% right):

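input { stdin { } }

filter {
  csv {
    # Take the column names from the first line that is read.
    # Per the csv filter docs, this needs a single pipeline worker (-w 1).
    autodetect_column_names => true
  }
}

output {
  stdout { codec => rubydebug }
}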

Download Logstash and start it with:

logstash-6.3.1/bin/logstash -f above_config.conf

Paste in one line and see how it is parsed. Once you are happy you can change input to file and output to Elasticsearch.

I'm sure there are many other ways to do it as well.


I have done the same. It has been running for 4 days and about 40% of the data has been inserted, so I think I need a faster method.

What is the specification of your Elasticsearch cluster? Is Logstash running on the same host? What do CPU and disk I/O look like?

i5 dual-core processor
2 GB Radeon GPU
1 TB HDD
8 GB RAM
OS: CentOS 7 x64

One column in my CSV contains dates. I need an additional column that holds only the month and year. What do I have to add to the CSV file's config to do that?

Do you not want to add that field directly in the CSV file? If you use Logstash you could use a grok filter, but that would probably slow things down even more, since I expect you are limited by CPU speed. Also, learning the ins and outs of grok takes some time unless you already know how to use it.

Another way to go, if you would like an alternative to having Logstash read the CSV file, would be to convert the CSV to JSON first, e.g. using Python.

Something like http://www.andymboyle.com/2011/11/02/quick-csv-to-json-parser-in-python/

You could add the year and/or month fields while parsing CSV to JSON.
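A minimal sketch of that conversion, assuming the date column is called `date` and uses the `YYYY-MM-DD` format. Both are assumptions, as are the function names and the `year_month` field name, so adjust them to the real CSV. It streams row by row, so the 21 GB file never has to fit in memory, and writes one JSON document per line.

```python
import csv
import json
from datetime import datetime


def add_year_month(row, date_field="date", date_format="%Y-%m-%d"):
    """Return a copy of row with an extra 'year_month' field such as '2018-07'."""
    # date_field and date_format are assumptions about the CSV layout.
    parsed = datetime.strptime(row[date_field], date_format)
    enriched = dict(row)
    enriched["year_month"] = parsed.strftime("%Y-%m")
    return enriched


def csv_to_ndjson(csv_path, ndjson_path, date_field="date", date_format="%Y-%m-%d"):
    """Stream a CSV file into newline-delimited JSON, one document per row."""
    count = 0
    with open(csv_path, newline="", encoding="utf-8") as src, \
            open(ndjson_path, "w", encoding="utf-8") as dst:
        # DictReader uses the first line of the file as the column names.
        for row in csv.DictReader(src):
            dst.write(json.dumps(add_year_month(row, date_field, date_format)))
            dst.write("\n")
            count += 1
    return count
```

Calling `csv_to_ndjson("data.csv", "data.ndjson")` (hypothetical file names) returns the number of rows written. A row whose date does not match the format raises `ValueError`, which is usually what you want on a first trial run.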

The JSON data could then be loaded into Elasticsearch in many ways, one being using Logstash :wink:
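One of those other ways is the Elasticsearch `_bulk` REST endpoint, which expects an action line followed by a source line for each document. The sketch below is an illustration under stated assumptions, not a tuned loader: the index name `csvdata`, the URL `http://localhost:9200`, the batch size, and the helper names are all hypothetical, and the input is assumed to be a file with one JSON document per line. On 6.x releases such as the one mentioned in this thread, the path most likely also needs a type name (for example `/csvdata/doc/_bulk`), so check the bulk API docs for your version.

```python
import json
import urllib.request
from itertools import islice


def bulk_bodies(ndjson_lines, batch_size=5000):
    """Yield _bulk request bodies, each holding up to batch_size documents."""
    stripped = (line.strip() for line in ndjson_lines)
    docs = (line for line in stripped if line)  # skip blank lines
    while True:
        batch = list(islice(docs, batch_size))
        if not batch:
            return
        # Each document is preceded by an action line,
        # and the body must end with a newline.
        yield "".join('{"index":{}}\n' + doc + "\n" for doc in batch)


def send_bulk(body, bulk_url):
    """POST one bulk body and return the parsed JSON response."""
    request = urllib.request.Request(
        bulk_url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def load_file(ndjson_path, bulk_url="http://localhost:9200/csvdata/_bulk"):
    """Send every document in the file; return how many batches reported errors."""
    failed = 0
    with open(ndjson_path, encoding="utf-8") as src:
        for body in bulk_bodies(src):
            if send_bulk(body, bulk_url).get("errors"):
                failed += 1
    return failed
```

Batching keeps each request small enough for a machine with 8 GB of RAM. The batch size of 5000 is only a starting guess and is worth tuning against the smaller trial data set.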

I would again experiment with a smaller data set.
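For example, a small trial file can be cut from the top of the big one. This is a sketch with a hypothetical helper name and row count, and it assumes no quoted field contains an embedded line break, since it copies physical lines rather than parsed CSV records.

```python
from itertools import islice


def head_csv(src_path, dst_path, data_rows=1000):
    """Copy the header line plus the first data_rows lines to a new file."""
    copied = 0
    with open(src_path, encoding="utf-8", newline="") as src, \
            open(dst_path, "w", encoding="utf-8", newline="") as dst:
        # islice stops early, so the rest of the large file is never read.
        for line in islice(src, data_rows + 1):  # +1 for the header line
            dst.write(line)
            copied += 1
    return copied
```

The return value counts the header too, so `head_csv("data.csv", "sample.csv", 1000)` (hypothetical file names) returns 1001 when the source has at least that many lines.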

Is there a good book for learning Elasticsearch? Most of the books I found are too old. Please suggest one.

Although it is old, most of the advice is still good. See https://www.elastic.co/guide/en/elasticsearch/guide/current/index.html

But IMHO the best place to learn is https://training.elastic.co/

Disclaimer: I'm an Elastic employee :wink:
