I'm trying to import several gigabytes of CSV files (100+ million rows, 5 columns each), but throughput is very low (~1 MB/s). I'm not sure yet what the issue is, but maybe someone here has some leads.
What throughput could I expect on an 8 GB RAM, 8-core i7 box with an SSD for the stack and an external USB 3.0 HDD (known to read at 200 MB/s+) from which the data is imported, running the default Security Onion stack in Evaluation mode? Are there any known throughput issues with importing CSV files using the csv filter? Likewise for the date filter (used to extract timestamp values from the CSV)?
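For reference, a minimal pipeline of the kind being described would look something like the following. This is a sketch: the column names are hypothetical, since the actual 5-column layout and timestamp format aren't shown in the question.

```
filter {
  csv {
    separator => ","
    # hypothetical column names -- the original 5-column layout isn't shown
    columns => ["timestamp", "src_ip", "dst_ip", "port", "bytes"]
  }
  date {
    # parse the extracted timestamp column into @timestamp;
    # the ISO8601 pattern is an assumption about the input format
    match => ["timestamp", "ISO8601"]
  }
}
```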
On one of my servers I see a csv filter processing about 7,000 rows per second with a single worker thread; that should scale with the number of worker threads (and thus CPUs). A simple date filter is cheaper than a 5-column csv filter.
The csv filter appears to be quite expensive. A stripped-down regex that does not handle quoted fields gets about 3 times the throughput.
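If the fields never contain quoted separators, a cheaper filter can replace csv entirely. One option along these lines is the dissect filter, which splits on literal delimiters with no quote handling at all. Again a sketch with hypothetical field names:

```
filter {
  dissect {
    # split on commas only; no quoting support, so this breaks
    # if a field ever contains an embedded comma
    mapping => { "message" => "%{timestamp},%{src_ip},%{dst_ip},%{port},%{bytes}" }
  }
}
```

Because dissect does plain delimiter matching rather than per-field regex or quote-aware parsing, it tends to be noticeably cheaper than csv on well-behaved input.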