Logstash parse too slow to elasticsearch

henriqueluz · February 1, 2018, 8:39pm

Hello there,
I'm trying to load a large .csv file into Elasticsearch via Logstash, but it is too slow: about 100 events per second, and my .csv file has over a million events.

I'm running both Logstash and Elasticsearch locally, version 6.1.3.
My PC configuration:
-Ubuntu 16.04
-Intel(R) Core(TM) i5-5200U CPU @ 2.20GHz
-4GB DDR3 1600MHz

An example of some events:
|Date|Key|AFC|AGC|C|TN|Or|De|FCLS|Ftnt|OWRT|
2017-01-01,1BANYCLONAXXYD1T1,6424.16,1254.16,BA,1,aaa,bbb,AAA5D1T1,8P,RT
2017-01-01,1BANYCLONAXXY5D1T1,6424.16,1254.16,BA,1,aaa,bbb,AAA5D1T1,8P,RT
2017-01-02,1BANYCLONAXXY5D1T1,6424.16,1254.16,BA,1,aaa,bbb,AAA5D1T1,8P,RT
2017-01-02,1BANYCLONXXY5D1T1,6424.16,1254.16,BA,1,aaa,bbb,AAA5D1T1,8P,RT

The configuration used:

input {
    file {
        path => "/home/tmp/file.csv"
        start_position => "beginning"
    }
}

filter {
    csv {
        columns => [
            "Date",
            "Key",
            "AFC",
            "AGC",
            "C",
            "TN",
            "Or",
            "De",
            "FCLS",
            "Ftnt",
            "OWRT"
        ]
        separator => ","
        remove_field => ["message"]
    }
    date {
        match => ["Date", "yyyy-MM-dd"]
    }
}

output {
    elasticsearch {
        hosts => ["localhost:9200"]
        index => "dev_index"
    }

    stdout { codec => dots }
}

I increased the number of workers in logstash.yml to 4, but it did not appear to change anything.

How can I improve this performance?
Thanks,

magnusbaeck · February 1, 2018, 9:07pm

How do you know it's Logstash and not Elasticsearch that is the bottleneck?

henriqueluz · February 2, 2018, 11:17am

I actually don't know. Why would elastic be the bottleneck?

magnusbaeck · February 2, 2018, 12:08pm

Why not? I'd certainly expect ES to be able to cope with more than 100 eps on the kind of hardware you have, but I'd also expect Logstash to exceed 100 eps. Are you saturating the CPUs? Which process is dominating the CPU usage?
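
One way to tell the two apart, as a rough sketch rather than a definitive procedure: temporarily take Elasticsearch out of the pipeline and see how fast Logstash runs on its own. The output block below keeps only the dots codec from the original configuration, so each processed event prints a single dot; the input and filter blocks are assumed to stay unchanged.

```
output {
    # Elasticsearch output removed for this test only, so the
    # measured rate reflects Logstash's input and filters alone.
    stdout { codec => dots }
}
```

If the dots appear far faster than 100 per second, Elasticsearch (or the disk underneath it) is the more likely bottleneck. Two caveats: the file input records how far it has read, so a re-run may need `sincedb_path => "/dev/null"` in the input block to read the file from the start again; and if `pv` is installed, something like `bin/logstash -f test.conf | pv -Wbart > /dev/null` (where `test.conf` is a hypothetical file name) reports the rate, since one byte corresponds to one event.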

henriqueluz · February 2, 2018, 12:52pm

[Screenshot: Seleção_128]

CPU Usage while running logstash and elastic.

I increased the heap size and the vm.max_map_count, because I got a warning from elastic saying it was too low.

I'm getting a _dateparsefailure warning which I could not fix. Could it be affecting the performance?

Checked the performance after the heap change and it did not improve.
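
On the _dateparsefailure tag: one possible cause, assuming the file's first line is a comma-separated header row with the column names listed earlier, is that the date filter tries to parse the literal string "Date". That is only a guess from the sample data, not something confirmed in this thread. A minimal sketch of dropping that row before the date filter runs:

```
filter {
    csv {
        columns => ["Date", "Key", "AFC", "AGC", "C", "TN", "Or", "De", "FCLS", "Ftnt", "OWRT"]
        separator => ","
        remove_field => ["message"]
    }
    # Assumption: the header line becomes an event whose Date
    # field is the literal text "Date"; discard that event.
    if [Date] == "Date" {
        drop { }
    }
    date {
        match => ["Date", "yyyy-MM-dd"]
    }
}
```

If the tag still appears on ordinary rows after this change, the header is not the cause and the failing events themselves would need to be inspected.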

magnusbaeck · February 2, 2018, 12:59pm

25.6% idle and 46.5% wait indicates that you have serious issues with I/O performance (probably on the ES side, which in turn bogs down Logstash). Is this a laptop? Does it have spinning disks or an SSD?

henriqueluz · February 2, 2018, 12:59pm

Laptop, spinning disks.

magnusbaeck · February 2, 2018, 1:24pm

I'm pretty sure that's the reason then. ES won't perform well if the I/O is slow.

henriqueluz · February 2, 2018, 1:42pm

I have access to a remote server that has more memory and processing power and also has an SSD. I'm going to run some tests there and see the results.

