Logstash parsing into Elasticsearch is too slow

Hello there,
I'm trying to parse a large .csv file with Logstash and index it into Elasticsearch, but it is too slow: about 100 events per second. The .csv file has over a million events.

I'm running both Logstash and Elasticsearch locally, version 6.1.3.
My PC configuration:
-Ubuntu 16.04
-Intel(R) Core(TM) i5-5200U CPU @ 2.20GHz
-4GB DDR3 1600MHz

An example of some events:
Columns: Date, Key, AFC, AGC, C, TN, Or, De, FCLS, Ftnt, OWRT
2017-01-01,1BANYCLONAXXYD1T1,6424.16,1254.16,BA,1,aaa,bbb,AAA5D1T1,8P,RT
2017-01-01,1BANYCLONAXXY5D1T1,6424.16,1254.16,BA,1,aaa,bbb,AAA5D1T1,8P,RT
2017-01-02,1BANYCLONAXXY5D1T1,6424.16,1254.16,BA,1,aaa,bbb,AAA5D1T1,8P,RT
2017-01-02,1BANYCLONXXY5D1T1,6424.16,1254.16,BA,1,aaa,bbb,AAA5D1T1,8P,RT

The pipeline config used:

input {
    file {
        path => "/home/tmp/file.csv"
        start_position => "beginning"
    }
}

filter {
    csv {
        columns => ["Date", "Key", "AFC", "AGC", "C", "TN", "Or", "De", "FCLS", "Ftnt", "OWRT"]
        separator => ","
        remove_field => ["message"]
    }
    date {
        match => ["Date", "yyyy-MM-dd"]
    }
}

output {
    elasticsearch {
        hosts => ["localhost:9200"]
        index => "dev_index"
    }
    stdout { codec => dots }
}

I increased the number of pipeline workers in logstash.yml to 4, but it did not appear to change anything.
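For reference, those pipeline throughput settings live in logstash.yml (Logstash 6.x); a minimal sketch, with illustrative values only:

    # logstash.yml -- pipeline tuning (Logstash 6.x); values are illustrative
    pipeline.workers: 4        # worker threads for the filter/output stages; defaults to the number of CPU cores
    pipeline.batch.size: 250   # events a worker collects per batch (default 125); larger batches mean larger bulk requests to Elasticsearch

The same settings can also be passed on the command line as -w and -b when experimenting with different values.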

How can I improve this performance?
Thanks,

How do you know it's Logstash and not Elasticsearch that is the bottleneck?

I actually don't know. Why would Elasticsearch be the bottleneck?

Why not? I'd certainly expect ES to be able to cope with more than 100 eps on the kind of hardware you have, but I'd also expect Logstash to exceed 100 eps. Are you saturating the CPUs? Which process is dominating the CPU usage?

(Screenshot: CPU usage while running Logstash and Elasticsearch.)

I increased the heap size and vm.max_map_count, because I got a warning from Elasticsearch saying it was too low.
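For anyone following along, both of those are changed outside the pipeline config; a minimal sketch of the usual places (the heap value is illustrative, not a sizing recommendation for a 4GB machine):

    # config/jvm.options (Elasticsearch): heap is set via the JVM flags; keep -Xms and -Xmx equal
    -Xms1g
    -Xmx1g

    # vm.max_map_count is a kernel parameter; 262144 is the minimum Elasticsearch asks for
    sudo sysctl -w vm.max_map_count=262144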

I'm also getting a _dateparsefailure tag which I could not fix. Could it be affecting the performance?
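One common cause of that tag here (an assumption, since only a few rows of the file are shown) is a header row: if the first line of the .csv contains the column names, the csv filter emits an event whose Date field is the literal string "Date", which cannot match the yyyy-MM-dd pattern. A minimal sketch of dropping such a row, placed inside the existing filter block between the csv and date filters:

    # Drop the header row (if the file has one) before it reaches the date filter
    if [Date] == "Date" {
        drop { }
    }

On its own, though, that tag is unlikely to explain the overall throughput.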

I checked the performance after the heap change and it did not improve.

25.6% idle and 46.5% wait indicates that you have serious issues with I/O performance (probably on the ES side, which in turn bogs down Logstash). Is this a laptop? Does it have spinning disks or an SSD?

Laptop, spinning disks.

I'm pretty sure that's the reason then. ES won't perform well if the I/O is slow.

I have access to a remote server that has more memory and processing power and also has an SSD. I'm going to run some tests there and see the results.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.