Bulk insert to Elasticsearch from Logstash using "bulk_path"

I have a huge CSV file, around 500k rows, that I'm processing and want to push to Elasticsearch.

This output:

elasticsearch {
        index => "%{pipelineId}"
        hosts => ["${domain:port}/"]
    }

outputs one document at a time! It takes around 11-14 minutes.
Here is what I'm doing to use the bulk API, but it takes the same time as above!

elasticsearch {
        index => "%{pipelineId}"
        hosts => ["${domain:port}/"]
        bulk_path => "${domain:port}/_bulk"
    }

Any ideas why?

What makes you think the elasticsearch output is sending one document at a time?

@Badger I'm printing to the terminal every time it sends a document to Elasticsearch:

    stdout { codec => rubydebug }
    file {
        path => "/usr/share/output/output.json"
        codec => "json_lines"
    }

and also because of the time: 11-13 minutes for 500k JSON documents pushed to ES.

The elasticsearch output uses the bulk API to load data. I haven't checked the code, but I would expect it to load each pipeline batch (by default 125 events) as a single API call.
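If you want to experiment with that, the batch size is a Logstash setting rather than something in the pipeline config itself. A minimal sketch, assuming you set it in logstash.yml (the same value can also be passed with -b on the command line); the numbers here are illustrative, not a recommendation:

    # logstash.yml -- illustrative values only
    pipeline.batch.size: 1000   # events per batch, each flushed as one _bulk request (default 125)
    pipeline.workers: 4         # worker threads, each processing its own batch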

It would not surprise me if rubydebug is your rate-limiting component. How long does it take if you remove the non-elasticsearch outputs?
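For the test, the output section could be reduced to just the elasticsearch plugin, something like this (a sketch reusing the placeholders from your config above):

    output {
        elasticsearch {
            index => "%{pipelineId}"
            hosts => ["${domain:port}/"]
        }
    }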

I just ran it with 854430 docs pushed to ES, and it took 9 minutes.

What is the size of your documents? What is the size and specification of your Elasticsearch cluster?
