I'm trying to index documents into Elasticsearch from Logstash, using the Kafka input. Indexing is very slow, and I need to index around 3 million documents as fast as possible. My Logstash config is:
input {
  kafka {
    bootstrap_servers => "kafka2.dev:9092"
    topics            => ["READY_FOR_INDEX", "INDEX_CSV"]
    codec             => json
    consumer_threads  => 8
  }
}

output {
  stdout { codec => rubydebug }

  if [type] == "fact" or [type] == "dimension" {
    elasticsearch {
      index       => "%{index}"
      document_id => "%{id}"
      hosts       => ["xyz.amazonaws.com:9200"]
      flush_size  => 100000
    }
  } else {
    elasticsearch {
      index       => "%{index}"
      document_id => "%{id}"
      hosts       => ["xyz.amazonaws.com:9200"]
      flush_size  => 100000
    }
  }
}
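As an aside, the two elasticsearch blocks are identical right now, so as far as I can tell the output section behaves the same as this single block (a simplification on my part, included just to make the question easier to read):

output {
  stdout { codec => rubydebug }   # prints every event to the console
  # equivalent to both branches above, since they are identical
  elasticsearch {
    index       => "%{index}"
    document_id => "%{id}"
    hosts       => ["xyz.amazonaws.com:9200"]
    flush_size  => 100000
  }
}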
Logstash is running on an AWS machine with 32 GB of RAM and 8 cores.
Right now it takes around 4 minutes to index just 1,000 documents. At that rate, 3 million documents would take about 3,000 × 4 min = 12,000 minutes, i.e. roughly 200 hours.
I was under the impression that Logstash used Elasticsearch's bulk API out of the box, so I expected far better throughput than this.
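My (possibly wrong) understanding of the batching is that the elasticsearch output sends events via the _bulk API, and that on newer Logstash versions the request size is governed by the pipeline batch settings in logstash.yml rather than by flush_size. A sketch of the settings I believe are relevant (names and defaults here are my assumptions and may differ by version):

# logstash.yml -- values shown are what I assume the defaults to be
pipeline.workers: 8       # supposedly defaults to the number of CPU cores
pipeline.batch.size: 125  # events a worker collects into one bulk request
pipeline.batch.delay: 50  # ms to wait before flushing an undersized batch

If that's right and I'm running on the defaults, each worker would only be sending bulks of about 125 events, which might be part of the problem.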
Please help me figure out what is limiting the indexing rate and how to speed it up.