Hi,
I'm a newbie with ELK and I've got some questions, of course.
Let's try this one:
My config:
3 servers: 1 master with Elasticsearch and Logstash, 2 nodes with Elasticsearch.
All three are data nodes.
The Logstash elasticsearch output lists the 2 nodes in its hosts setting.
I'm importing data from a CSV file read by Logstash into Elasticsearch, and so far it's working.
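In case it helps, the pipeline looks roughly like this (file name, column names, index name and hosts are just examples, not the real config):

input {
  file {
    path => "/opt/logstash-2.3.3/input/myfile.csv"
    start_position => "beginning"
  }
}
filter {
  csv {
    separator => ","
    columns => ["col1", "col2", "col3"]
  }
}
output {
  elasticsearch {
    # the two Elasticsearch data nodes
    hosts => ["node1:9200", "node2:9200"]
    index => "my-index"
  }
}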
But now I have to import a huge CSV file with 56,000,000 rows in one go.
Logstash seems OK and tries to do the job, but at this rate I'll be an old man by the time it's done.
I've tried to split the file, hoping Logstash would import the parts in parallel threads, but it doesn't (a strange observation: the store size is bigger after a multi-file import than after a single-file import, for the same amount of data).
So you can see it coming: what can I do to speed up the import?
Thanks
I've tried to split the file, hoping Logstash would import the parts in parallel threads, but it doesn't
Did you use a single file input listing multiple paths or multiple file inputs listing a single path each?
I've split the data into multiple files (file1.csv, file2.csv, ...), put them in the same directory, and changed the Logstash file input like this:
input {
  file {
    path => "/opt/logstash-2.3.3/input/file*.csv"
    ...
I believe each file input will be processed in a single thread. To increase parallelism you have to split it into multiple file inputs.
So I have to add multiple file inputs?
Something like this:
input {
  file {
    path => "/opt/logstash-2.3.3/input/file*.csv"
    ...
  }
  file {
    path => "/opt/logstash-2.3.3/input/file*.csv"
    ...
  }
  file {
    path => "/opt/logstash-2.3.3/input/file*.csv"
    ...
  }
  ...
As I said, "multiple file inputs listing a single path each".
input {
  file {
    path => "file1"
  }
  file {
    path => "file2"
  }
  ...
}
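For example, with the file names from your earlier post it could look something like this (keep your existing csv filter and elasticsearch output as they are; start_position is only shown for completeness, adjust it to whatever your current input uses):

input {
  file {
    path => "/opt/logstash-2.3.3/input/file1.csv"
    start_position => "beginning"
  }
  file {
    path => "/opt/logstash-2.3.3/input/file2.csv"
    start_position => "beginning"
  }
  file {
    path => "/opt/logstash-2.3.3/input/file3.csv"
    start_position => "beginning"
  }
  # ... one file input per CSV file
}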