Logstash Huge data import + Fast import

Hi,
I'm a newbie with ELK and I've got some questions, of course.

Let's try this one :
My config:
3 servers: 1 master with Elasticsearch and Logstash, 2 nodes with Elasticsearch.
All of them store data.
The Logstash elasticsearch output lists the 2 nodes as hosts.

I'm importing data from a CSV file into Elasticsearch via Logstash, and so far it's been working.
But now I have to import a huge CSV file with 56,000,000 rows in one go.
Logstash starts the job without complaining, but at the current rate I'll be an old man by the time it's done.
I've tried splitting the file, hoping Logstash would import the pieces in parallel threads, but it doesn't (strange observation: the store size is bigger after a multiple-file import than after a single-file import, for the same amount of data).
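For context, the pipeline is roughly of this shape (the paths, separator, column names, and host addresses below are placeholders, not my real config):

input {
  file {
    path => "/opt/logstash-2.3.3/input/file.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  csv {
    separator => ";"
    columns => ["col1", "col2", "col3"]
  }
}
output {
  elasticsearch {
    hosts => ["node1:9200", "node2:9200"]
    index => "myindex"
  }
}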

So you see it coming: what can I do to speed up the import?

Thanks

I've tried splitting the file, hoping Logstash would import the pieces in parallel threads, but it doesn't

Did you use a single file input listing multiple paths or multiple file inputs listing a single path each?

I've split the items across multiple files (file1.csv, file2.csv, ...), put them in the same directory, and changed the Logstash file input like this:
input {
  file {
    path => "/opt/logstash-2.3.3/input/file*.csv"
    ...

I believe each file input will be processed in a single thread. To increase parallelism you have to split it into multiple file inputs.

So I have to add multiple file inputs?
Something like this:
input {
  file {
    path => "/opt/logstash-2.3.3/input/file*.csv"
    ...
  }
  file {
    path => "/opt/logstash-2.3.3/input/file*.csv"
    ...
  }
  file {
    path => "/opt/logstash-2.3.3/input/file*.csv"
    ...
  }
  ...

As I said, "multiple file inputs listing a single path each".

input {
  file {
    path => "file1"
  }
  file {
    path => "file2"
  }
  ...
}
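Each file input then gets its own reader thread, and all of them feed the same filter and output sections. A sketch with the file names spelled out (placeholders; keep your existing csv filter and elasticsearch output unchanged):

input {
  file {
    path => "/opt/logstash-2.3.3/input/file1.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
  file {
    path => "/opt/logstash-2.3.3/input/file2.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
  # one file block per file...
}
# filter and output sections stay as they are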

OK, got it !

Thanks for your help :slight_smile: