Logstash/Elasticsearch Slow CSV Import

guardianmoon · March 6, 2015, 7:30pm

I'm testing out the ELK stack on my desktop (i.e. one node) and thought I'd
start by pulling in a flat file, having Logstash parse it and output it to
Elasticsearch. The setup was easy, but working through the flat file is
painfully slow. The file is tab-delimited, with about 6 million rows and 10
fields. I've messed around with refresh_interval, flush_size, and
workers, but the most I've been able to get is about 300 documents a
second, which means 5-6 hours for the whole file. I'm having a hard time
believing that's right.

In addition to this, Logstash stops reading the file at 579,242
documents every single time (about an hour in), but throws no errors.

If I pull the index field out or the mapping template out (which is mostly
specifying integers, dates and non-analyzed fields), then I start getting
4-6k documents loading per second.
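
The time estimates follow directly from the row count and the indexing rate. A minimal sketch of that arithmetic, using only the figures quoted in this post (the function name is made up for illustration, and a steady rate is assumed):

```python
def import_hours(total_docs: int, docs_per_second: float) -> float:
    """Estimated import time in hours, assuming a steady indexing rate."""
    return total_docs / docs_per_second / 3600


TOTAL_DOCS = 6_000_000  # "about 6 million rows"

slow = import_hours(TOTAL_DOCS, 300)    # rate with the mapping template in place
fast = import_hours(TOTAL_DOCS, 4_000)  # lower end of the 4-6k rate without it

print(f"{slow:.1f} h at 300 docs/s, {fast * 60:.0f} min at 4,000 docs/s")
```

At 300 documents a second the full file takes roughly five and a half hours, against under half an hour at 4,000 a second.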

Any guesses as to what I'm doing wrong?

If it's relevant, my desktop has 10 GB of RAM (with a 4 GB heap for
ES) and 4 cores.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/043d9573-c07d-49f9-9410-9cb1424b2b78%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

warkolm · March 6, 2015, 8:50pm

This may be worth taking to the Logstash group on Google Groups, but can
you show us your Logstash config?

cdahlqvist · March 8, 2015, 4:27pm

Hi,

Can you please share your Logstash configuration, some sample data, and
your mappings?

Best regards,

Christian

guardianmoon · March 8, 2015, 4:30pm

Turns out I just had the wrong character encoding set. Everything's working
great at 2-3k documents a second now!

Thanks!
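
The thread does not say which encoding was wrong or which setting was changed. As a rough sketch for anyone hitting the same symptom, the script below checks whether a file decodes cleanly under a few candidate encodings before it is handed to Logstash. The function names and the candidate list are assumptions for illustration, not anything from the original setup.

```python
def first_bad_line(path, encoding="utf-8"):
    """Return the 1-based number of the first line that fails to decode
    under `encoding`, or None if every line decodes cleanly.

    Lines are split on b"\\n", so this suits UTF-8 and single-byte
    encodings, not UTF-16/UTF-32.
    """
    with open(path, "rb") as fh:
        for lineno, raw in enumerate(fh, start=1):
            try:
                raw.decode(encoding)
            except UnicodeDecodeError:
                return lineno
    return None


def candidate_encodings(path, candidates=("utf-8", "cp1252", "iso-8859-1")):
    """Return the candidates under which the whole file decodes.

    A clean decode does not prove the encoding is the right one:
    iso-8859-1 accepts every byte sequence, so it always passes.
    """
    return [enc for enc in candidates if first_bad_line(path, enc) is None]
```

For example, `first_bad_line("data.tsv")` (a hypothetical path) returns `None` if the file is valid UTF-8, and otherwise the line where decoding first fails. If the file turns out not to be UTF-8, the encoding can be declared on the Logstash side; codecs such as `plain` accept a `charset` option for this.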

yehosef · June 8, 2016, 8:57am

Can you share the config values before and after to help others with a similar problem?
