Optimizing ES settings for bulk inserts

We are trying to insert raw build log files into ES. The scenario
involves transferring 20 files every 10 minutes via Flume and using
ES's REST API for bulk inserts.
Size of a single log line: 10 KB
Avg. # of log lines/second: 10K
Avg. size of an ES REST API call for bulk inserts: 4 MB

Currently, it takes about 2 minutes for every REST API call to return an
HTTP 200 reply. Other than adding more hardware, are there any config
settings we can tweak to speed up this process? Any idea how
logstash (http://code.google.com/p/logstash/) might be doing this?

-anurag

How do you index the files? Is it a single bulk request executed every 10
minutes with all the data? It would also be interesting to see where the
time is spent, since most HTTP libs out there are not amazing...
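
To make the "single bulk request" question concrete, here is a minimal sketch of the alternative: splitting a batch of log lines into several bulk bodies capped at a byte budget (4 MB here, matching the average call size mentioned in the question). The function name, the `message` field, and the `logs` index name are hypothetical, and the action line uses the newline-delimited bulk format without a type, which may differ from what an older ES version expects.

```python
import json


def chunk_bulk_bodies(log_lines, index_name, max_bytes=4 * 1024 * 1024):
    """Yield newline-delimited bulk bodies of roughly max_bytes or less.

    Each log line becomes an action line plus a source line. A single
    entry larger than max_bytes is still emitted on its own.
    """
    # Assumed action line; older ES versions may also need a "_type".
    action = json.dumps({"index": {"_index": index_name}})
    parts, size = [], 0
    for line in log_lines:
        # "message" is a hypothetical field name for the raw log line.
        entry = action + "\n" + json.dumps({"message": line}) + "\n"
        entry_size = len(entry.encode("utf-8"))
        # Flush the current body before it would exceed the budget.
        if parts and size + entry_size > max_bytes:
            yield "".join(parts)
            parts, size = [], 0
        parts.append(entry)
        size += entry_size
    if parts:
        yield "".join(parts)


if __name__ == "__main__":
    sample = ["build step %d finished" % i for i in range(5)]
    for body in chunk_bulk_bodies(sample, "logs", max_bytes=200):
        print(len(body.encode("utf-8")), "bytes")
```

Each yielded body could then be sent as its own bulk request, which also makes it easier to time requests individually.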

I added issue #599 (Bulk API: Add how long the bulk API took (in
milliseconds) to the response, elastic/elasticsearch on GitHub), so the
bulk response includes how long the bulk execution took within ES. This
will help in the future to analyze whether something "in the middle" is
taking longer.
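
As a rough sketch of how that timing could be used, the snippet below measures the wall-clock time of a bulk call on the client and subtracts the server-reported `took` value (milliseconds) from the bulk response. The helper names and the URL are hypothetical, the `/_bulk` endpoint and `application/x-ndjson` content type reflect current ES conventions rather than anything stated in this thread, and the numbers in the usage note are made up for illustration.

```python
import json
import time
import urllib.request


def latency_breakdown(client_elapsed_ms, bulk_response_text):
    """Split client-observed latency into time inside ES and everything else."""
    response = json.loads(bulk_response_text)
    took_ms = response["took"]  # bulk execution time reported by ES, in ms
    return {
        "inside_es_ms": took_ms,
        # Network, HTTP client, serialization, proxies, and so on.
        "outside_es_ms": client_elapsed_ms - took_ms,
    }


def timed_bulk_post(url, body):
    """POST a bulk body and return the latency breakdown (needs a live ES node)."""
    request = urllib.request.Request(
        url,  # e.g. "http://localhost:9200/_bulk" (assumed address)
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    start = time.perf_counter()
    with urllib.request.urlopen(request) as reply:
        text = reply.read().decode("utf-8")
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return latency_breakdown(elapsed_ms, text)
```

For example, if the client saw 120000 ms and the response reported a `took` of 3000, `latency_breakdown(120000, '{"took": 3000, "items": []}')` would attribute 117000 ms to something outside ES, which points at the client or the path in between rather than at ES settings.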
