We are trying to index raw build log files into ES. The scenario
involves transferring 20 files every 10 minutes via Flume and using
the ES REST API for bulk inserts.
Size of a single log line: 10 KB
Avg. # of log lines/second: 10K
Avg. size of an ES REST API call for bulk inserts: 4 MB
Currently, each REST API call takes about 2 minutes to return an
HTTP 200 reply. Other than adding more hardware, are there any config
settings we can tweak to speed this up? Any idea how
logstash (http://code.google.com/p/logstash/) might be doing this?
How do you index the files? Is it a single bulk request executed every 10
minutes with all the data? It would also be interesting to see where the time
is spent, since most HTTP libs out there are not amazing...
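To help answer the "where is the time spent" question, here is a rough Python sketch (not what the poster's Flume sink actually does) that splits log lines into size-capped bulk bodies and times each HTTP call separately from the body building. The index name `logs`, the `message` field, the helper names and the 4 MB cap are all assumptions for illustration; older ES versions also expect a `_type` in the action line, and the `application/x-ndjson` content type is what recent versions accept.

```python
import json
import time
import urllib.request


def build_bulk_chunks(lines, index, max_bytes=4 * 1024 * 1024):
    """Yield bulk request bodies (bytes), each kept under max_bytes
    unless a single entry is larger than the cap on its own."""
    # Action line that precedes every document in the bulk format.
    action = json.dumps({"index": {"_index": index}}) + "\n"
    buf, size = [], 0
    for line in lines:
        # "message" is a hypothetical field name for the raw log line.
        entry = action + json.dumps({"message": line}) + "\n"
        entry_size = len(entry.encode("utf-8"))
        if buf and size + entry_size > max_bytes:
            yield "".join(buf).encode("utf-8")
            buf, size = [], 0
        buf.append(entry)
        size += entry_size
    if buf:
        yield "".join(buf).encode("utf-8")


def timed_bulk_post(url, body):
    """POST one bulk body and return (HTTP status, seconds elapsed)."""
    req = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    start = time.perf_counter()
    with urllib.request.urlopen(req) as resp:
        resp.read()  # drain the response so the timing covers the full reply
        status = resp.status
    return status, time.perf_counter() - start
```

Usage would look something like `for body in build_bulk_chunks(open(path), "logs"): print(timed_bulk_post("http://localhost:9200/_bulk", body))`, where the host and port are again assumptions. Comparing those timings with the time taken just to build the bodies should show whether the 2 minutes go to the client, the network, or ES itself.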