Your indices.memory.index_buffer_size looks high. Did you make sure it makes indexing faster than the default setting (10%)?
If you can afford SSD disks, they can definitely help.
Analysis can take a non-negligible part of the indexing time, in
particular with complex analyzers such as the standard one. If simpler
analyzers work for you as well, they could make indexing faster.
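For instance, in the mapping (or index template) for the log index you could point the message field at a simpler analyzer. A sketch only - the type name "logs" and field name "message" are placeholders here, and whitespace is just one possible choice:

  "logs": {
    "properties": {
      "message": { "type": "string", "analyzer": "whitespace" }
    }
  }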
ES by default assumes that you're going to use it mostly for searching and querying, so it keeps 90% of the total heap for searching and only 10% for the indexing buffer. My case is the opposite: the goal is to index vast amounts of logs as quickly as possible, so I changed that to 50/50.
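Concretely, that is this line in elasticsearch.yml (shown only to illustrate the setting being discussed, not as a recommended value):

  indices.memory.index_buffer_size: 50%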
If you can afford SSD disks, they can definitely help.
Disk write speed is no more than 10 MB/sec.
Oh, and you don't mention if you are using bulk indexing. You should!
How and where should I enable it? I've found the flush_size option in the logstash elasticsearch_http module, but that module is still in beta.
Yes, elasticsearch_http might give you better results, as you can tune the
flush_size.
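In the logstash output section that would look roughly like this (a sketch - the host, index pattern and flush_size value below are placeholders, not recommendations):

  output {
    elasticsearch_http {
      host => "localhost"
      port => 9200
      index => "logstash-%{+YYYY.MM.dd}"
      flush_size => 5000
    }
  }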
It is beta, but it has been around for a long time and is used by many. I guess it's up to you to test whether it works well for your use case and report bugs if not.
Talking about testing: I think the key to tuning your performance is
changing settings, trying again to see if performance differs, and doing
all that while monitoring your cluster. Hence Adrien's question on whether
you're sure the 50% index buffer size isn't too much. As Otis suggested,
you also need to know what your bottleneck is - you can check our SPM (http://sematext.com/spm/elasticsearch-performance-monitoring/) for monitoring. It's probably either CPU or I/O.
I assume increasing your number of shards should help (because it implies more segments); you might also try tuning your merge policy (http://www.elasticsearch.org/guide/reference/index-modules/merge/). Also, try increasing the refresh_interval from the default 1 second. But using bulks would probably give you the biggest gain.
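For example, refresh_interval (and, if you go that route, the merge policy settings) can be changed per index through the settings API. A sketch, assuming the default tiered merge policy; the index name and values are placeholders:

  curl -XPUT 'localhost:9200/logstash-2013.06.14/_settings' -d '{
    "index.refresh_interval": "30s",
    "index.merge.policy.segments_per_tier": 20
  }'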
Unfortunately bulk indexing didn't give any improvement. There are still ~4000 logs per second.
Also, I've tried playing with indices.memory.index_buffer_size - it doesn't seem to matter at all.
I'll try to tune the merge policy as well and let you know the results.
As for the SPM monitoring, is there any open source equivalent?
On Friday, June 14, 2013 at 16:04:29 UTC+4, Radu Gheorghe wrote:
Hi,
On Fri, Jun 14, 2013 at 10:51 AM, kay kay <kay....@gmail.com> wrote:
Thanks for the replies! I've increased the logstash workers to 10, set shards to 7, updated the JDK from 6 to 7u21, and now I get ~4000 logs per second.
It's not really equivalent, but you can try BigDesk:
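If it helps, BigDesk is typically installed as a site plugin - a sketch, assuming the usual GitHub plugin install convention of that Elasticsearch era:

  bin/plugin -install lukas-vlcek/bigdesk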