Elasticsearch JVM options

Hello!

I'm trying to increase elasticsearch indexing performance for logstash.
Here are my tech specs:

Three servers, each with two E5520 CPUs, 24 GB of RAM, and RAID10 (4 HDDs).

Here are the Java opts:
ES_HEAP_SIZE=12g
ES_JAVA_OPTS="-XX:+UseConcMarkSweepGC -XX:+UseParNewGC
-XX:CMSInitiatingOccupancyFraction=70 -XX:-UseGCOverheadLimit
-XX:NewSize=256m -XX:InitialTenuringThreshold=10
-Djava.net.preferIPv4Stack=true -Dnetworkaddress.cache.ttl=7200
-Dnetworkaddress.cache.negative.ttl=2"

Here are the configs. Master:

cluster.name: "logstash"
node.name: "search-1"
node.master: true
node.data: true
index.number_of_shards: 3
bootstrap.mlockall: true
indices.memory.index_buffer_size: 50%
index.translog.flush_threshold_ops: 50000
threadpool.search.type: fixed
threadpool.search.size: 20
threadpool.search.queue_size: 100

threadpool.index.type: fixed
threadpool.index.size: 60
threadpool.index.queue_size: 200

slave:
cluster.name: "logstash"
node.name: "search-2"
node.master: false
node.data: true
index.number_of_shards: 3
bootstrap.mlockall: true
indices.memory.index_buffer_size: 50%
index.translog.flush_threshold_ops: 50000
threadpool.search.type: fixed
threadpool.search.size: 20
threadpool.search.queue_size: 100

threadpool.index.type: fixed
threadpool.index.size: 60
threadpool.index.queue_size: 200

The maximum rate I've reached is 3000 logs per second, with lots of gaps of
approximately 30-60 seconds (the garbage collector, I guess).

Could anyone help me reach at least 10000 logs per second? Each log is
approximately 170 bytes.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

The first ideas that come to my mind are

  • Your indices.memory.index_buffer_size looks high, did you make sure it
    makes indexing faster than the default setting (10%)?
  • If you can afford SSD disks, it can definitely help,
  • Analysis can take a non-negligible part of the indexing time, in
    particular with complex analyzers such as the standard one. If simpler
    analyzers work for you as well, they could make indexing faster.
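As a sketch of that last point, a simpler analyzer can be set in the mapping. The index name, type name, and field name below are made-up examples, and this uses the 0.90-era mapping API:

```shell
# Hypothetical mapping change: use the lightweight whitespace analyzer
# instead of the standard one on the log message field.
# Index, type, and field names here are illustrative, not from the thread.
curl -XPUT 'localhost:9200/logstash-2013.06.13/logs/_mapping' -d '{
  "logs": {
    "properties": {
      "message": { "type": "string", "analyzer": "whitespace" }
    }
  }
}'
```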

--
Adrien Grand


Hi,

You have 8 cores per node (two quad-core E5520s), but only 3 shards. Try 6-7 shards.

What is your bottleneck, do you know?
As for the JVM (options) - use the latest Oracle 7 and try G1 for
GC: http://search-lucene.com/?q=G1&sort=pure&fc_project=ElasticSearch
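A minimal G1 variant of the earlier options might look like the sketch below. Only -XX:+UseG1GC is the point here; the pause-time target is an illustrative value, and G1 on early JDK 7 builds had known rough edges with Lucene, so test it before relying on it:

```shell
# Sketch: G1 in place of the CMS/ParNew flags (pause target is illustrative)
export ES_HEAP_SIZE=12g
export ES_JAVA_OPTS="-XX:+UseG1GC -XX:MaxGCPauseMillis=200 \
-Djava.net.preferIPv4Stack=true"
```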

Otis

ELASTICSEARCH Performance Monitoring - http://sematext.com/spm/index.html


Hi,

Oh, and you don't mention if you are using bulk indexing. You should!
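For anyone indexing directly against the REST API, a minimal bulk request looks like the sketch below. The index and type names are illustrative; note the newline-delimited body, which must end with a newline (the heredoc takes care of that):

```shell
# Two documents in one _bulk request; action and source lines alternate.
curl -XPOST 'localhost:9200/_bulk' --data-binary @- <<'EOF'
{"index":{"_index":"logstash-2013.06.13","_type":"logs"}}
{"message":"first log line"}
{"index":{"_index":"logstash-2013.06.13","_type":"logs"}}
{"message":"second log line"}
EOF
```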

Otis

ELASTICSEARCH Performance Monitoring - http://sematext.com/spm/index.html


Thanks for replies!

I've increased the logstash workers to 10, set shards to 7, and updated the
JDK from 6 to 7u21; now I get ~4000 logs per second.

  • Your indices.memory.index_buffer_size looks high, did you make sure it
    makes indexing faster than the default setting (10%)?

I've tuned it following this article
(http://jablonskis.org/2013/elasticsearch-and-logstash-tuning/) which says:

ES by default assumes that you’re going to use it mostly for searching
and querying, so it allocates 90% of its allocated total HEAP
memory for searching, but my case was opposite – the goal is to index
vast amounts of logs as quickly as possible, so I changed that to 50/50.

  • If you can afford SSD disks, it can definitely help,

Disk write speed is no more than 10 MB/s.
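For scale (a rough sketch): the raw log bytes at the 10000/s target are well under that 10 MB/s figure, so most of the disk I/O presumably comes from segment merges, refreshes, and replication rather than the logs themselves:

```shell
# Back-of-the-envelope: raw ingest bandwidth at the 10000 logs/s target
logs_per_sec=10000
bytes_per_log=170
echo "$(( logs_per_sec * bytes_per_log )) bytes/s"  # 1700000 bytes/s, ~1.7 MB/s
```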

Oh, and you don't mention if you are using bulk indexing. You should!

How and where should I enable it? I've found the flush_size option in the
logstash elasticsearch_http module, but that module is still in beta.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Hi,

On Fri, Jun 14, 2013 at 10:51 AM, kay kay kay.diam@gmail.com wrote:

Oh, and you don't mention if you are using bulk indexing. You should!

How and where should I enable it? I've found flush_size option in logstash
elasticsearch_http module, but this module is beta yet.

Yes, elasticsearch_http might give you better results, as you can tune the
flush_size.
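A sketch of what that output section might look like in the logstash config; the host, index pattern, and flush_size values here are assumptions chosen only to illustrate the knob:

```
# Hypothetical logstash output using elasticsearch_http with bulk flushing
output {
  elasticsearch_http {
    host => "search-1"
    index => "logstash-%{+YYYY.MM.dd}"
    flush_size => 1000
  }
}
```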

It is beta, but it's been around for a long time and is used by many. I guess
it's up to you to test whether it works well for your use case and to report
bugs if it doesn't.

Talking about testing: I think the key to tuning your performance is
changing settings, trying again to see if performance differs, and doing
all that while monitoring your cluster. Hence Adrien's question on whether
you're sure the 50% index buffer size isn't too much. As Otis suggested,
you also need to know what your bottleneck is - you can check our SPM
(http://sematext.com/spm/elasticsearch-performance-monitoring/) for
monitoring. It's probably either CPU or I/O.

I assume increasing your number of shards should help (because it implies
more segments - you might also try tuning your merge policy:
http://www.elasticsearch.org/guide/reference/index-modules/merge/).
Also, try increasing the refresh_interval from the default 1 second. But
using bulks would probably give you the biggest gain.
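The refresh_interval change can be applied live through the index settings API; the index name and the 30s value below are illustrative:

```shell
# Sketch: raise refresh_interval from the default 1s to 30s
curl -XPUT 'localhost:9200/logstash-2013.06.13/_settings' -d '{
  "index": { "refresh_interval": "30s" }
}'
```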

Best regards,
Radu

http://sematext.com/ -- ElasticSearch -- Solr -- Lucene


Unfortunately bulk indexing didn't give any improvements. There are still
~4000 logs per second.
Also, I've tried playing with indices.memory.index_buffer_size - it makes no
difference at all.

I'll try to tune merge policy as well and let you know the results.

As for SPM monitoring, is there any open source equivalent?


On Fri, Jun 14, 2013 at 5:19 PM, kay kay kay.diam@gmail.com wrote:

As for SPM monitoring, is there any open source equivalent?

It's not really equivalent, but you can try BigDesk:
https://github.com/lukas-vlcek/bigdesk


Great plugin, thanks!
