Best way to take advantage of resources


(orenmazor) #1

hey guys,

I have a 40M-doc index across two fairly hefty machines (32 GB RAM, 4 cores).
I do around 100 index operations a second and maybe 30 deletes a second, all
in bulk.

my settings are as follows:
"index.number_of_replicas": "0",

"index.number_of_shards": "10",
"index.merge.policy.segments_per_tier": "30",
"index.refresh_interval": "3s",
"index.merge.policy.max_merge_at_once": "10"

I find I still see delays whenever my indexing spikes to around 500-1000/second, and I'm wondering what I can do to make better use of these two nodes. The load on them is negligible at pretty much all times.
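
For what it's worth, one way to smooth bursts like that on the client side is to cap bulk request sizes and pace the submissions instead of firing one huge bulk. A minimal sketch of the chunking logic, where `send_bulk` is a hypothetical stand-in for whatever bulk call your client actually makes:

```python
import time

def chunked(actions, batch_size):
    """Split a burst of index/delete actions into fixed-size bulk batches."""
    for i in range(0, len(actions), batch_size):
        yield actions[i:i + batch_size]

def submit_burst(actions, batch_size=500, pause=0.1, send_bulk=None):
    """Send a burst as several paced bulk requests instead of one big one.

    send_bulk is a placeholder for your real bulk call; the pause gives
    ES some breathing room to refresh/merge between batches.
    """
    for batch in chunked(actions, batch_size):
        if send_bulk is not None:
            send_bulk(batch)
        time.sleep(pause)

# Example: a 1,200-action burst becomes three bulk requests.
sizes = [len(b) for b in chunked(list(range(1200)), 500)]
print(sizes)  # [500, 500, 200]
```

The batch size and pause are tuning knobs, not magic numbers; the point is that a 1000/s spike arriving as a few bounded bulks behaves much better than one giant request.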

one thing I was thinking of doing is starting one or two more nodes on the same machine and moving some of those shards across (I use routing), to maybe help out with indexing that way.

thoughts?


(Radu Gheorghe) #2

Hi,

Maybe it would help if you increased the memory allocated to ES. Take a
look here for details:

http://www.elasticsearch.org/guide/reference/setup/installation.html

On 6 Mar., 02:59, Oren Mazor o...@wildbit.com wrote:



(Otis Gospodnetić) #3

Hi,

When your indexing pauses, do you know how the JVM that ES runs in is
behaving in terms of GC? That's one thing to check. I believe there
are a couple of settings that control flushing. Of course, you'll also
want to check whether it's the source of your data that is pausing,
and not ES.

Increasing index.refresh_interval will help, too.
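
For example, refresh_interval is a dynamic setting, so it can be raised (or disabled with "-1") on a live index through the update-settings endpoint. A small sketch that just builds the request body; the index name and host in the comment are placeholders:

```python
import json

def refresh_settings_payload(interval="30s"):
    """Build the body for a PUT /<index>/_settings request that raises
    the refresh interval. "-1" disables refresh entirely during heavy
    bulk loads -- remember to restore a real interval afterwards."""
    return json.dumps({"index": {"refresh_interval": interval}})

# Roughly equivalent to:
#   curl -XPUT 'http://localhost:9200/myindex/_settings' -d '<payload>'
payload = refresh_settings_payload("30s")
print(payload)
```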

You'll also want to look at your 2 ES nodes and see if it's the CPU or disk
IO or network IO or JVM heap that's the bottleneck and based on that you
will know what options may have a positive effect.
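
One place to get the JVM heap numbers is the nodes-stats API. A rough sketch that pulls per-node heap usage out of a stats response; it assumes ES on localhost:9200 and the stats JSON layout of recent versions (the sample document below is made up for illustration):

```python
def heap_usage(stats):
    """Extract per-node JVM heap usage (percent) from a nodes-stats
    response body, e.g. the JSON returned by GET /_nodes/stats/jvm."""
    out = {}
    for node_id, node in stats.get("nodes", {}).items():
        out[node_id] = node["jvm"]["mem"]["heap_used_percent"]
    return out

# Against a live cluster you would fetch the document first, e.g.:
#   import json, urllib.request
#   stats = json.load(urllib.request.urlopen(
#       "http://localhost:9200/_nodes/stats/jvm"))
sample = {"nodes": {"abc": {"jvm": {"mem": {"heap_used_percent": 72}}}}}
print(heap_usage(sample))  # {'abc': 72}
```

A node that hovers near 100% heap and GCs constantly points at memory as the bottleneck; otherwise look at iostat/vmstat output for disk and CPU.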

We haven't announced SPM for ElasticSearch just yet and have not polished
it completely, but you are welcome to use our SPM for ElasticSearch
tool/service (it's free) that will expose a number of performance metrics
from ES and from the underlying JVM and the server itself, so you can more
easily tell what's going on with your ES cluster. SPM for ES is hiding at
http://apps.sematext.com/ .

Otis

Hiring ElasticSearch Engineers World-Wide --

On Tuesday, March 6, 2012 at 8:59:23 AM UTC+8, Oren Mazor wrote:


