I have been using Elasticsearch for some months now, and I have a problem that I
don't really know how to solve.
I have an iPhone application that sends notifications. Given the number
of users we have, we may have to send 50k+ notifications per hour.
The notifications to send are stored in an index in Elasticsearch, and once
a notification is sent, it is logged in another index. To send the
notifications, each one is put on a queue as soon as it is created, and a
worker picks it up and sends it.
The problem is that when the notifications are sent, the load average of
the Elasticsearch instances becomes very high, and I keep getting
Nagios alerts (sometimes the load average is > 8). And because we
use Elasticsearch for other parts of the app (the search, ...), it slows
those down too.
In our architecture on AWS, we have 2 Elasticsearch instances behind a
load balancer (ELB). The instances are m1.large (I started with m1.small,
then upgraded to c1.medium, then went to m1.large). Elasticsearch is given
6GB of memory (out of 7.2 for the instance). The index configuration is
the default one: 5 shards and 1 replica.
And because I can't afford to lose data, I set up the S3 "backup".
I am using Elasticsearch 0.19.8.
The main index is about 30GB; the logs index is much smaller.
So, I'd like to know what the problem is here. Are m1.large instances
not big enough for my usage? (That would be bad, because I just bought 2
reserved instances last month...) Do I have to change something in the
configuration?
If you need more data on my configuration, feel free to ask!
I'll shoot a bit in the dark here and assume that you've allocated too much
memory to ES (usually 50% of the system RAM is a good starting point).
This leaves very little room for OS caches (of the 1.2GB left, some
will be used by the OS itself). Indexing is CPU- and I/O-intensive, so with
little OS cache, you probably hit the disks more often, causing I/O waits,
which might explain your load figures.
But that's just a shot in the dark. You can easily confirm or rule it out by
lowering the amount of memory you allocate to ES, restarting, and seeing if
anything changes.
To get more information, I'd suggest you use a monitoring solution to take
a deeper look at what's happening:
- what's the bottleneck? Is it really CPU? Or is it I/O, or too little
memory and a lot of garbage collection that's causing the load?
- you can see whether you allocated too much memory to ES, or whether you
actually need more, which implies you have to upgrade your instances
If you don't already have a preferred monitoring solution for ES, I'd
suggest you have a look at our SPM: Elasticsearch Monitoring
I think high load during indexing can be caused by one of the following:
- you have too little memory allocated to ES, garbage collection eats a lot
of CPU, and CPU becomes the bottleneck
- you have too much memory allocated to ES, there's too little OS cache to
help with I/O, and the resulting stress on I/O causes high load
- neither of the above is a problem, and you simply need machines with higher
I/O throughput and/or CPU power
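As a concrete sketch of that first experiment (halving the ES heap), assuming a package-style install with an /etc/default file and the ES_HEAP_SIZE variable supported by the startup scripts of this era; exact file locations vary by install:

```shell
# Give ES roughly 50% of RAM instead of 6GB. ES_HEAP_SIZE sets both
# -Xms and -Xmx; the file path below is an assumption for a typical
# Debian/Ubuntu package install.
echo 'export ES_HEAP_SIZE=3g' | sudo tee -a /etc/default/elasticsearch

# Restart the node so the new heap size takes effect.
sudo service elasticsearch restart

# Verify what the JVM actually got (nodes info API as of 0.19).
curl -s 'localhost:9200/_cluster/nodes?jvm=true&pretty=true' | grep heap_max
```

If the load drops after this change, the "too little OS cache" theory above is confirmed and you can tune the split from there.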
In fact, I have already tried changing the memory allocated to ES up and
down. I have just lowered it to 4GB to see if the problem continues.
I already use a monitoring solution for Elasticsearch: bigdesk.
From what I saw in bigdesk, the bottleneck is the CPU, because it is at
the max during those peaks. I will post screenshots this afternoon when
the problem happens (because unfortunately, I'm sure that it will).
Regards,
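One way to tell whether those CPU peaks are real computation or the CPU stalled on disk is to watch the %iowait column alongside user/system time, using the standard sysstat tools (nothing ES-specific):

```shell
# Sample CPU usage every 2 seconds, 5 times. High %user/%system means
# real CPU work (indexing, GC); high %iowait means the disks are the
# actual bottleneck even though the load average looks "CPU-bound".
iostat -c 2 5

# Per-device view: %util near 100 on the ES data disk confirms I/O
# pressure rather than raw CPU exhaustion.
iostat -dx 2 5
```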
On Wednesday, March 27, 2013 2:36:12 PM UTC+1, Radu Gheorghe wrote:
On Wed, Mar 27, 2013 at 1:23 PM, Talal <mazrou...@gmail.com> wrote:
Here is a screenshot during the problem.
Maybe there is something obvious here that needs optimization, but I can't
see what.
Thanks,
On Wednesday, March 27, 2013 2:57:43 PM UTC+1, Talal wrote:
I don't see anything obvious. One question, though: does your
indexing & search performance drop below acceptable levels during that time?
Or is it just the alerts from Nagios that are bugging you? Because if it's
the latter, you can change the thresholds in Nagios.
Assuming that's not the case, there are a few things that might help:
- increase index.refresh_interval
- unfortunately, I can't tell if the high CPU usage is caused by I/O wait.
If it isn't, you probably need more/larger nodes. If it is, you can either
use faster storage, or try tuning the merge policy
(http://www.elasticsearch.org/guide/reference/index-modules/merge/) for
more segments (e.g. increase segments_per_tier). This will make your
searches slower, though. And it might actually be your searches causing the
load (I see lots more reads than writes). The scenario being that merges
invalidate caches, and it's expensive to rebuild those caches as new
searches run.
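Both knobs live in the index settings API. A rough sketch against a hypothetical index named `notifications` (the values are illustrative starting points, not recommendations, and on older versions some merge-policy settings may only take effect after a close/reopen or restart):

```shell
# Refresh less often than the default 1s, so indexing produces fewer
# tiny segments. Trade-off: new docs take up to 30s to become searchable.
curl -XPUT 'localhost:9200/notifications/_settings' -d '{
  "index": { "refresh_interval": "30s" }
}'

# Allow more segments per tier (default 10) so the tiered merge policy
# merges less aggressively, at the cost of slower searches.
curl -XPUT 'localhost:9200/notifications/_settings' -d '{
  "index": { "merge.policy.segments_per_tier": 20 }
}'
```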
Sorry for hijacking the thread, but I have a question.
I see two options: index.translog.flush_threshold_ops and
index.translog.flush_threshold_size.
I'm not sure how these settings coexist. Is it whichever comes first?
I have a setup where docs are added at a rate of 9k per second, and
looking at the defaults, it seems that my setup flushes every second. I
want to optimize it for maximum indexing speed. What values should I
set? Also, the system is low on memory, so does increasing these
values have a significant effect on memory usage?
Sorry for hijacking the thread, but I have a question.
I see two options: index.translog.flush_threshold_ops and
index.translog.flush_threshold_size.
I'm not sure how these settings coexist. Is it whichever comes first?

Yes. You could have just 5 ops, but if each op indexes a 100MB
document, the size threshold would trigger before the ops threshold.

I have a setup where docs are added at a rate of 9k per second, and
looking at the defaults, it seems that my setup flushes every second. I
want to optimize it for maximum indexing speed. What values should I
set? Also, the system is low on memory, so does increasing these
values have a significant effect on memory usage?

I don't think it will affect memory usage, but I'm not absolutely sure.
Presumably your 9k docs are all small? You probably want to increase
the ops value.
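A minimal sketch of raising the translog thresholds on a hypothetical index named `docs` (the 50000/500mb values are illustrative; at 9k small docs per second, an ops threshold in the low thousands would indeed flush roughly every second):

```shell
# Flush after far more ops than the default, so a 9k docs/sec indexing
# rate does not trigger a Lucene commit every fraction of a second.
# The size threshold (and the periodic flush) still act as backstops.
curl -XPUT 'localhost:9200/docs/_settings' -d '{
  "index": {
    "translog.flush_threshold_ops": 50000,
    "translog.flush_threshold_size": "500mb"
  }
}'
```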