On Wednesday, May 7, 2014 6:58:05 AM UTC-4, Dipesh Patel wrote:
Hi
We've noticed recently that our snapshot durations are increasing over
time. The rate of data flowing into elasticsearch has remained fairly
constant, although we do create new indices every day (a fixed number
that doesn't vary from day to day). We are currently snapshotting, or
trying to snapshot, every hour; however, with the snapshots taking
progressively longer, this is proving difficult.
Here are some stats showing our time to finish:
Name                            Duration (ms)
snapshot_2014-05-01_01:30:00          4497010
snapshot_2014-05-01_03:30:00          4513037
snapshot_2014-05-01_05:30:00          4770288
snapshot_2014-05-01_07:30:00          5413361
snapshot_2014-05-01_11:30:00          6978384
snapshot_2014-05-01_13:30:00          6907554
snapshot_2014-05-01_15:30:00          7388500
This is just the tail end: originally the snapshots took only 7-8
minutes to run, and they have been getting progressively longer since.
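As a quick cross-check of the table, here is a small Python sketch that converts the reported durations to minutes and compares each one with the intended hourly schedule. The durations are copied from the table; the 60-minute interval is the hourly goal stated in the post, and `to_minutes` is just an illustrative helper.

```python
# Snapshot durations reported in the thread, in milliseconds.
durations_ms = {
    "snapshot_2014-05-01_01:30:00": 4497010,
    "snapshot_2014-05-01_03:30:00": 4513037,
    "snapshot_2014-05-01_05:30:00": 4770288,
    "snapshot_2014-05-01_07:30:00": 5413361,
    "snapshot_2014-05-01_11:30:00": 6978384,
    "snapshot_2014-05-01_13:30:00": 6907554,
    "snapshot_2014-05-01_15:30:00": 7388500,
}

INTERVAL_MIN = 60  # the intended hourly snapshot schedule


def to_minutes(ms: int) -> float:
    """Convert milliseconds to minutes."""
    return ms / 60_000


for name, ms in durations_ms.items():
    minutes = to_minutes(ms)
    # Flag runs that cannot finish before the next hourly snapshot is due.
    status = "exceeds hourly interval" if minutes > INTERVAL_MIN else "ok"
    print(f"{name}  {minutes:6.1f} min  {status}")

values = list(durations_ms.values())
print(f"growth across the sample: {values[-1] / values[0]:.2f}x")
```

Every sampled run takes roughly 75 to 123 minutes, so none of them finishes before the next hourly snapshot would be due, and the last run is about 1.64x as long as the first.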
On Thursday, May 8, 2014 2:58:22 PM UTC+1, Igor Motov wrote:
Hi Dipesh,
I have a few questions. Are you still on S3? Which version of
elasticsearch are you using? How many snapshots do you currently keep in
S3? How fast is your index growing over time?
Igor
On Thursday, May 8, 2014 7:17:37 AM UTC-7, Dipesh Patel wrote:
Hi Igor
We are using elasticsearch 1.1.1.
Currently we keep every snapshot we make in S3; we haven't yet decided
on an archive strategy/solution, so at the moment we have 131 snapshots
in the S3 bucket.
We create about 112 new indices a day.
Let me explain our setup a bit, as it may well be that we should be
doing something different. We collect application logs from lots of
different apps and put them into elasticsearch using Flume, so the setup
is similar to a Logstash one. The one major difference is that we create
an index for each application: if we have 100 apps, we create 100 new
indices each day, one per app.
Dip
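With one index per app per day, the number of indices a full-cluster snapshot covers grows linearly over time even though the ingest rate is flat. A rough count in Python, using the 112-indices-per-day figure from the message above; the five primary shards per index is an assumption (the Elasticsearch 1.x default `number_of_shards`), as are the premises that no old indices are deleted and that every snapshot includes all indices. The helper names are hypothetical.

```python
NEW_INDICES_PER_DAY = 112  # figure quoted in the thread
SHARDS_PER_INDEX = 5       # ASSUMPTION: Elasticsearch 1.x default number_of_shards


def indices_after(days: int, per_day: int = NEW_INDICES_PER_DAY) -> int:
    """Indices accumulated after `days` days, assuming none are ever deleted."""
    return days * per_day


def primary_shards_after(days: int, per_day: int = NEW_INDICES_PER_DAY,
                         shards_per_index: int = SHARDS_PER_INDEX) -> int:
    """Primary shards an all-indices snapshot would cover (under the assumptions above)."""
    return indices_after(days, per_day) * shards_per_index


for days in (1, 7, 30, 90):
    print(f"day {days:3d}: {indices_after(days):6d} indices, "
          f"{primary_shards_after(days):6d} primary shards")
```

Under these assumptions a snapshot after one month covers 3,360 indices and 16,800 primary shards, compared with 112 indices and 560 shards on the first day.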
Snapshots work at the segment level. The more segments stored in the
repository, the more segments have to be compared against those in each
successive snapshot. With merges taking place continually in an active
index, you may end up with a considerable number of "orphaned" segments
in your repository, i.e. segments that were "backed up" but no longer
correspond directly to any segment in your index. Checking through these
may be contributing to the increasing time each snapshot takes.
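A toy Python model can make this bookkeeping concrete. `ToyRepository` is hypothetical and deliberately simplified (it is not Elasticsearch's repository format): a snapshot uploads only the segment files the repository does not already hold, and segments that a merge has replaced stay stored as orphans. The `examined` count assumes each snapshot looks at every stored file once, as a full repository listing would, which is one way the comparison work can grow with repository size.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRepository:
    """Toy model of a segment-level snapshot repository (illustrative only)."""
    stored: set[str] = field(default_factory=set)                 # segment files in the repository
    snapshots: dict[str, set[str]] = field(default_factory=dict)  # snapshot name -> files it references

    def take_snapshot(self, name: str, live_segments: set[str]) -> tuple[set[str], int]:
        """Upload only segments not already stored.

        Returns (newly uploaded files, files examined). ASSUMPTION: every
        stored file is examined once per snapshot, as in a full listing.
        """
        examined = len(self.stored)
        uploaded = live_segments - self.stored
        self.stored |= uploaded
        self.snapshots[name] = set(live_segments)
        return uploaded, examined

    def orphans(self, live_segments: set[str]) -> set[str]:
        """Stored segments that no longer correspond to any live segment."""
        return self.stored - live_segments


repo = ToyRepository()
# Second snapshot: a merge has rewritten _0.._2 into _3, and new documents landed in _4.
for name, live in [("s1", {"_0", "_1", "_2"}), ("s2", {"_3", "_4"})]:
    uploaded, examined = repo.take_snapshot(name, live)
    print(name, "uploaded", sorted(uploaded), "examined", examined)
print("orphans", sorted(repo.orphans({"_3", "_4"})))
```

The second snapshot uploads only `_3` and `_4` but still examines the three files already stored, and `_0` to `_2` remain in the repository as orphans.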
Consider pruning older snapshots: when a snapshot is deleted, "orphaned"
segments that no remaining snapshot references are deleted, while any
segments still referenced are preserved.
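The pruning behaviour described above can be sketched in Python. Both helpers are hypothetical illustrations, not Elasticsearch APIs: `select_for_pruning` keeps the newest `keep` snapshots, ordering them by the timestamp embedded in the thread's `snapshot_YYYY-MM-DD_HH:MM:SS` naming scheme, and `delete_snapshot` models the cleanup, removing only files that no remaining snapshot references.

```python
from datetime import datetime

NAME_FORMAT = "snapshot_%Y-%m-%d_%H:%M:%S"  # naming scheme seen in the thread


def select_for_pruning(names: list[str], keep: int) -> list[str]:
    """Return every snapshot except the `keep` most recent, oldest first."""
    ordered = sorted(names, key=lambda n: datetime.strptime(n, NAME_FORMAT))
    return ordered[:-keep] if keep > 0 else ordered


def delete_snapshot(snapshots: dict[str, set[str]], stored: set[str], name: str) -> set[str]:
    """Drop one snapshot, then delete stored files no remaining snapshot references.

    Mutates `snapshots` and `stored`; returns the set of files removed.
    """
    snapshots.pop(name)
    still_referenced = set().union(*snapshots.values())
    removed = stored - still_referenced
    stored -= removed
    return removed


snapshots = {
    "snapshot_2014-05-01_01:30:00": {"_0", "_1", "_2"},
    "snapshot_2014-05-01_03:30:00": {"_2", "_3"},  # _0 and _1 merged into _3
    "snapshot_2014-05-01_05:30:00": {"_3", "_4"},
}
stored = set().union(*snapshots.values())
for name in select_for_pruning(list(snapshots), keep=2):
    print(name, "->", sorted(delete_snapshot(snapshots, stored, name)))
```

Here the oldest snapshot is dropped and its merged-away segments `_0` and `_1` are freed, while `_2` survives because a newer snapshot still references it. Against a real cluster, each selected name would instead be passed to the snapshot delete API (`DELETE /_snapshot/<repository>/<snapshot>`), which performs the file cleanup on the repository side.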
--Aaron
On Thursday, November 13, 2014 2:04:42 AM UTC-5, Sally Ahn wrote: