Very high disk IO while indexing

While indexing in our QA environment (2 nodes, EC2 m1.large, each with 2
CPUs and 7.3 GB RAM), I am seeing exceptionally high disk IO.
http://cl.ly/image/371H01382h2O

I have about 20 processes indexing simultaneously. The index is 3.3 million
documents and about 8 GB on disk. The refresh interval is set to 5 minutes
while the reindex is running, and the index has 5 shards and 1 replica. I
haven't changed the merge policy.
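
(For reference, a minimal sketch of how that refresh interval is applied,
assuming the standard index settings API; localhost:9200 and my_index are
placeholders for our actual endpoint and index name:)

  # relax the refresh interval while the bulk reindex runs
  curl -XPUT 'http://localhost:9200/my_index/_settings' -d '{
    "index": { "refresh_interval": "5m" }
  }'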

Cutting the number of concurrent indexing processes from 20 to 10 definitely
lowers the IO from 98% to around 50%.

Any tips on lowering disk IO? Is this normal? The reindex seems to finish
properly and all records are indexed correctly, but the 100% disk IO scares me.

Here are the index _settings: https://gist.github.com/brupm/d7fe657a9501e617d46c

Besides tackling I/O within ES, you should also use disk monitoring; maybe
there is a faulty disk drive. But don't ask me how this works in an EC2
environment; I assume it's no different from local disk management.
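
(For example, something like the following on each node shows per-device
utilization and await times while indexing; this assumes iostat from the
sysstat package is installed:)

  # extended device stats, refreshed every 5 seconds
  iostat -x 5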

Jörg

Hey,

You could play around with throttling the merge throughput in order to
lower disk utilization; see the store-level throttle settings
(indices.store.throttle.type and indices.store.throttle.max_bytes_per_sec).
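
(A minimal sketch of what that could look like in elasticsearch.yml; the
5mb rate is only an illustrative starting point, not a recommendation:)

  # throttle only merge I/O at the store level
  indices.store.throttle.type: merge
  indices.store.throttle.max_bytes_per_sec: 5mb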

In addition, check whether you can decrease your index size as well (for
example by disabling the _all field, storing fewer fields, etc.).
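
(And a minimal sketch of disabling _all via the put-mapping API; my_index
and my_type are placeholder names:)

  # stop indexing the catch-all _all field for this type
  curl -XPUT 'http://localhost:9200/my_index/my_type/_mapping' -d '{
    "my_type": {
      "_all": { "enabled": false }
    }
  }'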

If I read your first graph correctly, the write rate per second is not that
high (but I do not know what counts as fast or slow on AWS either).

--Alex

Look at using merge throttling. Merges can be heavy and use a lot of IO.
With throttling, merges will still happen, but they won't swamp your system.

clint

Hi,

I see you have "SPM" bookmarked in your browser. You should look at the graphs
under the "Index Stats" tab -- these:
https://apps.sematext.com/spm-reports/mainPage.do#report_anchor_esRefreshFlushMerge
-- to see what's going on with ES/Lucene refreshing, flushing, and merging as
you make the merge-throttling changes that others have suggested.

Otis

ELASTICSEARCH Performance Monitoring - Sematext

I need to convince the CEO to pay for the plan; I can only see the last 30
minutes of data (which should still help me diagnose).

Thank you for the reminder.

Which version of ES are you using?

simon

0.20.5

I seem to have lowered the IO by limiting max_bytes_per_sec to 10mb; I guess
there was no limit before by default.
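
(For the record, a rough sketch of how that can be applied without a
restart, assuming the store throttle settings are dynamically updatable via
the cluster settings API in this version; localhost:9200 is a placeholder:)

  # cap merge I/O at 10mb/sec; "transient" resets on a full cluster restart
  curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
    "transient": {
      "indices.store.throttle.type": "merge",
      "indices.store.throttle.max_bytes_per_sec": "10mb"
    }
  }'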
