Very high disk IO while indexing

While indexing to our QA environment (2 nodes, EC2 m1.large, each with 2
CPUs and 7.3 GB of RAM), I am seeing exceptionally high disk IO:
http://cl.ly/image/371H01382h2O

I have about 20 processes indexing simultaneously. The index is 3.3 million
documents and about 8 GB on disk. The refresh interval is set to 5 minutes
while the reindex is running, and the index has 5 shards and 1 replica. I
haven't changed the merge policy.
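
For reference, the refresh interval is a dynamic index setting, so it can be
relaxed just for the duration of the reindex and restored afterwards, roughly
like this (the index name is only a placeholder):

  # stretch the refresh interval to 5 minutes while bulk indexing
  curl -XPUT 'http://localhost:9200/myindex/_settings' -d '{
    "index": { "refresh_interval": "5m" }
  }'
  # ("-1" disables refresh entirely; set it back to e.g. "1s" once the reindex is done)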

Cutting the concurrent processes from 20 to 10 definitely lowers the IO from
98% to around 50%.

Any tips on lowering disk IO? Is this normal? The reindex seems to finish
properly and all records end up indexed correctly, but the 100% disk IO scares me.

Here are the index _settings: https://gist.github.com/brupm/d7fe657a9501e617d46c

Besides tackling I/O within ES, you should also use disk monitoring; maybe
there is a faulty disk drive... but don't ask me how this works in an EC2
environment. I assume it's no different from local disk management.

Jörg
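
On a plain Linux box that usually just means watching iostat output; I would
assume an EBS or instance-store volume on EC2 shows up there like any other device:

  # extended per-device statistics every 5 seconds -- watch await and %util
  iostat -x 5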

Hey,

you could play around with throttling the merge throughput in order to lower
disk utilization, see
http://www.elasticsearch.org/guide/reference/index-modules/store.html
In addition, check whether you can decrease your index size as well (for
example by deactivating the _all field, storing fewer fields, etc.).
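
A rough sketch of what the mapping side of that could look like, applied when
the index is created (index, type and field names below are only placeholders):

  curl -XPUT 'http://localhost:9200/myindex' -d '{
    "mappings": {
      "mytype": {
        "_all": { "enabled": false },
        "properties": {
          "title": { "type": "string" },
          "body":  { "type": "string" }
        }
      }
    }
  }'
  # fields are not stored by default, so only add "store": "yes" where you
  # really need the stored value back instead of relying on _source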

If I read your first graph correctly, the write rate per second is not that
high (but I don't know what counts as fast or slow on AWS either).

--Alex

Look at using merge throttling. Merges can be heavy and use a lot of IO.
With throttling, merges will still happen, but they won't swamp your system.
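
On 0.20.x the store-level throttle defaults to "none", so switching it on just
for merges in elasticsearch.yml looks roughly like this (the 20mb figure is
only an example value to tune against your disks):

  # elasticsearch.yml -- throttle merge traffic only, node-wide
  indices.store.throttle.type: merge
  indices.store.throttle.max_bytes_per_sec: 20mb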

clint

Hi,

I see you have "SPM" bookmarked in your browser. You should look at the graphs
under the "Index Stats" tab -- these:
https://apps.sematext.com/spm-reports/mainPage.do#report_anchor_esRefreshFlushMerge
-- to see what's going on with ES/Lucene refreshing, flushing, and merging as
you make the merge-throttling changes that others have suggested.

Otis

ELASTICSEARCH Performance Monitoring - http://sematext.com/spm/index.html

I need to convince the CEO to pay for the plan; I can only see 30 minutes
(which should help me diagnose).

Thank you for the reminder.

Which version of ES are you using?

simon

0.20.5

I seem to have lowered the IO by limiting max_bytes_per_sec to 10mb; I guess
there was no limit by default before.
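
I think the same throttle settings can also be applied at runtime through the
cluster settings API instead of elasticsearch.yml, though I haven't verified
this exact form on 0.20.5, so treat it as a sketch:

  # sketch only -- dynamic store throttling, not verified on 0.20.5
  curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
    "transient": {
      "indices.store.throttle.type": "merge",
      "indices.store.throttle.max_bytes_per_sec": "10mb"
    }
  }'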
