Why is disk read higher than disk write?

11172 · November 5, 2014, 8:14am

Hi! This is my first question.

When bulk indexing with 0 replicas, the disk read and write rates are about the same.
But after the batch finishes, disk read is higher than disk write:
disk read = 10 x disk write.

As a result the CPU load is high and batch indexing becomes very slow. T.T

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/8b0c64a1-6aec-4e3c-a13a-723087919c77%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

jpountz · November 5, 2014, 8:24am

I think there are two potential causes:

refreshes
id lookups

Refreshes run periodically in order to make data fast to search. The
Elasticsearch documentation gives recommendations to improve indexing speed
by increasing the refresh interval.
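
As a rough illustration of the refresh-interval advice, the sketch below builds the JSON body one might send to the index settings endpoint (`PUT /<index>/_settings`) before and after a bulk load. The `index.refresh_interval` setting exists in Elasticsearch; the helper name `refresh_settings_body` and the chosen values (`-1` to disable periodic refreshes, `1s` to restore them) are assumptions for illustration, so check the documentation for your version.

```python
import json


def refresh_settings_body(interval):
    """Return the JSON body of a settings update that sets the refresh interval."""
    return json.dumps({"index": {"refresh_interval": interval}})


# Before a large bulk load: disable periodic refreshes.
before_load = refresh_settings_body("-1")
# After the load: restore a periodic refresh (value chosen for illustration).
after_load = refresh_settings_body("1s")

print(before_load)
print(after_load)
```

Only the request bodies are built here; sending them is left to whichever HTTP or Elasticsearch client is already in use.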

Id lookups are required in order to check if the document that you are
indexing is replacing another document. Note however that since
Elasticsearch 1.2, Elasticsearch can skip this step if you use
auto-generated ids: https://github.com/elasticsearch/elasticsearch/pull/5917
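
To make the difference concrete, here is a minimal sketch of the newline-delimited body for the bulk API, built once with caller-supplied ids and once without. The helper `bulk_lines`, the index name `products` and the field `sku` are hypothetical. Elasticsearch 1.x also expects a `_type`, which this sketch leaves out.

```python
import json


def bulk_lines(index, docs, id_field=None):
    """Build newline-delimited bulk "index" actions.

    With id_field=None no _id is sent, so Elasticsearch generates one.
    """
    lines = []
    for doc in docs:
        action = {"_index": index}
        if id_field is not None:
            # Caller-supplied id: the server has to check for an existing document.
            action["_id"] = str(doc[id_field])
        lines.append(json.dumps({"index": action}))
        lines.append(json.dumps(doc))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


docs = [{"sku": "a-1", "title": "first"}, {"sku": "a-2", "title": "second"}]
explicit = bulk_lines("products", docs, id_field="sku")
auto = bulk_lines("products", docs)
print(explicit)
print(auto)
```

The two bodies differ only in the presence of `_id` on each action line.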

--
Adrien Grand

warkolm · November 5, 2014, 8:32am

Merges probably also play a part here.

11172 · November 5, 2014, 8:51am

Thanks for the answer!

With 0 replicas, indexing is fast;
with 1 replica, indexing is very slow.

The refresh interval is the same in both cases: 120s.
Id lookups are a good point,
but we need our own ids, so we can't use auto-generated ids. T.T

Also, id lookups always happen, with both 0 and 1 replicas.

jpountz · November 5, 2014, 9:00am

Something that could happen is that with 0 replicas all the data fit into
your filesystem cache (so everything is done in memory) while with 1
replica, some filesystem operations are translated to actual disk seeks.

Another difference between 0 and 1 replicas is that in the latter case,
Elasticsearch will wait for the data to be written on 2 shards before
returning. When indexing is slow, are you maxing out the CPU and I/O of
your machine? If not, then maybe you just need to increase the concurrency
of indexing requests on the client side?
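
One possible way to raise client-side concurrency is to split the documents into batches and send several bulk requests in parallel. This is only a sketch: `send_bulk` is a hypothetical stand-in for a real bulk request, and the batch size and worker count are arbitrary values that would need tuning.

```python
from concurrent.futures import ThreadPoolExecutor


def chunks(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def send_bulk(batch):
    """Hypothetical stand-in for one bulk request; returns the number of docs sent."""
    return len(batch)


def index_concurrently(docs, batch_size=500, workers=4):
    """Send batches from several threads and return the total number of docs sent."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(send_bulk, chunks(docs, batch_size)))


total = index_concurrently([{"n": i} for i in range(2000)], batch_size=500, workers=4)
print(total)
```

If CPU and I/O are already saturated, adding workers is unlikely to help, which is why the question above comes first.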

--
Adrien Grand

11172 · November 5, 2014, 9:36am

Our index data is over 10 TB, so it does not fit in memory (10 machines,
at most 24 GB of memory each).
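
A quick back-of-the-envelope check of those figures, assuming 1 TB = 1024 GB and, optimistically, that all RAM could serve as filesystem cache (in practice the JVM heap takes part of it):

```python
machines = 10
ram_gb_per_machine = 24
index_size_gb = 10 * 1024  # "over 10 TB", so this is a lower bound

total_ram_gb = machines * ram_gb_per_machine
# Upper bound on the share of the index that could sit in the filesystem cache.
cacheable_fraction = total_ram_gb / index_size_gb

print(f"{total_ram_gb} GB RAM, at most {cacheable_fraction:.1%} of the index cacheable")
```

Under these assumptions only a small fraction of the index can be cached, so most reads have to go to disk.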

CPU usage is now 20-30%, with I/O wait at 20-25%.
Disk read is 60M, disk write is 6M.
CPU load is 20.

The problem:

disk read is very high (with no searches running) -> CPU load is high -> indexing is slow.
Our goal is to reduce disk reads.

An additional question:
our cluster has 10 machines, but the bulk requests converge on only 1-2 of them.
We use the Java TransportClient.
