Why are disk reads higher than disk writes?

Hi! This is my first question!

With 0 replicas, disk reads and writes are about equal during bulk indexing.
But after the batch finishes, disk reads are much higher than writes:
disk read = 10 × disk write...

So the CPU load is high, and batch indexing is very slow... T.T

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/8b0c64a1-6aec-4e3c-a13a-723087919c77%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

I think there are two potential causes:

  • refreshes
  • id lookups

Refreshes run periodically in order to make newly indexed data searchable
quickly. A page on the Elastic website gives recommendations for improving
indexing speed by increasing the refresh interval.
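A minimal sketch of changing the refresh interval, assuming the index update-settings endpoint (`PUT /<index>/_settings`) and its `index.refresh_interval` setting. The node address `http://localhost:9200` and the index name `my_index` are hypothetical placeholders. A value such as `120s` spaces refreshes out, and `-1` disables periodic refreshes, for example while a large bulk load runs. The request is only built here, not sent:

```python
import json
import urllib.request

# Hypothetical placeholders: replace with your own node address and index.
ES_URL = "http://localhost:9200"
INDEX = "my_index"


def refresh_settings_body(interval: str) -> bytes:
    """JSON body for the update-settings API.

    `interval` is a time value such as "120s", or "-1" to disable
    periodic refreshes (e.g. while a large bulk load runs).
    """
    return json.dumps({"index": {"refresh_interval": interval}}).encode("utf-8")


def refresh_settings_request(interval: str) -> urllib.request.Request:
    """Build (but do not send) PUT /<index>/_settings."""
    return urllib.request.Request(
        f"{ES_URL}/{INDEX}/_settings",
        data=refresh_settings_body(interval),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    req = refresh_settings_request("-1")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # urllib.request.urlopen(req) would actually send it to the cluster.
```

Building the `Request` separately keeps the sketch testable without a running cluster; sending it is a single `urlopen` call.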

Id lookups are required in order to check if the document that you are
indexing is replacing another document. Note however that since
Elasticsearch 1.2, Elasticsearch can skip this step if you use
auto-generated ids: https://github.com/elasticsearch/elasticsearch/pull/5917
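To illustrate the difference, here is a sketch that builds a bulk request body (newline-delimited JSON: one action line, then one source line per document) either with an explicit `_id` taken from a document field, or with no `_id` so that Elasticsearch generates one. The index name, type name and `user_id` field are hypothetical; the `_type` entry reflects the 1.x-era bulk format, since recent Elasticsearch versions no longer use mapping types.

```python
import json
from typing import Iterable, Optional


def bulk_body(index: str, doc_type: str, docs: Iterable[dict],
              id_field: Optional[str] = None) -> str:
    """Build a newline-delimited bulk request body.

    With `id_field`, each document's own value is sent as `_id`, so
    Elasticsearch must check whether it replaces an existing document.
    With `id_field=None`, `_id` is omitted and Elasticsearch generates one.
    """
    lines = []
    for doc in docs:
        action = {"_index": index, "_type": doc_type}  # 1.x-era action metadata
        if id_field is not None:
            action["_id"] = str(doc[id_field])
        lines.append(json.dumps({"index": action}))  # action line
        lines.append(json.dumps(doc))                # source line
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    docs = [{"user_id": 7, "msg": "hello"}, {"user_id": 8, "msg": "world"}]
    print(bulk_body("my_index", "my_type", docs, id_field="user_id"), end="")
    print(bulk_body("my_index", "my_type", docs), end="")
```

Per the pull request above, only the second form (no `_id`) allows the lookup to be skipped.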

In reply to 이윤동 leeyd81@gmail.com, Wed, Nov 5, 2014 at 9:14 AM.

--
Adrien Grand

Merges probably also play a part here.

In reply to Adrien Grand adrien.grand@elasticsearch.com, 5 November 2014 19:24.

Thanks for the answers!

With 0 replicas, indexing is fast;
with 1 replica, indexing is very slow...

The refresh interval is the same in both cases: 120s.
The id lookup is a good point,
but we need our own ids, so we can't use auto-generated ids. T.T

And with both 0 and 1 replicas, there is always an id lookup.

Something that could happen is that with 0 replicas all the data fits into
your filesystem cache (so everything is done in memory), while with 1
replica some filesystem operations translate into actual disk seeks.

Another difference between 0 and 1 replicas is that in the latter case,
Elasticsearch waits for the data to be written to 2 shards before
returning. When indexing is slow, are you maxing out the CPU and I/O of
your machines? If not, then maybe you just need to increase the concurrency
of indexing requests on the client side.
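A minimal sketch of raising client-side concurrency, assuming documents are already split into fixed-size bulk batches. `send_bulk` is a hypothetical callable standing in for whatever actually sends one bulk request (the Java TransportClient, in this thread's case) and returns the number of documents indexed; the batch size and worker count are placeholders to tune while watching CPU and I/O.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, Sequence


def batches(docs: Sequence[dict], size: int) -> Iterator[Sequence[dict]]:
    """Split documents into fixed-size chunks, one chunk per bulk request."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


def index_concurrently(docs: Sequence[dict],
                       send_bulk: Callable[[Sequence[dict]], int],
                       batch_size: int = 1000,
                       workers: int = 4) -> int:
    """Send bulk batches from several threads; return total docs indexed.

    `send_bulk` is a hypothetical stand-in for the call that sends one
    bulk request and returns how many documents it indexed.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # At most `workers` bulk requests run at the same time.
        return sum(pool.map(send_bulk, batches(docs, batch_size)))


if __name__ == "__main__":
    def fake_send(batch: Sequence[dict]) -> int:
        return len(batch)  # pretend every document was indexed

    docs = [{"n": i} for i in range(2500)]
    print(index_concurrently(docs, fake_send, batch_size=1000, workers=4))  # prints 2500
```

The thread pool bounds how many bulk requests are in flight at once; raising `workers` only helps while the nodes still have idle CPU and I/O.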

In reply to 이윤동 leeyd81@gmail.com, Wed, Nov 5, 2014 at 9:51 AM.

--
Adrien Grand

Our index data is over 10 TB, so it doesn't fit in memory (10 machines, at most
24 GB of memory each).
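A rough back-of-the-envelope check of why the data cannot all be cached, assuming the 10 TB are spread evenly over the 10 machines and ignoring the share of RAM used by the JVM heap (which makes the cacheable fraction even smaller):

```latex
\frac{10\ \text{TB}}{10\ \text{machines}} \ge 1\ \text{TB per machine},
\qquad
\frac{\text{RAM}}{\text{data per machine}} \le \frac{24\ \text{GB}}{1000\ \text{GB}} = 2.4\%
```

So at most about 2.4% of each machine's data could sit in the filesystem cache at once; reads that touch the rest would have to go to disk.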

CPU usage is now 20-30%, with I/O wait at 20-25%.
Disk read: 60M, disk write: 6M.
CPU load: 20.

The problem:

  • Disk reads are very high (with no searches running) -> CPU load is high -> indexing is slow...
    Our goal is to decrease disk reads.

One more question!
Our cluster has 10 machines, but bulk requests converge on only 1-2 machines.
We use the Java TransportClient.
