Fastest way to import 100m rows


(Andreas Hembach) #1

Hi all,

I need to import 100m rows. At the moment I use the bulk API with 100
entries per request (is that a good or a bad value?).
But I only get ~500 rows/second imported (is that a lot or a little?). Is
there a way to import the data faster?
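
For readers following along, here is a minimal sketch of how rows could be grouped into bulk request bodies. The bulk API takes newline-delimited JSON: one action line followed by one source line per document, with a trailing newline. The index name `rows`, the sample documents, and the helper names `chunked` and `bulk_body` are assumptions for illustration, not from the original post; older Elasticsearch versions also expect a `_type` in the action line.

```python
import json
from itertools import islice


def chunked(rows, size):
    """Yield lists of at most `size` rows from any iterable."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def bulk_body(batch, index_name):
    """Build one newline-delimited bulk body: action line, then source line."""
    lines = []
    for row in batch:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(row))
    # The bulk format requires the body to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical sample data: 10 rows split into batches of 4.
rows = ({"id": i, "value": i * 2} for i in range(10))
bodies = [bulk_body(batch, "rows") for batch in chunked(rows, 4)]
print(len(bodies), "bulk bodies built")
```

Each string in `bodies` would be sent as the body of one bulk request; the batch size of 4 is only there to keep the example small.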

I set number_of_replicas = 0 and refresh_interval = -1, but that made no
big difference compared to the default values.
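
As a sketch of what these two settings look like as a request body for the index settings endpoint (`PUT /<index>/_settings`), under the assumption that they are applied before the import and reset afterwards. The helper name `settings_body` is hypothetical, and the "after" values (1 replica, `1s` refresh) are Elasticsearch's usual defaults rather than anything stated in the post.

```python
import json


def settings_body(replicas, refresh_interval):
    """Return the JSON body for an index settings update."""
    return json.dumps(
        {
            "index": {
                "number_of_replicas": replicas,
                "refresh_interval": refresh_interval,
            }
        }
    )


# Settings described in the post: no replicas, refresh disabled.
during_import = settings_body(0, "-1")
# Assumed values to restore once the import has finished.
after_import = settings_body(1, "1s")

print(during_import)
print(after_import)
```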

Thank you for your help,
Andreas

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/7b1fdb0d-067f-490b-879f-805593a0d4cc%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Itamar Syn-Hershko) #2

That's a very low rate. Are you importing locally or via remote connection?

--

Itamar Syn-Hershko
http://code972.com | @synhershko https://twitter.com/synhershko
Freelance Developer & Consultant
Author of RavenDB in Action http://manning.com/synhershko/

In reply to Andreas Hembach's message of Mon, May 19, 2014 at 7:16 PM.


(Andreas Hembach) #3

Hi,

I am importing locally. OK, it's a test server with only 2 CPUs at 2.40 GHz
and 4 GB RAM, but for testing I only import 100k rows.

Greetings,
Andreas

In reply to Itamar Syn-Hershko's message of Monday, May 19, 2014, 18:17:52 UTC+2.


(Itamar Syn-Hershko) #4

That doesn't seem right; try using larger bulk sizes. Also, what size are
your docs?


In reply to Andreas Hembach's message of Mon, May 19, 2014 at 7:35 PM.


(Andreas Hembach) #5

Hi,

I have tried some bulk sizes.

BulkSize => Rows per Second
5000 => 500
4000 => 625
2000 => 625
1000 => 550
500 => 500
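
A measurement like the one above could be reproduced with a small timing loop. This is only a sketch: `rows_per_second` and `fake_send` are hypothetical names, and `fake_send` stands in for whatever function actually posts a batch to the bulk API, so the numbers it produces say nothing about Elasticsearch itself.

```python
import time


def rows_per_second(send_batch, rows, bulk_size):
    """Send `rows` in slices of `bulk_size` and return the observed rate."""
    start = time.perf_counter()
    for i in range(0, len(rows), bulk_size):
        send_batch(rows[i:i + bulk_size])
    elapsed = time.perf_counter() - start
    return len(rows) / elapsed if elapsed > 0 else float("inf")


sent = []


def fake_send(batch):
    """Placeholder for a real bulk request; only records the batch size."""
    sent.append(len(batch))


# Hypothetical test data and the bulk sizes from the table above.
rows = [{"id": i} for i in range(10_000)]
results = {
    size: rows_per_second(fake_send, rows, size)
    for size in (500, 1000, 2000, 4000, 5000)
}

for size, rate in results.items():
    print(f"{size} => {rate:.0f}")
```

Running each bulk size several times and averaging would make the comparison less sensitive to noise.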

So I think 4000 is the best value. My docs have about 270 columns. Is that
too many? It is a denormalized view.
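
To put these rates in perspective for the full 100m rows, here is a small back-of-the-envelope calculation. It assumes the measured rate stays constant over the whole import, which may not hold in practice; the function name `hours_to_import` is hypothetical.

```python
TOTAL_ROWS = 100_000_000

# Bulk size => measured rows per second, taken from the table above.
measured = {5000: 500, 4000: 625, 2000: 625, 1000: 550, 500: 500}


def hours_to_import(total_rows, rows_per_second):
    """Estimated import duration in hours at a constant rate."""
    return total_rows / rows_per_second / 3600


for bulk_size, rate in sorted(measured.items()):
    hours = hours_to_import(TOTAL_ROWS, rate)
    print(f"bulk size {bulk_size:>4}: {hours:.1f} h")
```

At 625 rows/second the full import would take roughly 44 hours, and at 500 rows/second roughly 56 hours.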

During the imports, the CPU load is around 90%. Could this be the bottleneck?

Thanks and regards,
Andreas

In reply to Itamar Syn-Hershko's message of Monday, May 19, 2014, 18:51:54 UTC+2.

