Bulk indexing and number of shards


(eunever32) #1

Hi,

I'm testing on a single node.

I find I get better bulk indexing performance when the index has more
shards. Does that make sense?

My own theory is that when I have multiple bulk clients, increasing the
number of shards lets the server achieve better concurrency.

So if I increase the shards to, say, 30 and get a good indexing run, is it
possible to reduce the number of shards afterwards? Or does it matter if
the number stays at 30?

Thanks,

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/d6006698-0ced-43f4-959f-52def820013f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Nik Everett) #2

Sorry, you can't reduce it. I imagine the performance increase you get is because the merge logic is per shard, so each shard does less merging when the same data is spread over more shards. You can likely get similar numbers if you set the refresh interval to -1 and play with the merge policy before the bulk load. You'd want to reset both afterwards and then run an optimize. This amounts to much the same thing as starting with more shards and merging them. Mostly. I think.
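A minimal sketch of that sequence in Python, using only the standard library. The index name and base URL are hypothetical, the restore value of "1s" assumes the default refresh interval, and the merge policy tuning is left out because the post does not say which settings to change. `_optimize` is the endpoint name in Elasticsearch 1.x; later versions call it `_forcemerge`.

```python
import json
import urllib.request


def bulk_load_plan(index, restore_interval="1s"):
    """Return the requests to issue before and after a bulk load.

    Each request is a (method, path, body) tuple. The restore interval
    is an assumption: "1s" is the default, use your own value if different.
    """
    before = ("PUT", f"/{index}/_settings",
              {"index": {"refresh_interval": "-1"}})
    after = [
        ("PUT", f"/{index}/_settings",
         {"index": {"refresh_interval": restore_interval}}),
        # Elasticsearch 1.x endpoint; renamed to _forcemerge later.
        ("POST", f"/{index}/_optimize", None),
    ]
    return before, after


def send(base_url, method, path, body=None):
    """Send one request to a node, e.g. base_url="http://localhost:9200"."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        base_url + path, data=data, method=method,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage would be roughly: call `send` with the `before` request, run the bulk load, then call `send` for each request in `after`.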



(Jörg Prante) #3

More thoughts in addition to Nik's:

  • The default setting is to refresh every second. Refresh is very fast
    when segments are small. If you have more than one shard and use bulk
    indexing, the segments stay small enough for a fast refresh for a longer
    time. So you will observe faster bulk indexing, but only for the first
    20 minutes or so (longer runs will also show increasing response times).
    Disabling refresh at bulk indexing time is strongly recommended.

  • The default ES settings are chosen for more than one shard (the default
    is 5). To make full use of the server resources (CPU, RAM) with a single
    shard, a little tuning of thread pools and buffer sizes may be required,
    especially the bulk thread pool and the index buffer size.
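To make the second point concrete, here is a small Python sketch that renders the relevant node settings as lines for `elasticsearch.yml`. The setting names are the ones I recall from the Elasticsearch 1.x line (`threadpool.bulk.*` and `indices.memory.index_buffer_size`), so check them against the documentation for your version. The values in the usage line are placeholders, not recommendations.

```python
def bulk_tuning_settings(bulk_threads, bulk_queue, index_buffer):
    """Collect the node settings mentioned above into one mapping.

    Setting names assume Elasticsearch 1.x; later versions renamed
    the thread pool settings.
    """
    return {
        "threadpool.bulk.size": bulk_threads,
        "threadpool.bulk.queue_size": bulk_queue,
        "indices.memory.index_buffer_size": index_buffer,
    }


def to_yaml_lines(settings):
    """Render flat settings as 'key: value' lines for elasticsearch.yml."""
    return "\n".join(f"{key}: {value}" for key, value in settings.items())


# Placeholder values, chosen only to show the output format.
print(to_yaml_lines(bulk_tuning_settings(8, 200, "30%")))
```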

Jörg



(eunever32) #4

Thanks guys,
Yes, I already have the refresh interval at -1.

What I'm suggesting is that to support multiple client threads, say 50, it seems that 50 shards is a big help,
i.e. more shards equals more concurrency.

Thanks.



(Jörg Prante) #5

Although many shards mean higher concurrency, a single shard also has very
high concurrency. ES concurrency is implemented independently of the number
of shards and is very flexible in the configuration of thread pools and modules.
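As an illustration of client-side concurrency that does not depend on the shard count, here is a rough Python sketch that splits documents into batches, serializes each batch in the newline-delimited format the `_bulk` endpoint expects, and hands the batches to a pool of worker threads. The function names, the index and type names, the batch size and the client count are all hypothetical, and `send` is a caller-supplied function that posts one body to `/_bulk`. The `_type` field reflects the Elasticsearch versions of this thread's era.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def bulk_body(index, doc_type, docs):
    """Serialize docs as action line + source line pairs, newline-terminated."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


def chunks(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_bulk_clients(docs, send, index="test", doc_type="doc",
                     batch_size=1000, clients=4):
    """Send all batches through `clients` threads; results keep batch order."""
    bodies = [bulk_body(index, doc_type, batch)
              for batch in chunks(docs, batch_size)]
    with ThreadPoolExecutor(max_workers=clients) as pool:
        return list(pool.map(send, bodies))
```

The number of client threads here is independent of how many shards the target index has, so the two can be varied separately when measuring throughput.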

The default settings are for a small ES system which can scale to around
5 nodes, so if you have a powerful machine, you will see that more than one
shard gives better results simply because the default settings are not
designed for being limited to a single node.

50 shards per node is quite high and needs heavy adjustment in other places
unless you go for a total of around 50 nodes in the cluster, so I do
not recommend a high number of shards by default.

Jörg


