How to control load on ES nodes while bulk indexing through a river

Hi

In our prod ES cluster we have 5 nodes, and I have to re-index all the
documents. I am doing bulk indexing through a river, and while it runs the
load on the ES nodes rises to 10-15+, which obviously slows down search.
Can anyone suggest how to bulk index through a river while keeping the load
on the nodes under control? I have to re-index 50 million documents.

Regards
MJR

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Hey,

You can decrease the bulk thread pool size, change or disable the refresh
interval, or enable merge throttling (which is enabled by default on newer
versions). As you haven't said which Elasticsearch version you are using or
what configuration you have applied, it is hard to go into detail, but these
options might be a start.

--Alex
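For reference, a minimal Python sketch of the refresh-interval and merge-throttling options above, applied through the REST API. It assumes a cluster answering on localhost:9200 and an index name supplied by the caller. The `indices.store.throttle.*` setting names are taken from the 0.90/1.x line and may not exist on 0.20.4. The bulk thread pool size is left out because it is a node-level setting (we believe `threadpool.bulk.size` in elasticsearch.yml on 0.90; the name may differ on older versions).

```python
import json
import urllib.request

# ASSUMPTIONS (not from the thread): the cluster answers HTTP on
# localhost:9200, and setting names follow the 0.90/1.x docs.
# indices.store.throttle.* in particular may not exist on 0.20.x.
ES_URL = "http://localhost:9200"


def refresh_settings(interval):
    """Body for PUT /<index>/_settings. "-1" turns periodic refresh off."""
    return {"index": {"refresh_interval": interval}}


def throttle_settings(max_bytes_per_sec="20mb"):
    """Transient cluster settings that cap merge I/O on every node."""
    return {
        "transient": {
            "indices.store.throttle.type": "merge",
            "indices.store.throttle.max_bytes_per_sec": max_bytes_per_sec,
        }
    }


def put_json(path, body):
    """PUT a JSON body to the cluster and return the decoded reply."""
    req = urllib.request.Request(
        ES_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def before_bulk(index):
    # Stop refreshing the target index and throttle merges cluster-wide.
    put_json("/%s/_settings" % index, refresh_settings("-1"))
    put_json("/_cluster/settings", throttle_settings())


def after_bulk(index, interval="1s"):
    # "1s" is the stock default; restore whatever value you used before.
    put_json("/%s/_settings" % index, refresh_settings(interval))
```

Call `before_bulk("myindex")` before starting the river and `after_bulk("myindex")` once it has finished; `myindex` is a placeholder name.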

In reply to: MJR M <emjayaarr@gmail.com>, Wed, Jul 10, 2013 at 12:45 AM

If you want both (maximum indexing performance and maximum search
performance), you should use two indexes, one for the old generation and one
for the new generation, and connect them with an index alias.
Distribute the indexes over the nodes so that they form two separate groups,
that is, so that they use different machines (for example, by moving shards
or using shard allocation settings). Set the replica level to 0 (no replicas)
for the new-gen index. Send searches only to the nodes holding the old gen.
After the bulk load is complete, raise the replica level of the new gen and
switch from old to new with the index alias (or by just dropping the old
gen). You may see a performance hit while the replicas are being built, but
it is small compared to the bulk load.

Jörg
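The two-generation approach above can be sketched as a handful of REST calls, again in Python. Everything named here is an assumption for illustration: the alias `docs`, the indexes `docs_v1` (old) and `docs_v2` (new), and a custom node attribute `group` with the values `search` and `load`, which you would have to set yourself in each node's elasticsearch.yml. Setting names follow the 0.90/1.x documentation and may differ on 0.20.4.

```python
import json
import urllib.request

# ASSUMPTIONS (not from the thread): HTTP on localhost:9200; a search alias
# "docs" pointing at the old index "docs_v1"; a new index "docs_v2"; and a
# custom node attribute set in elasticsearch.yml, e.g. "node.group: search"
# on the machines serving queries and "node.group: load" on the others.
ES_URL = "http://localhost:9200"


def pin_to_group(group):
    """Settings that keep an index's shards on nodes whose group matches."""
    return {"index": {"routing.allocation.include.group": group}}


def new_gen_index_body(shards, group="load"):
    """Create-index body for the new generation: no replicas, one node group."""
    return {
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": 0,
            "index.routing.allocation.include.group": group,
        }
    }


def go_live_settings(replicas):
    """Add replicas and let the new index spread over both node groups."""
    return {
        "index": {
            "number_of_replicas": replicas,
            "routing.allocation.include.group": "load,search",
        }
    }


def swap_alias_body(alias, old_index, new_index):
    """Both actions run in one _aliases call, so the switch is atomic."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


def send(method, path, body=None):
    """Send an optional JSON body and return the decoded reply."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    req = urllib.request.Request(
        ES_URL + path, data=data,
        headers={"Content-Type": "application/json"}, method=method,
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def prepare(old_index, new_index, shards):
    # Before the bulk load: old gen on "search" nodes, new gen on "load" nodes.
    send("PUT", "/%s/_settings" % old_index, pin_to_group("search"))
    send("PUT", "/%s" % new_index, new_gen_index_body(shards))


def go_live(alias, old_index, new_index, replicas=1):
    # After the bulk load: add replicas, wait for them, then switch the alias.
    send("PUT", "/%s/_settings" % new_index, go_live_settings(replicas))
    health = send(
        "GET", "/_cluster/health/%s?wait_for_status=green&timeout=30m" % new_index
    )
    if health.get("timed_out"):
        raise RuntimeError("replicas of %s did not reach green" % new_index)
    send("POST", "/_aliases", swap_alias_body(alias, old_index, new_index))
```

Run `prepare("docs_v1", "docs_v2", 5)` before starting the river, point search clients at the `search` nodes while it runs, then call `go_live("docs", "docs_v1", "docs_v2")`. Deleting `docs_v1` afterwards is optional. Waiting for green before the swap is a design choice here, so that searches only move once the new generation has its replicas.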

Thanks Alex, I will try your suggestions. We are using ES 0.20.4 and have
not upgraded to 0.90.2 yet.

In reply to: Alexander Reelsen, Wednesday, July 10, 2013 1:50:30 AM UTC-7

Thanks Jörg for your suggestions.

In reply to: Jörg Prante, Wednesday, July 10, 2013 5:42:32 AM UTC-7
