Bulk indexing - optimal refresh_interval


(shikhar) #1

The 1.3.0 release notes state:

I'd love to get an explanation on why 30s is better than -1, which is the
setting we are using right now when reindexing.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAHWG4DOaGphdmA%3DbQqV-0ic8HXxLM3ZmBzbW9YFDtZ_zWG8BHA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


(Mark Walkom) #2

I'd say because if you are inserting a lot of data, you will take one
massive hit at the end when the index finally has to be refreshed, as
opposed to smaller ones along the way.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com

On 29 July 2014 16:20, shikhar shikhar@schmizz.net wrote:

The 1.3.0 release notes state:

I'd love to get an explanation on why 30s is better than -1, which is the
setting we are using right now when reindexing.



(Jörg Prante) #3

Opening an index for reads once is no more of a massive hit than doing so
every 30 seconds.

The only explanation I can think of is that users perform searches while
indexing and somehow want up-to-date results while they search along.

This is not the case when I do bulk indexing; search is disabled
completely. So I still recommend disabling refresh_interval when there is
no search activity during bulk indexing. Before search is opened again,
the index is flushed, optimized, and extended to its replica level.

Jörg
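
As a rough illustration of the sequence described above, here is a minimal
Python sketch that only builds the HTTP requests involved and sends nothing.
The function name bulk_load_plan, the index name, and dropping replicas to 0
during the load are assumptions made for illustration. The _settings, _flush,
and _optimize paths are the 1.x-era REST endpoints (_optimize was later
renamed _forcemerge).

```python
import json


def bulk_load_plan(index, replicas=1, restored_interval="1s"):
    """Return the ordered (method, path, body) requests around a bulk load.

    Nothing is sent here; the caller hands each tuple to an HTTP client.
    bulk_load_plan is a hypothetical helper, not part of any ES client.
    """
    def settings(values):
        return ("PUT", f"/{index}/_settings", json.dumps({"index": values}))

    before = [
        # Assumption: no searches run during the load, so periodic refresh
        # is disabled and replicas are dropped to 0 until it finishes.
        settings({"refresh_interval": "-1", "number_of_replicas": 0}),
    ]
    after = [
        # Flush the indexed data to disk.
        ("POST", f"/{index}/_flush", None),
        # Merge segments (1.x endpoint name).
        ("POST", f"/{index}/_optimize", None),
        # Re-enable refresh ("1s" is the Elasticsearch default) and grow
        # the index back to the desired replica count.
        settings({"refresh_interval": restored_interval,
                  "number_of_replicas": replicas}),
    ]
    return before, after
```

Calling bulk_load_plan("logs", replicas=2) gives one request to send before
the load and three to send after it, in order.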

On Tue, Jul 29, 2014 at 8:23 AM, Mark Walkom markw@campaignmonitor.com
wrote:

I'd say because if you are inserting a lot of data, you will have a
massive hit at the end when you need to index, as opposed to smaller ones
along the way.



(Michael McCandless) #4

Disabling refresh (-1) is a good choice if you are fully maximizing your
cluster's CPU/IO resources (using enough bulk client threads or async
requests). In that case it should give faster indexing throughput than 30s
refresh.

But if you are not saturating the cluster's resources, then a refresh
interval of 30s may in fact get you faster indexing throughput. Refreshes
are done by a background thread in ES, so you effectively get one more
thread working for you. If you disable refresh, the bulk indexing threads
have to do the flushing themselves.

Try both and see and then report back!
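
One way to "try both" could look like the small timing harness below. It is
only a sketch: run_bulk_load is a hypothetical callable you would write
yourself (apply the given refresh_interval, index the same data set, return
the number of documents indexed), and the clock parameter exists only so the
timing source can be swapped out.

```python
import time


def compare_refresh_intervals(run_bulk_load, intervals=("-1", "30s"),
                              clock=time.perf_counter):
    """Time one bulk load per refresh_interval; return docs/second for each.

    run_bulk_load(interval) is supplied by the caller and must return the
    number of documents it indexed.
    """
    results = {}
    for interval in intervals:
        start = clock()
        docs = run_bulk_load(interval)
        elapsed = clock() - start
        # Guard against a zero-length measurement.
        results[interval] = docs / elapsed if elapsed > 0 else float("inf")
    return results
```

For a fair comparison, each run should start from a fresh index and use the
same number of bulk client threads, since the answer depends on whether the
cluster is already saturated.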

Mike McCandless

http://blog.mikemccandless.com

On Tue, Jul 29, 2014 at 3:11 AM, joergprante@gmail.com <
joergprante@gmail.com> wrote:

There is no more a massive hit when opening an index for read once than at
every 30 seconds.

The only explanation I can think of is that users perform searches while
indexing and somehow want up-to-date results while they search along.

This is not the case when I do bulk indexing, search is disabled
completely. So I still recommend disabling refresh_interval in the case
when there is no search activity while bulk indexing. And before search is
opened again, the index is flushed, optimized, and extended to replica
levels as well.

Jörg



(shikhar) #5

Thanks for the explanation! I'll switch over for the next time I need to
reindex.

On Tue, Jul 29, 2014 at 6:35 PM, Michael McCandless mike@elasticsearch.com
wrote:

Disabling refresh (-1) is a good choice if you are fully maximizing your
cluster's CPU/IO resources (using enough bulk client threads or async
requests). In that case it should give faster indexing throughput than 30s
refresh.

But if you are not saturating the cluster's resources, then a refresh
interval of 30s may in fact get you faster indexing throughput because
refreshes are done with a background thread in ES, so you effectively get
one more thread working for you than if you disable refresh which causes
the bulk indexing threads to do the flushing.

Try both and see and then report back!


