Warmer API, high indexing rate, low index refresh interval


(Otis Gospodnetić) #1

Hi,

Shay mentioned the following in another thread:

In 0.20 we will have a warmer API that allows to pre-warm new segments,
meaning that search requests will not "suffer" loading the data.

Will the warmer help people who use a low index refresh interval? Or should
those people not use the warmer?

For example, if I'm constantly indexing new docs with a 1s refresh
interval and I use the new warmer, I imagine ES might be warming up new
segments constantly.
I imagine that's not what one would want, right?
Might ES even fail to keep up with the rate of new segment creation if the
document input rate is high?

Thanks,
Otis

Search Analytics - http://sematext.com/search-analytics/index.html
Scalable Performance Monitoring - http://sematext.com/spm/index.html


(Shay Banon) #2

If you have constant searches on the site that make use of the field data
cache, then warmup makes sense. You want the data ready; otherwise actual
searches will block while loading that data to make it available for
search. You can't avoid loading the data. You can disable the warmup
process using the update settings API if you are doing something like a
large initial bulk indexing of data. The idea, though, is that most
warmups will be very quick, because the new segments will be small.
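
As a rough illustration of the "disable the warmup process using the update
settings API" step, here is a minimal Python sketch that only builds the HTTP
request and does not send it. The setting name `index.warmer.enabled`, the
host `http://localhost:9200`, and the index name `tweets` are assumptions
that do not come from this thread, so check the documentation for your
version before relying on them.

```python
import json
import urllib.request


def build_warmer_toggle_request(base_url, index, enabled):
    """Build (but do not send) a PUT request to the index _settings endpoint."""
    # Assumption: "index.warmer.enabled" is the setting that toggles warmers.
    body = json.dumps({"index.warmer.enabled": enabled}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    # Hypothetical host and index name, for illustration only.
    req = build_warmer_toggle_request("http://localhost:9200", "tweets", False)
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send it: urllib.request.urlopen(req)
```

One would presumably send the same request with `True` once the initial bulk
load is finished, so that new segments are warmed again.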


(Otis Gospodnetić) #3

I think I'm missing some vital piece of info here.

It makes sense to me how warming helps when one has an index that does not
change very often. This is how Solr cache warming works, too: a
new searcher is opened and warmed up by running queries that get
Lucene to load data into FieldCache for faceting and sorting. This works for
Solr because Solr slaves typically copy the index delta from the master at
most once every 60 seconds.

But ElasticSearch is different.

Imagine a system like Twitter, where new documents are constantly being
added and new documents have to be seen in search results as soon as
possible, say within 1 second. How does warming up new segments help here?
The warming up is essentially reading the segment data, I imagine. So
this is the same as what the search request would have to do. The only
advantage I can think of is if:

  1. ES is running on nodes with a good number of CPU cores and one can spare
    1 CPU core to warm up newly created segments non-stop, since they are being
    created non-stop
  2. warming is done in a different thread, not in search threads, so search
    threads don't block

Are 1) & 2) right? Is there a 3)?

Thanks,
Otis


(Shay Banon) #4

The warmup process loads the field data cache for the new segments on a
different thread before they are searched, so search requests are not
blocked by having to load it. I am not really sure what you don't
understand... :) There is no way around loading the relevant data for new
segments, and it's better to do it in the "refresher" thread than in the
search thread(s).
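
To make the "refresher pays the loading cost instead of the search" idea
concrete, here is a small toy model in Python. It is not how Elasticsearch is
implemented; `ToyShard`, its methods, and the segment dictionaries are all
hypothetical names invented for this sketch, and sorting the values merely
stands in for loading field data.

```python
class ToyShard:
    """Toy model of a shard: visible segments plus a per-segment field data cache."""

    def __init__(self, warm_on_refresh):
        self.warm_on_refresh = warm_on_refresh
        self.visible_segments = []  # segments that searches can see
        self.field_data = {}        # segment name -> loaded values
        self.loads_by = {"refresher": 0, "search": 0}

    def _load(self, segment, who):
        # Stand-in for the expensive step: reading field values into memory.
        self.field_data[segment["name"]] = sorted(segment["values"])
        self.loads_by[who] += 1

    def refresh(self, segment):
        # With warming on, the refresher loads the data before the
        # segment becomes visible to searches.
        if self.warm_on_refresh:
            self._load(segment, "refresher")
        self.visible_segments.append(segment)

    def search_max(self):
        # A search that needs field data must load whatever is not cached yet.
        best = None
        for segment in self.visible_segments:
            if segment["name"] not in self.field_data:
                self._load(segment, "search")
            values = self.field_data[segment["name"]]
            if values and (best is None or values[-1] > best):
                best = values[-1]
        return best


def run(warm_on_refresh):
    shard = ToyShard(warm_on_refresh)
    shard.refresh({"name": "seg_1", "values": [3, 1, 2]})
    shard.refresh({"name": "seg_2", "values": [9, 7]})
    return shard.search_max(), shard.loads_by


if __name__ == "__main__":
    print(run(True))   # all loads are counted under "refresher"
    print(run(False))  # all loads are counted under "search"
```

In both runs the same amount of loading happens, which matches the point that
the data has to be loaded either way; the only difference is who does it.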


(Otis Gospodnetić) #5

OK, sounds like I did kind of get it after all, thanks Shay.
So it's about having 1 separate warming thread, probably continuously
warming when indexing is continuous, which means one CPU core is going to
be busy just doing that.

But this assumes segments are warmed faster than they are created, right?
If so, what happens when that is not the case?
Do search requests again block?

Thanks,
Otis


(Shay Banon) #6

No, search requests are not blocked. Because the warmup is done as part of
the same refresh operation, refreshes will lag to accommodate it.
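
A back-of-the-envelope way to picture that lag is the toy timing model below.
It assumes a single refresher that starts a refresh at most once per interval
and that each refresh spends a fixed time warming before the new segment
becomes searchable. The function name and the numbers are made up for
illustration; this is not a description of the actual scheduling in ES.

```python
def refresh_times(interval_s, warm_s, count):
    """Return the times (in seconds) at which successive refreshes complete.

    Toy assumptions: one refresher, a fixed warming cost per refresh, and a
    refresh that is only visible to searches once its warming has finished.
    """
    times = []
    next_start = 0.0
    for _ in range(count):
        done = next_start + warm_s
        times.append(done)
        # The next refresh cannot start before this one is done, nor
        # earlier than one interval after this one started.
        next_start = max(done, next_start + interval_s)
    return times


if __name__ == "__main__":
    # Warming faster than the interval: refreshes stay on a 1s cadence.
    print(refresh_times(1.0, 0.25, 3))
    # Warming slower than the interval: refreshes fall back to a 1.5s cadence.
    print(refresh_times(1.0, 1.5, 3))
```

In this model, slow warming never makes a search wait; it only means that new
documents become visible later than the configured refresh interval.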
