First search for a while always slow

PabloEskimo · April 16, 2012, 9:59am

Hi,

I have a strange problem with ElasticSearch.
ElasticSearch is setup with number of indexes rotated on a daily basis (eg:
20120416-logs) for collecting logs from a large number of servers (~2GB/day
currently).
On top of that I've setup a 'logs' alias which allows me to query all of
the indexes.

It seems that if I don't make a query on ElasticSearch for a few days, when
I do the first one will always take a very long time / timeout.
Subsequent queries run fine.

The example below shows me searching for a random string that I know won't
appear in any of the indexes.
It doesn't seem to make any difference whether the string exists in the
indexes or not - or if it changes between searches.
It just seems that the first query is always slow.

Is there some batch process I can run on a regular basis to stop this
happening? Am I doing something wrong?

[paul@macbook ~]$ curl -XGET
'http://logs1:9200/logs/message/_search?q=apegjaegpojaeg'
{"took":7387,"timed_out":false,"_shards":{"total":13,"successful":13,"failed":0},"hits":{"total":0,"max_score":null,"hits":[]}}

[paul@macbook ~]$ curl
-XGET 'http://logs1:9200/logs/message/_search?q=apegjaegpojaeg'
{"took":16,"timed_out":false,"_shards":{"total":13,"successful":13,"failed":0},"hits":{"total":0,"max_score":null,"hits":[]}}

[paul@macbook ~]$ curl -XGET 'http://logs1:9200/logs/message/_search?q=open'
{"took":22,"timed_out":false,"_shards":{"total":13,"successful":13,"failed":0},"hits":{"total":94769,"max_score":1.1631737,"hits":[
<< HITS REMOVED FOR PRIVACY >>}}]}

Lukas_Vlcek1 · April 16, 2012, 1:02pm

Hi,

If I read it correctly, you are creating a new indices every day (and
probably archiving/deleting old indices as well). This is probably result
of Field and Filter caches invalidation which is happening due to Lucene
segment merges. These filters are populated with the first query. If you
need to have your caches warmed upfront you can setup some background job
to execute some queries (typically, you can take some long running queries
from your user logs).

Regards,
Lukas

On Mon, Apr 16, 2012 at 11:59 AM, PabloEskimo paul.maddox@gmail.com wrote:

Hi,

I have a strange problem with Elasticsearch.
Elasticsearch is setup with number of indexes rotated on a daily basis
(eg: 20120416-logs) for collecting logs from a large number of servers
(~2GB/day currently).
On top of that I've setup a 'logs' alias which allows me to query all of
the indexes.

It seems that if I don't make a query on Elasticsearch for a few days,
when I do the first one will always take a very long time / timeout.
Subsequent queries run fine.

The example below shows me searching for a random string that I know won't
appear in any of the indexes.
It doesn't seem to make any difference whether the string exists in the
indexes or not - or if it changes between searches.
It just seems that the first query is always slow.

Is there some batch process I can run on a regular basis to stop this
happening? Am I doing something wrong?

[paul@macbook ~]$ curl -XGET '
http://logs1:9200/logs/message/_search?q=apegjaegpojaeg'

{"took":7387,"timed_out":false,"_shards":{"total":13,"successful":13,"failed":0},"hits":{"total":0,"max_score":null,"hits":}}

[paul@macbook ~]$ curl -XGET '
http://logs1:9200/logs/message/_search?q=apegjaegpojaeg'

{"took":16,"timed_out":false,"_shards":{"total":13,"successful":13,"failed":0},"hits":{"total":0,"max_score":null,"hits":}}

[paul@macbook ~]$ curl -XGET '
http://logs1:9200/logs/message/_search?q=open'
{"took":22,"timed_out":false,"_shards":{"total":13,"successful":13,"failed":0},"hits":{"total":94769,"max_score":1.1631737,"hits":[
<< HITS REMOVED FOR PRIVACY >>}}]}

PabloEskimo · April 16, 2012, 1:28pm

You read it exactly right.
I had been toying with the idea of triggering a scheduled query every hour
or so.

If I'm rotating the indices at 00:00 then am I correct in thinking I only
need to warm those caches once per day after the rotation is complete?

Thanks

On Monday, 16 April 2012 14:02:10 UTC+1, Lukáš Vlček wrote:

Hi,

If I read it correctly, you are creating a new indices every day (and
probably archiving/deleting old indices as well). This is probably result
of Field and Filter caches invalidation which is happening due to Lucene
segment merges. These filters are populated with the first query. If you
need to have your caches warmed upfront you can setup some background job
to execute some queries (typically, you can take some long running queries
from your user logs).

Regards,
Lukas

On Mon, Apr 16, 2012 at 11:59 AM, PabloEskimo paul.maddox@gmail.comwrote:

Hi,

I have a strange problem with Elasticsearch.
Elasticsearch is setup with number of indexes rotated on a daily basis
(eg: 20120416-logs) for collecting logs from a large number of servers
(~2GB/day currently).
On top of that I've setup a 'logs' alias which allows me to query all of
the indexes.

It seems that if I don't make a query on Elasticsearch for a few days,
when I do the first one will always take a very long time / timeout.
Subsequent queries run fine.

The example below shows me searching for a random string that I know
won't appear in any of the indexes.
It doesn't seem to make any difference whether the string exists in the
indexes or not - or if it changes between searches.
It just seems that the first query is always slow.

Is there some batch process I can run on a regular basis to stop this
happening? Am I doing something wrong?

[paul@macbook ~]$ curl -XGET '
http://logs1:9200/logs/message/_search?q=apegjaegpojaeg'

{"took":7387,"timed_out":false,"_shards":{"total":13,"successful":13,"failed":0},"hits":{"total":0,"max_score":null,"hits":}}

[paul@macbook ~]$ curl -XGET '
http://logs1:9200/logs/message/_search?q=apegjaegpojaeg'

{"took":16,"timed_out":false,"_shards":{"total":13,"successful":13,"failed":0},"hits":{"total":0,"max_score":null,"hits":}}

[paul@macbook ~]$ curl -XGET '
http://logs1:9200/logs/message/_search?q=open'
{"took":22,"timed_out":false,"_shards":{"total":13,"successful":13,"failed":0},"hits":{"total":94769,"max_score":1.1631737,"hits":[
<< HITS REMOVED FOR PRIVACY >>}}]}

Lukas_Vlcek1 · April 16, 2012, 1:47pm

The cache is populated with the first query (so once a day in your case
should be enough unless the segment merge is executed for other reasons)
but I think the cache can be also invalidated if there is not enough memory
for the elasticsearch process to execute so make sure elasticsearch has
enough RAM to operate.

There is a chart that shows exactly this (filed and filter cache size vs
available mem) in bigdesk 1.0.0 (i.e. for elasticsearch 0.18.x). I am
working to bring this into bigdesk 2.0.0 soon as well. Meanwhile you can
simply check the nodes stats response for these data.

Regards,
Lukas

On Mon, Apr 16, 2012 at 3:28 PM, PabloEskimo paul.maddox@gmail.com wrote:

You read it exactly right.
I had been toying with the idea of triggering a scheduled query every hour
or so.

If I'm rotating the indices at 00:00 then am I correct in thinking I only
need to warm those caches once per day after the rotation is complete?

Thanks

On Monday, 16 April 2012 14:02:10 UTC+1, Lukáš Vlček wrote:

Hi,

If I read it correctly, you are creating a new indices every day (and
probably archiving/deleting old indices as well). This is probably result
of Field and Filter caches invalidation which is happening due to Lucene
segment merges. These filters are populated with the first query. If you
need to have your caches warmed upfront you can setup some background job
to execute some queries (typically, you can take some long running queries
from your user logs).

Regards,
Lukas

On Mon, Apr 16, 2012 at 11:59 AM, PabloEskimo paul.maddox@gmail.comwrote:

Hi,

I have a strange problem with Elasticsearch.
Elasticsearch is setup with number of indexes rotated on a daily basis
(eg: 20120416-logs) for collecting logs from a large number of servers
(~2GB/day currently).
On top of that I've setup a 'logs' alias which allows me to query all of
the indexes.

It seems that if I don't make a query on Elasticsearch for a few days,
when I do the first one will always take a very long time / timeout.
Subsequent queries run fine.

The example below shows me searching for a random string that I know
won't appear in any of the indexes.
It doesn't seem to make any difference whether the string exists in the
indexes or not - or if it changes between searches.
It just seems that the first query is always slow.

Is there some batch process I can run on a regular basis to stop this
happening? Am I doing something wrong?

[paul@macbook ~]$ curl -XGET 'http://logs1:9200/logs/**
message/_search?q=**apegjaegpojaeghttp://logs1:9200/logs/message/_search?q=apegjaegpojaeg
'
{"took":7387,"timed_out":false,"_shards":{"total":13,"
successful":13,"failed":0},"hits":{"total":0,"max_score":
null,"hits":}}

[paul@macbook ~]$ curl -XGET 'http://logs1:9200/logs/**
message/_search?q=apegjaegpojaeghttp://logs1:9200/logs/message/_search?q=apegjaegpojaeg
'
{"took":16,"timed_out":false,"_shards":{"total":13,"**
successful":13,"failed":0},"hits":{"total":0,"max_score":
null,"hits":}}

[paul@macbook ~]$ curl -XGET 'http://logs1:9200/logs/**
message/search?q=open http://logs1:9200/logs/message/_search?q=open'
{"took":22,"timed_out":false,"_shards":{"total":13,"
successful":13,"failed":0},"**hits":{"total":94769,"max**score":1.1631737,"hits":[
<< HITS REMOVED FOR PRIVACY >>}}]}

kimchy · April 17, 2012, 1:18pm

Since you don't sort / facet on any field, and not filtering, the only
cache in question here is the OS file system cache. And some
initializations for first time index searches, so yea, issuing search
requests periodically will help.

On Mon, Apr 16, 2012 at 4:47 PM, Lukáš Vlček lukas.vlcek@gmail.com wrote:

The cache is populated with the first query (so once a day in your case
should be enough unless the segment merge is executed for other reasons)
but I think the cache can be also invalidated if there is not enough memory
for the elasticsearch process to execute so make sure elasticsearch has
enough RAM to operate.

There is a chart that shows exactly this (filed and filter cache size vs
available mem) in bigdesk 1.0.0 (i.e. for elasticsearch 0.18.x). I am
working to bring this into bigdesk 2.0.0 soon as well. Meanwhile you can
simply check the nodes stats response for these data.

Regards,
Lukas

On Mon, Apr 16, 2012 at 3:28 PM, PabloEskimo paul.maddox@gmail.comwrote:

You read it exactly right.
I had been toying with the idea of triggering a scheduled query every
hour or so.

If I'm rotating the indices at 00:00 then am I correct in thinking I only
need to warm those caches once per day after the rotation is complete?

Thanks

On Monday, 16 April 2012 14:02:10 UTC+1, Lukáš Vlček wrote:

Hi,

If I read it correctly, you are creating a new indices every day (and
probably archiving/deleting old indices as well). This is probably result
of Field and Filter caches invalidation which is happening due to Lucene
segment merges. These filters are populated with the first query. If you
need to have your caches warmed upfront you can setup some background job
to execute some queries (typically, you can take some long running queries
from your user logs).

Regards,
Lukas

On Mon, Apr 16, 2012 at 11:59 AM, PabloEskimo paul.maddox@gmail.comwrote:

Hi,

I have a strange problem with Elasticsearch.
Elasticsearch is setup with number of indexes rotated on a daily basis
(eg: 20120416-logs) for collecting logs from a large number of servers
(~2GB/day currently).
On top of that I've setup a 'logs' alias which allows me to query all
of the indexes.

It seems that if I don't make a query on Elasticsearch for a few days,
when I do the first one will always take a very long time / timeout.
Subsequent queries run fine.

The example below shows me searching for a random string that I know
won't appear in any of the indexes.
It doesn't seem to make any difference whether the string exists in the
indexes or not - or if it changes between searches.
It just seems that the first query is always slow.

Is there some batch process I can run on a regular basis to stop this
happening? Am I doing something wrong?

[paul@macbook ~]$ curl -XGET 'http://logs1:9200/logs/**
message/_search?q=**apegjaegpojaeghttp://logs1:9200/logs/message/_search?q=apegjaegpojaeg
'
{"took":7387,"timed_out":false,"_shards":{"total":13,"
successful":13,"failed":0},"hits":{"total":0,"max_score":
null,"hits":}}

[paul@macbook ~]$ curl -XGET 'http://logs1:9200/logs/**
message/_search?q=apegjaegpojaeghttp://logs1:9200/logs/message/_search?q=apegjaegpojaeg
'
{"took":16,"timed_out":false,"_shards":{"total":13,"**
successful":13,"failed":0},"hits":{"total":0,"max_score":
null,"hits":}}

[paul@macbook ~]$ curl -XGET 'http://logs1:9200/logs/**
message/search?q=open http://logs1:9200/logs/message/_search?q=open'
{"took":22,"timed_out":false,"_shards":{"total":13,"
successful":13,"failed":0},"**hits":{"total":94769,"max**score":1.1631737,"hits":[
<< HITS REMOVED FOR PRIVACY >>}}]}

Topic		Replies	Views
First query is slow after every short period of time Elasticsearch	1	535	September 24, 2020
Elsaticsearch everyday, first search is slow and rest are fine Elasticsearch	6	505	November 15, 2019
First search Request on Elasticsearch is slow Elasticsearch	4	441	April 11, 2023
Slow first request on an index after a short amount of time Elasticsearch	6	9901	March 13, 2020
Slow response times query first couple of times Elasticsearch	3	566	October 22, 2021

First search for a while always slow

Related topics