Random very slow queries


(Felipe Hummel) #1

Hi, we have a ~57GB index (with index compression) with ~36 million documents. 4 EC2 machines (7.4GB RAM 2-cores) with 2 primary shards each (no replicas).

We are not in production yet, so the load is basically developers and other employees playing around with the application.

Yet, we are seeing some very slow queries from time to time. Normally a query takes from 100ms up to 1100ms max! But sometimes one innocent query takes up 20 seconds, 30 seconds. Today we saw one taking 55 seconds. But right after, when we do the query again it takes the usual 100-1100ms.

The query general format is embedded below, note that we're not retrieving _source (fields: [])

{
"size" : 250,
"query" : {
"custom_score" : {
"query" : {
"query_string" : {
"query" : "...",
"fields" : [ "myField^2.0", "myField2" ],
"default_operator" : "and"
}
},
"script" : "a = doc['created_at'].value) * param; _score / ( a*log(a) )",
"params" : {
"param1" : 123123123
}
}
},
"filter" : {
"bool" : {
"must" : {
"range" : {
"created_at" : {
"from" : null,
"to" : "2012-10-17T20:25:10.228Z",
"include_lower" : true,
"include_upper" : true
}
}
}
}
},
"fields" : [ ]
}

The machines have around 4GB being used (ES memory is set: -Xms4G -Xmx4G) and the rest in OS cache (swapping is disabled).

Almost all ES config is default (merge policies for example). We're on ES version 0.19.8.

We know it may be some sporadic thing like merging big segments or GC'ing. But 50 seconds is way to strange.

Any ideas?

Thanks

Felipe Hummel

--


(simonw-2) #2

hey, this seems very much like a DataCache loading problem. The script you
are using requires all values for "created_at" to be loaded into memory.
This happens by un-inverting the field once the first query hits an index
segment. Usually this is fast but if you occasionally hit a big segment
merge in the background large segments need to be loaded and that can
easily take that long. We refer to this as segment warming, in 0.20 (likely
the next release coming soon) we have a warmer API that allows to pre-warm
new segments so that you won't suffer from loading FieldDataCaches. If you
can wait until 0.20 this should get resolved!

simon

On Friday, October 19, 2012 10:12:23 PM UTC+2, Felipe Hummel wrote:

Hi, we have a ~57GB index (with index compression) with ~36 million documents. 4 EC2 machines (7.4GB RAM 2-cores) with 2 primary shards each (no replicas).

We are not in production yet, so the load is basically developers and other employees playing around with the application.

Yet, we are seeing some very slow queries from time to time. Normally a query takes from 100ms up to 1100ms max! But sometimes one innocent query takes up 20 seconds, 30 seconds. Today we saw one taking 55 seconds. But right after, when we do the query again it takes the usual 100-1100ms.

The query general format is embedded below, note that we're not retrieving _source (fields: [])

{
"size" : 250,
"query" : {
"custom_score" : {
"query" : {
"query_string" : {
"query" : "...",
"fields" : [ "myField^2.0", "myField2" ],
"default_operator" : "and"
}
},
"script" : "a = doc['created_at'].value) * param; _score / ( a*log(a) )",
"params" : {
"param1" : 123123123
}
}
},
"filter" : {
"bool" : {
"must" : {
"range" : {
"created_at" : {
"from" : null,
"to" : "2012-10-17T20:25:10.228Z",
"include_lower" : true,
"include_upper" : true
}
}
}
}
},
"fields" : [ ]
}

The machines have around 4GB being used (ES memory is set: -Xms4G -Xmx4G) and the rest in OS cache (swapping is disabled).

Almost all ES config is default (merge policies for example). We're on ES version 0.19.8.

We know it may be some sporadic thing like merging big segments or GC'ing. But 50 seconds is way to strange.

Any ideas?

Thanks

Felipe Hummel

--


(Felipe Hummel) #3

We actually started implementing the calls to Index Warmer API earlier
today, but realized it was only for 0.20.
Coming soon like next weeks or next months? :slight_smile:
Is it reasonable to use the elasticsearch master? Would be upgrading from
0.19.8.

Thanks!

Felipe Hummel

On Fri, Oct 19, 2012 at 4:49 PM, simonw
simon.willnauer@elasticsearch.comwrote:

hey, this seems very much like a DataCache loading problem. The script you
are using requires all values for "created_at" to be loaded into memory.
This happens by un-inverting the field once the first query hits an index
segment. Usually this is fast but if you occasionally hit a big segment
merge in the background large segments need to be loaded and that can
easily take that long. We refer to this as segment warming, in 0.20 (likely
the next release coming soon) we have a warmer API that allows to pre-warm
new segments so that you won't suffer from loading FieldDataCaches. If you
can wait until 0.20 this should get resolved!

simon

On Friday, October 19, 2012 10:12:23 PM UTC+2, Felipe Hummel wrote:

Hi, we have a ~57GB index (with index compression) with ~36 million documents. 4 EC2 machines (7.4GB RAM 2-cores) with 2 primary shards each (no replicas).

We are not in production yet, so the load is basically developers and other employees playing around with the application.

Yet, we are seeing some very slow queries from time to time. Normally a query takes from 100ms up to 1100ms max! But sometimes one innocent query takes up 20 seconds, 30 seconds. Today we saw one taking 55 seconds. But right after, when we do the query again it takes the usual 100-1100ms.

The query general format is embedded below, note that we're not retrieving _source (fields: [])

{
"size" : 250,
"query" : {
"custom_score" : {
"query" : {
"query_string" : {
"query" : "...",
"fields" : [ "myField^2.0", "myField2" ],

      "default_operator" : "and"
    }
  },
  "script" : "a = doc['created_at'].value) * param;  _score / ( a*log(a) )",
  "params" : {
    "param1" : 123123123
  }
}

},
"filter" : {
"bool" : {
"must" : {
"range" : {
"created_at" : {
"from" : null,
"to" : "2012-10-17T20:25:10.228Z",
"include_lower" : true,
"include_upper" : true
}
}
}
}
},
"fields" : [ ]
}

The machines have around 4GB being used (ES memory is set: -Xms4G -Xmx4G) and the rest in OS cache (swapping is disabled).

Almost all ES config is default (merge policies for example). We're on ES version 0.19.8.

We know it may be some sporadic thing like merging big segments or GC'ing. But 50 seconds is way to strange.

Any ideas?

Thanks

Felipe Hummel

--

--


(simonw-2) #4

hey,

On Friday, October 19, 2012 10:59:03 PM UTC+2, Felipe Hummel wrote:

We actually started implementing the calls to Index Warmer API earlier
today, but realized it was only for 0.20.
Coming soon like next weeks or next months? :slight_smile:

its rather like weeks I'd say.

Is it reasonable to use the elasticsearch master? Would be upgrading from
0.19.8.

I mean it would be good to verify that warmer api helps you! so yeah go and
try!

simon

Thanks!

Felipe Hummel

On Fri, Oct 19, 2012 at 4:49 PM, simonw <simon.w...@elasticsearch.com<javascript:>

wrote:

hey, this seems very much like a DataCache loading problem. The script
you are using requires all values for "created_at" to be loaded into
memory. This happens by un-inverting the field once the first query hits an
index segment. Usually this is fast but if you occasionally hit a big
segment merge in the background large segments need to be loaded and that
can easily take that long. We refer to this as segment warming, in 0.20
(likely the next release coming soon) we have a warmer API that allows to
pre-warm new segments so that you won't suffer from loading
FieldDataCaches. If you can wait until 0.20 this should get resolved!

simon

On Friday, October 19, 2012 10:12:23 PM UTC+2, Felipe Hummel wrote:

Hi, we have a ~57GB index (with index compression) with ~36 million documents. 4 EC2 machines (7.4GB RAM 2-cores) with 2 primary shards each (no replicas).

We are not in production yet, so the load is basically developers and other employees playing around with the application.

Yet, we are seeing some very slow queries from time to time. Normally a query takes from 100ms up to 1100ms max! But sometimes one innocent query takes up 20 seconds, 30 seconds. Today we saw one taking 55 seconds. But right after, when we do the query again it takes the usual 100-1100ms.

The query general format is embedded below, note that we're not retrieving _source (fields: [])

{
"size" : 250,
"query" : {
"custom_score" : {
"query" : {
"query_string" : {
"query" : "...",
"fields" : [ "myField^2.0", "myField2" ],

      "default_operator" : "and"
    }
  },
  "script" : "a = doc['created_at'].value) * param;  _score / ( a*log(a) )",
  "params" : {
    "param1" : 123123123
  }
}

},
"filter" : {
"bool" : {
"must" : {
"range" : {
"created_at" : {
"from" : null,
"to" : "2012-10-17T20:25:10.228Z",
"include_lower" : true,
"include_upper" : true
}
}
}
}
},
"fields" : [ ]
}

The machines have around 4GB being used (ES memory is set: -Xms4G -Xmx4G) and the rest in OS cache (swapping is disabled).

Almost all ES config is default (merge policies for example). We're on ES version 0.19.8.

We know it may be some sporadic thing like merging big segments or GC'ing. But 50 seconds is way to strange.

Any ideas?

Thanks

Felipe Hummel

--

--


(Onilton Maciel) #5

I should add that it seems that most of the query time comes from fetch
(not from query.)

The query takes like 2s and the fetch takes like 56s.

Through slowquery log we noticed a node in cluster that seems to always
have the slowest time, so we will probably be replacing it.

But anyway since we're not retrieving _source (fields:[]), do you have any
idea of what ES could still be "fetching" from the index?

PS: I work in the same place with Felipe Hummel

On Sat, Oct 20, 2012 at 6:24 AM, simonw
simon.willnauer@elasticsearch.comwrote:

hey,

On Friday, October 19, 2012 10:59:03 PM UTC+2, Felipe Hummel wrote:

We actually started implementing the calls to Index Warmer API earlier
today, but realized it was only for 0.20.
Coming soon like next weeks or next months? :slight_smile:

its rather like weeks I'd say.

Is it reasonable to use the elasticsearch master? Would be upgrading from
0.19.8.

I mean it would be good to verify that warmer api helps you! so yeah go
and try!

simon

Thanks!

Felipe Hummel

On Fri, Oct 19, 2012 at 4:49 PM, simonw <simon.w...@**elasticsearch.com>wrote:

hey, this seems very much like a DataCache loading problem. The script
you are using requires all values for "created_at" to be loaded into
memory. This happens by un-inverting the field once the first query hits an
index segment. Usually this is fast but if you occasionally hit a big
segment merge in the background large segments need to be loaded and that
can easily take that long. We refer to this as segment warming, in 0.20
(likely the next release coming soon) we have a warmer API that allows to
pre-warm new segments so that you won't suffer from loading
FieldDataCaches. If you can wait until 0.20 this should get resolved!

simon

On Friday, October 19, 2012 10:12:23 PM UTC+2, Felipe Hummel wrote:

Hi, we have a ~57GB index (with index compression) with ~36 million documents. 4 EC2 machines (7.4GB RAM 2-cores) with 2 primary shards each (no replicas).

We are not in production yet, so the load is basically developers and other employees playing around with the application.

Yet, we are seeing some very slow queries from time to time. Normally a query takes from 100ms up to 1100ms max! But sometimes one innocent query takes up 20 seconds, 30 seconds. Today we saw one taking 55 seconds. But right after, when we do the query again it takes the usual 100-1100ms.

The query general format is embedded below, note that we're not retrieving _source (fields: [])

{
"size" : 250,
"query" : {
"custom_score" : {
"query" : {
"query_string" : {
"query" : "...",
"fields" : [ "myField^2.0", "myField2" ],

      "default_operator" : "and"
    }
  },
  "script" : "a = doc['created_at'].value) * param;  _score / ( a*log(a) )",
  "params" : {
    "param1" : 123123123
  }
}

},
"filter" : {
"bool" : {
"must" : {
"range" : {
"created_at" : {
"from" : null,
"to" : "2012-10-17T20:25:10.228Z",
"include_lower" : true,
"include_upper" : true
}
}
}
}
},
"fields" : [ ]
}

The machines have around 4GB being used (ES memory is set: -Xms4G -Xmx4G) and the rest in OS cache (swapping is disabled).

Almost all ES config is default (merge policies for example). We're on ES version 0.19.8.

We know it may be some sporadic thing like merging big segments or GC'ing. But 50 seconds is way to strange.

Any ideas?

Thanks

Felipe Hummel

--

--

--


(Shay Banon) #6

The fetch part returns additional information for the hit (even if you don't ask for _source), for example, loading the actual _id of the document (the "query" phase streams internal ids of the doc, the "Lucene ids"). Its strange that this is what actually takes most of the time to be honest, can you gist the slow log that shows it?

Also, Simon comments on the warmer API (which will affect the query phase, as you identified correctly) are still valid, and 0.20 will come out early next week.

On Oct 20, 2012, at 3:51 PM, Onilton Maciel oniltonmaciel@gmail.com wrote:

I should add that it seems that most of the query time comes from fetch (not from query.)

The query takes like 2s and the fetch takes like 56s.

Through slowquery log we noticed a node in cluster that seems to always have the slowest time, so we will probably be replacing it.

But anyway since we're not retrieving _source (fields:[]), do you have any idea of what ES could still be "fetching" from the index?

PS: I work in the same place with Felipe Hummel

On Sat, Oct 20, 2012 at 6:24 AM, simonw simon.willnauer@elasticsearch.com wrote:
hey,

On Friday, October 19, 2012 10:59:03 PM UTC+2, Felipe Hummel wrote:
We actually started implementing the calls to Index Warmer API earlier today, but realized it was only for 0.20.
Coming soon like next weeks or next months? :slight_smile:
its rather like weeks I'd say.

Is it reasonable to use the elasticsearch master? Would be upgrading from 0.19.8.

I mean it would be good to verify that warmer api helps you! so yeah go and try!

simon

Thanks!

Felipe Hummel

On Fri, Oct 19, 2012 at 4:49 PM, simonw simon.w...@elasticsearch.com wrote:
hey, this seems very much like a DataCache loading problem. The script you are using requires all values for "created_at" to be loaded into memory. This happens by un-inverting the field once the first query hits an index segment. Usually this is fast but if you occasionally hit a big segment merge in the background large segments need to be loaded and that can easily take that long. We refer to this as segment warming, in 0.20 (likely the next release coming soon) we have a warmer API that allows to pre-warm new segments so that you won't suffer from loading FieldDataCaches. If you can wait until 0.20 this should get resolved!

simon

On Friday, October 19, 2012 10:12:23 PM UTC+2, Felipe Hummel wrote:
Hi, we have a ~57GB index (with index compression) with ~36 million documents. 4 EC2 machines (7.4GB RAM 2-cores) with 2 primary shards each (no replicas).

We are not in production yet, so the load is basically developers and other employees playing around with the application.

Yet, we are seeing some very slow queries from time to time. Normally a query takes from 100ms up to 1100ms max! But sometimes one innocent query takes up 20 seconds, 30 seconds. Today we saw one taking 55 seconds. But right after, when we do the query again it takes the usual 100-1100ms.

The query general format is embedded below, note that we're not retrieving _source (fields: [])

{
"size" : 250,
"query" : {
"custom_score" : {
"query" : {
"query_string" : {
"query" : "...",
"fields" : [ "myField^2.0", "myField2" ],

      "default_operator" : "and"
    }
  },
  "script" : "a = doc['created_at'].value) * param;  _score / ( a*log(a) )",
  "params" : {
    "param1" : 123123123
  }
}

},
"filter" : {
"bool" : {
"must" : {
"range" : {
"created_at" : {
"from" : null,
"to" : "2012-10-17T20:25:10.228Z",
"include_lower" : true,
"include_upper" : true
}
}
}
}
},
"fields" : [ ]
}

The machines have around 4GB being used (ES memory is set: -Xms4G -Xmx4G) and the rest in OS cache (swapping is disabled).
Almost all ES config is default (merge policies for example). We're on ES version 0.19.8.

We know it may be some sporadic thing like merging big segments or GC'ing. But 50 seconds is way to strange.

Any ideas?

Thanks

Felipe Hummel

--

--

--

--


(Felipe Hummel) #7

The relevant entries:
[2012-10-19 20:08:27,309][WARN ][index.search.slowlog.fetch] [Elastic
Search 4] [busk_prod_0310][5] took[54.3s], took_millis[54399],
search_type[QUERY_THEN_FETCH], total_shards[8], source[...], extra_source[],
[2012-10-19 20:08:30,365][WARN ][index.search.slowlog.fetch] [Elastic
Search 4] [busk_prod_0310][0] took[57.4s], took_millis[57454],
search_type[QUERY_THEN_FETCH], total_shards[8], source[...], extra_source[],

The sources were omitted for clarity, but the queries are pretty much the
same one in my first post. The difference is that we actually have two
filters on date fields (created_at and published) not only one.

For these two entries, the actual query_string is just a regular query,
nothing fancy. I actually just submitted again the same query and it took
ES 300 ms total.

We only turned on the slow log again yesterday. So in the coming days we
hope to have more information.

One last thing to clarify, the fetch time accounts for only local (same
node) operations or does it account for remote operations (retrieving
document information from other nodes)?

Thanks

Felipe Hummel

On Sat, Oct 20, 2012 at 6:29 PM, kimchy@gmail.com wrote:

The fetch part returns additional information for the hit (even if you
don't ask for _source), for example, loading the actual _id of the document
(the "query" phase streams internal ids of the doc, the "Lucene ids"). Its
strange that this is what actually takes most of the time to be honest, can
you gist the slow log that shows it?

Also, Simon comments on the warmer API (which will affect the query phase,
as you identified correctly) are still valid, and 0.20 will come out early
next week.

On Oct 20, 2012, at 3:51 PM, Onilton Maciel oniltonmaciel@gmail.com
wrote:

I should add that it seems that most of the query time comes from fetch
(not from query.)

The query takes like 2s and the fetch takes like 56s.

Through slowquery log we noticed a node in cluster that seems to always
have the slowest time, so we will probably be replacing it.

But anyway since we're not retrieving _source (fields:[]), do you have any
idea of what ES could still be "fetching" from the index?

PS: I work in the same place with Felipe Hummel

On Sat, Oct 20, 2012 at 6:24 AM, simonw <simon.willnauer@elasticsearch.com

wrote:

hey,

On Friday, October 19, 2012 10:59:03 PM UTC+2, Felipe Hummel wrote:

We actually started implementing the calls to Index Warmer API earlier
today, but realized it was only for 0.20.
Coming soon like next weeks or next months? :slight_smile:

its rather like weeks I'd say.

Is it reasonable to use the elasticsearch master? Would be upgrading
from 0.19.8.

I mean it would be good to verify that warmer api helps you! so yeah go
and try!

simon

Thanks!

Felipe Hummel

On Fri, Oct 19, 2012 at 4:49 PM, simonw <simon.w...@**elasticsearch.com>wrote:

hey, this seems very much like a DataCache loading problem. The script
you are using requires all values for "created_at" to be loaded into
memory. This happens by un-inverting the field once the first query hits an
index segment. Usually this is fast but if you occasionally hit a big
segment merge in the background large segments need to be loaded and that
can easily take that long. We refer to this as segment warming, in 0.20
(likely the next release coming soon) we have a warmer API that allows to
pre-warm new segments so that you won't suffer from loading
FieldDataCaches. If you can wait until 0.20 this should get resolved!

simon

On Friday, October 19, 2012 10:12:23 PM UTC+2, Felipe Hummel wrote:

Hi, we have a ~57GB index (with index compression) with ~36 million documents. 4 EC2 machines (7.4GB RAM 2-cores) with 2 primary shards each (no replicas).

We are not in production yet, so the load is basically developers and other employees playing around with the application.

Yet, we are seeing some very slow queries from time to time. Normally a query takes from 100ms up to 1100ms max! But sometimes one innocent query takes up 20 seconds, 30 seconds. Today we saw one taking 55 seconds. But right after, when we do the query again it takes the usual 100-1100ms.

The query general format is embedded below, note that we're not retrieving _source (fields: [])

{
"size" : 250,
"query" : {
"custom_score" : {
"query" : {
"query_string" : {
"query" : "...",
"fields" : [ "myField^2.0", "myField2" ],

      "default_operator" : "and"
    }
  },
  "script" : "a = doc['created_at'].value) * param;  _score / ( a*log(a) )",
  "params" : {
    "param1" : 123123123
  }
}

},
"filter" : {
"bool" : {
"must" : {
"range" : {
"created_at" : {
"from" : null,
"to" : "2012-10-17T20:25:10.228Z",
"include_lower" : true,
"include_upper" : true
}
}
}
}
},
"fields" : [ ]
}

The machines have around 4GB being used (ES memory is set: -Xms4G -Xmx4G) and the rest in OS cache (swapping is disabled).

Almost all ES config is default (merge policies for example). We're on ES version 0.19.8.

We know it may be some sporadic thing like merging big segments or GC'ing. But 50 seconds is way to strange.

Any ideas?

Thanks

Felipe Hummel

--

--

--

--

--


(Ibrahim Nelman Lubis) #8

Hi,

We also see the same problem when some of the queries get very slow to
finish once a while. We are using version 0.19.6.
We currently use it for production purpose already. Hence any work around
for older version of elasticsearch to overcome this problem?

Thank you.
Nelman

On Sunday, October 21, 2012 10:00:26 AM UTC+8, Felipe Hummel wrote:

https://gist.github.com/3925323

The relevant entries:
[2012-10-19 20:08:27,309][WARN ][index.search.slowlog.fetch] [Elastic
Search 4] [busk_prod_0310][5] took[54.3s], took_millis[54399],
search_type[QUERY_THEN_FETCH], total_shards[8], source[...], extra_source[],
[2012-10-19 20:08:30,365][WARN ][index.search.slowlog.fetch] [Elastic
Search 4] [busk_prod_0310][0] took[57.4s], took_millis[57454],
search_type[QUERY_THEN_FETCH], total_shards[8], source[...], extra_source[],

The sources were omitted for clarity, but the queries are pretty much the
same one in my first post. The difference is that we actually have two
filters on date fields (created_at and published) not only one.

For these two entries, the actual query_string is just a regular query,
nothing fancy. I actually just submitted again the same query and it took
ES 300 ms total.

We only turned on the slow log again yesterday. So in the coming days we
hope to have more information.

One last thing to clarify, the fetch time accounts for only local (same
node) operations or does it account for remote operations (retrieving
document information from other nodes)?

Thanks

Felipe Hummel

On Sat, Oct 20, 2012 at 6:29 PM, <kim...@gmail.com <javascript:>> wrote:

The fetch part returns additional information for the hit (even if you
don't ask for _source), for example, loading the actual _id of the document
(the "query" phase streams internal ids of the doc, the "Lucene ids"). Its
strange that this is what actually takes most of the time to be honest, can
you gist the slow log that shows it?

Also, Simon comments on the warmer API (which will affect the query
phase, as you identified correctly) are still valid, and 0.20 will come out
early next week.

On Oct 20, 2012, at 3:51 PM, Onilton Maciel <onilto...@gmail.com<javascript:>>
wrote:

I should add that it seems that most of the query time comes from fetch
(not from query.)

The query takes like 2s and the fetch takes like 56s.

Through slowquery log we noticed a node in cluster that seems to always
have the slowest time, so we will probably be replacing it.

But anyway since we're not retrieving _source (fields:[]), do you have
any idea of what ES could still be "fetching" from the index?

PS: I work in the same place with Felipe Hummel

On Sat, Oct 20, 2012 at 6:24 AM, simonw <simon.w...@elasticsearch.com<javascript:>

wrote:

hey,

On Friday, October 19, 2012 10:59:03 PM UTC+2, Felipe Hummel wrote:

We actually started implementing the calls to Index Warmer API earlier
today, but realized it was only for 0.20.
Coming soon like next weeks or next months? :slight_smile:

its rather like weeks I'd say.

Is it reasonable to use the elasticsearch master? Would be upgrading
from 0.19.8.

I mean it would be good to verify that warmer api helps you! so yeah go
and try!

simon

Thanks!

Felipe Hummel

On Fri, Oct 19, 2012 at 4:49 PM, simonw <simon.w...@**elasticsearch.com

wrote:

hey, this seems very much like a DataCache loading problem. The script
you are using requires all values for "created_at" to be loaded into
memory. This happens by un-inverting the field once the first query hits an
index segment. Usually this is fast but if you occasionally hit a big
segment merge in the background large segments need to be loaded and that
can easily take that long. We refer to this as segment warming, in 0.20
(likely the next release coming soon) we have a warmer API that allows to
pre-warm new segments so that you won't suffer from loading
FieldDataCaches. If you can wait until 0.20 this should get resolved!

simon

On Friday, October 19, 2012 10:12:23 PM UTC+2, Felipe Hummel wrote:

Hi, we have a ~57GB index (with index compression) with ~36 million documents. 4 EC2 machines (7.4GB RAM 2-cores) with 2 primary shards each (no replicas).

We are not in production yet, so the load is basically developers and other employees playing around with the application.

Yet, we are seeing some very slow queries from time to time. Normally a query takes from 100ms up to 1100ms max! But sometimes one innocent query takes up 20 seconds, 30 seconds. Today we saw one taking 55 seconds. But right after, when we do the query again it takes the usual 100-1100ms.

The query general format is embedded below, note that we're not retrieving _source (fields: [])

{
"size" : 250,
"query" : {
"custom_score" : {
"query" : {
"query_string" : {
"query" : "...",
"fields" : [ "myField^2.0", "myField2" ],

      "default_operator" : "and"
    }
  },
  "script" : "a = doc['created_at'].value) * param;  _score / ( a*log(a) )",
  "params" : {
    "param1" : 123123123
  }
}

},
"filter" : {
"bool" : {
"must" : {
"range" : {
"created_at" : {
"from" : null,
"to" : "2012-10-17T20:25:10.228Z",
"include_lower" : true,
"include_upper" : true
}
}
}
}
},
"fields" : [ ]
}

The machines have around 4GB being used (ES memory is set: -Xms4G -Xmx4G) and the rest in OS cache (swapping is disabled).

Almost all ES config is default (merge policies for example). We're on ES version 0.19.8.

We know it may be some sporadic thing like merging big segments or GC'ing. But 50 seconds is way to strange.

Any ideas?

Thanks

Felipe Hummel

--

--

--

--

--


(system) #9