ES slow query problem


(T Vinod Gupta) #1

Hi,
I'm trying to figure out how to speed up my queries, but nothing is helping. I
have a cluster of 3 nodes with 1 replica (2 data nodes and 1 non-data LB
node). Each of the 2 data nodes is an m1.xlarge on EC2 with 8GB of RAM
mlocked. I am logging slow queries. There are mainly 2
indices with 5 shards each. What are my options to resolve this?

Here is an example from the slow query log:
[2012-10-19 19:30:30,570][WARN ][index.search.slowlog.fetch] [Phimster, Ellie] [twitter][2] took[3s], took_millis[3010], search_type[QUERY_THEN_FETCH], total_shards[5], source[{"size":100,"sort":{"score":{"order":"desc"},"_score":{}},"query":{"query_string":{"fields":["text","product_categories"],"query":"SWAROVSKI ELEMENTS OR Swarovski Rhinestones OR Rhinestone Shapes OR 2300 Teardrop"}},"filter":{"terms":{"brand_id":["14956545"],"minimum_match":1}}}extra_source[],
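
For anyone collecting entries like the one above, here is a minimal sketch of pulling the timing fields out of a slow log line so they can be aggregated per index and shard. The regular expression is an assumption based only on the layout of this one sample entry; `parse_slowlog` and `SLOWLOG_RE` are hypothetical names, not part of Elasticsearch.

```python
import re

# Pattern derived from the single sample entry in this thread (an assumption,
# not an official grammar for the slow log format).
SLOWLOG_RE = re.compile(
    r"\[(?P<timestamp>[^\]]+)\]"
    r"\[(?P<level>\w+)\s*\]"
    r"\[index\.search\.slowlog\.(?P<phase>\w+)\]\s*"
    r"\[(?P<node>[^\]]+)\]\s*"
    r"\[(?P<index>[^\]]+)\]\[(?P<shard>\d+)\]\s*"
    r"took\[[^\]]+\],\s*took_millis\[(?P<took_millis>\d+)\],\s*"
    r"search_type\[(?P<search_type>\w+)\],\s*"
    r"total_shards\[(?P<total_shards>\d+)\]"
)


def parse_slowlog(line):
    """Return the header fields of a slow log line as a dict, or None if no match."""
    match = SLOWLOG_RE.search(line)
    if match is None:
        return None
    entry = match.groupdict()
    # Convert the numeric fields so they can be summed or averaged.
    for key in ("shard", "took_millis", "total_shards"):
        entry[key] = int(entry[key])
    return entry


# Header of the entry quoted in the post, rejoined onto one line.
SAMPLE = (
    "[2012-10-19 19:30:30,570][WARN ][index.search.slowlog.fetch] "
    "[Phimster, Ellie] [twitter][2] took[3s], took_millis[3010], "
    "search_type[QUERY_THEN_FETCH], total_shards[5],"
)

if __name__ == "__main__":
    print(parse_slowlog(SAMPLE))
```

Note that the sample comes from the `fetch` phase logger, so the 3 seconds were spent fetching the 100 requested hits on shard 2, not scoring the query.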

Most of our queries are filtered on a field in the document, but we are not
using the routing feature yet. The problem with starting to use routing now is:
what happens to the data that is already indexed on other shards?
Migrating/moving it will be a huge deal, I think.
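
To illustrate the concern about already-indexed data, here is a rough sketch of how routing picks a shard: the routing value is hashed and taken modulo the shard count, and a document indexed without an explicit routing value is placed by its own ID. The CRC32 hash below is only a stand-in (not Elasticsearch's real hash function), and `shard_for` and `misplaced_docs` are hypothetical helper names.

```python
import zlib

NUM_SHARDS = 5  # each index in the post has 5 shards


def shard_for(routing_value, num_shards=NUM_SHARDS):
    """Stand-in shard selection: hash(routing value) modulo the shard count.

    CRC32 is used purely for illustration; Elasticsearch uses its own hash.
    """
    return zlib.crc32(routing_value.encode("utf-8")) % num_shards


def misplaced_docs(doc_ids, brand_id, num_shards=NUM_SHARDS):
    """Return IDs of documents that a brand_id-routed search would not see.

    Documents indexed without routing were placed by their own ID, so only
    some of them happen to sit on the shard that the brand_id maps to.
    """
    target = shard_for(brand_id, num_shards)
    return [d for d in doc_ids if shard_for(d, num_shards) != target]


if __name__ == "__main__":
    ids = ["doc-%d" % i for i in range(20)]  # made-up document IDs
    missing = misplaced_docs(ids, "14956545")
    print("%d of %d documents would need reindexing" % (len(missing), len(ids)))
```

Under these assumptions, a search routed by `brand_id` would only visit one shard, so documents placed by ID on other shards would be missed until they are reindexed with the routing value.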

any advice would be helpful.

thanks

--


(Otis Gospodnetić) #2

Hi,

Questions:

  • how big are your indices?
  • what Xmx are you using?
  • is there disk IO?
  • is this right after start or after caches have been warmed?
  • if the index is constantly being updated, try increasing the refresh interval
  • what does your system/ES monitoring tool show you?

Do you really want .... OR Swarovski Rhinestones OR Rhinestone Shapes ...

or do you actually want ... OR "Swarovski Rhinestones" OR "Rhinestone
Shapes"...

?
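
The difference matters because, with the default OR operator, an unquoted `Swarovski Rhinestones` is parsed as two independent terms, while the quoted form is a phrase. A minimal sketch of building the quoted variant from a list of phrases follows; `build_or_query` is a hypothetical helper, not an Elasticsearch API.

```python
def build_or_query(phrases):
    """Join phrases with OR, wrapping multi-word phrases in double quotes."""
    parts = []
    for phrase in phrases:
        phrase = phrase.strip()
        if " " in phrase:
            # Escape embedded quotes so the phrase stays one quoted unit.
            parts.append('"' + phrase.replace('"', '\\"') + '"')
        else:
            parts.append(phrase)
    return " OR ".join(parts)


if __name__ == "__main__":
    print(build_or_query([
        "SWAROVSKI ELEMENTS",
        "Swarovski Rhinestones",
        "Rhinestone Shapes",
        "2300 Teardrop",
    ]))
```

The resulting string can be used as the `query` value of the `query_string` query shown in the slow log entry.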

Otis

Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html

--


(T Vinod Gupta) #3

I was able to make some headway by using filtered queries (instead of a query
with a filter). There is a subtle difference between the two (we don't use
facets), but I believe a big difference in performance.
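
For readers unfamiliar with the two forms, here is a sketch of the request bodies being compared, assuming the query DSL of Elasticsearch releases from around the time of this thread. The top-level `filter` was later renamed `post_filter`, and the `filtered` query was later replaced by a `bool` query with a `filter` clause, so treat this as illustrative for old versions only. The function names are hypothetical.

```python
import json


def search_with_top_level_filter(query, filt, size=100):
    """Query plus a top-level filter: the filter is applied to the hits
    after the query has run (and is not applied to facets)."""
    return {"size": size, "query": query, "filter": filt}


def search_with_filtered_query(query, filt, size=100):
    """Filtered query: the filter is part of the query itself, so the
    query only has to consider documents that pass the filter."""
    return {"size": size, "query": {"filtered": {"query": query, "filter": filt}}}


if __name__ == "__main__":
    # Query and filter taken from the slow log entry in the first post.
    query = {"query_string": {
        "fields": ["text", "product_categories"],
        "query": "SWAROVSKI ELEMENTS OR Swarovski Rhinestones",
    }}
    filt = {"terms": {"brand_id": ["14956545"]}}
    print(json.dumps(search_with_top_level_filter(query, filt), indent=2))
    print(json.dumps(search_with_filtered_query(query, filt), indent=2))
```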

To answer your questions: the indices are about 50GB in total, and Xmx/Xms is 8GB.
This is in the steady state. The refresh interval is set to 60 sec. BigDesk
shows about 50% of the allocated heap in use; the rest is free. Threads are around
125, with a peak at 160.
Regarding the text query, we really want the best phrase match, but if that's
not possible, a match on the words inside the phrases. That part is probably
not fully correct.
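
One possible way to express "phrase first, words as a fallback" in query string syntax is to include each phrase twice: once quoted with a boost, and once as a group of plain terms. The sketch below assumes this approach; the boost of 3 is an arbitrary illustrative value and `phrase_first_query` is a hypothetical helper.

```python
def phrase_first_query(phrases, phrase_boost=3):
    """Build a query string that prefers exact phrases but still matches
    documents containing only the individual words."""
    clauses = []
    for phrase in phrases:
        words = phrase.split()
        if len(words) > 1:
            joined = " ".join(words)
            # Boosted exact phrase, e.g. "Rhinestone Shapes"^3
            clauses.append('"%s"^%d' % (joined, phrase_boost))
            # Fallback: any of the individual words, e.g. (Rhinestone Shapes)
            clauses.append("(%s)" % joined)
        elif words:
            clauses.append(words[0])
    return " OR ".join(clauses)


if __name__ == "__main__":
    print(phrase_first_query(["Swarovski Rhinestones", "Rhinestone Shapes"]))
```

Documents containing the full phrase match both clauses and should score higher, while documents containing only some of the words still match the fallback group.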

thanks

--

