Hi,
I'm trying to figure out how to speed up my queries, and nothing I've tried
so far has helped. I have a cluster of 3 nodes with 1 replica (2 data nodes
and 1 non-data load-balancer node). Each of the 2 data nodes is an m1.xlarge
on EC2 with 8GB of RAM mlocked. I am logging slow queries. There are mainly
2 indices with 5 shards each. What are my options to resolve this?
Here is an example from the slow query log:
[2012-10-19 19:30:30,570][WARN ][index.search.slowlog.fetch] [Phimster, Ellie] [twitter][2] took[3s], took_millis[3010], search_type[QUERY_THEN_FETCH], total_shards[5],
source[{"size":100,"sort":{"score":{"order":"desc"},"_score":{}},"query":{"query_string":{"fields":["text","product_categories"],"query":"SWAROVSKI ELEMENTS OR Swarovski Rhinestones OR Rhinestone Shapes OR 2300 Teardrop"}},"filter":{"terms":{"brand_id":["14956545"],"minimum_match":1}}}extra_source[],
Most of our queries are filtered on a field in the document, but we are not
using the routing feature yet. The problem with adopting routing now is what
happens to the data that is already indexed on other shards. Migrating or
moving it would be a huge deal, I think.
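
To make the routing concern concrete, here is a rough sketch of why already-indexed data matters. It assumes the usual model where a shard is picked as hash(routing value) modulo the number of shards, and where a document indexed without routing is placed by its _id. CRC32 is only a stand-in hash, and the document id "tweet-1" is made up for illustration.

```python
import zlib

NUM_SHARDS = 5  # each index in this thread has 5 shards


def shard_for(routing_value, num_shards=NUM_SHARDS):
    """Pick a shard from a routing value.

    CRC32 is only a stand-in here; Elasticsearch uses its own hash function.
    """
    return zlib.crc32(routing_value.encode("utf-8")) % num_shards


def shards_to_search(routing_value=None, num_shards=NUM_SHARDS):
    """Without routing every shard is queried; with routing only one is."""
    if routing_value is None:
        return list(range(num_shards))
    return [shard_for(routing_value, num_shards)]


# Hypothetical document that was indexed earlier WITHOUT routing,
# so its _id (not its brand_id) decided where it lives.
doc_id, brand_id = "tweet-1", "14956545"
stored_on = shard_for(doc_id)
searched = shards_to_search(brand_id)

print("stored on shard", stored_on, "and routed search covers", searched)
print("found by routed search:", stored_on in searched)
```

A search routed by brand_id only visits one shard, so any older document whose _id hashed to a different shard would be missed until it is reindexed with the same routing value. New data could be routed right away, but searches would have to stay unrouted until the old data has been reindexed.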
On Friday, October 19, 2012 3:43:51 PM UTC-4, T Vinod Gupta wrote:
I was able to make some headway by using filtered queries (instead of a
query with a separate top-level filter). There is a subtle difference
between the two in how facets are computed (we don't use facets), but I
believe a big difference in performance.
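
For anyone wanting to try the same change, here is a minimal sketch of the rewrite, assuming the request body is built as a plain dict before it is sent. The helper name to_filtered_query is made up, and the body is a simplified version of the one in the slow log (the sort and the minimum_match setting are left out).

```python
import json


def to_filtered_query(body):
    """Move a top-level "filter" into a "filtered" query wrapper.

    Returns a new dict; the input body is left untouched.
    """
    if "filter" not in body:
        return dict(body)
    rewritten = {k: v for k, v in body.items() if k not in ("query", "filter")}
    rewritten["query"] = {
        "filtered": {
            # Fall back to match_all if the request had no query at all.
            "query": body.get("query", {"match_all": {}}),
            "filter": body["filter"],
        }
    }
    return rewritten


# Simplified from the slow log entry in this thread.
original = {
    "size": 100,
    "query": {
        "query_string": {
            "fields": ["text", "product_categories"],
            "query": "SWAROVSKI ELEMENTS OR Swarovski Rhinestones",
        }
    },
    "filter": {"terms": {"brand_id": ["14956545"]}},
}

rewritten = to_filtered_query(original)
print(json.dumps(rewritten, indent=2))
```

With the filter inside the filtered query, the text query only has to score documents that pass the brand_id filter, which is presumably where the speedup comes from.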
To answer your questions: the indices are about 50GB in total, and Xmx/Xms
is 8GB. This is in the steady state. The refresh interval is set to 60
seconds. BigDesk says about 50% of the allocated heap is used and the rest
is free. Threads are around 125, with a peak at 160.
Regarding the text query, we really want the best phrase match, but if
that's not possible, a match on the words inside the phrases. That part of
the query is probably not fully correct.
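
One possible way to express "phrase first, words as a fallback" in query_string syntax is to emit each phrase twice: once quoted with a boost, once as an OR of its words. The sketch below is only that, a sketch. The helper build_query_text and the boost of 2.0 are assumptions, and it does not escape reserved characters or operator words (AND, OR, NOT) inside the phrases.

```python
def build_query_text(phrases, phrase_boost=2.0):
    """Build a query_string expression from a list of phrases.

    Each multi-word phrase becomes a boosted quoted phrase plus an OR of its
    individual words, so exact phrase hits should rank above word-only hits.
    """
    clauses = []
    for phrase in phrases:
        # Drop embedded double quotes so they cannot break the quoting.
        words = phrase.replace('"', " ").split()
        if not words:
            continue
        if len(words) > 1:
            clauses.append('"{}"^{}'.format(" ".join(words), phrase_boost))
            clauses.append("(" + " OR ".join(words) + ")")
        else:
            clauses.append(words[0])
    return " OR ".join(clauses)


phrases = ["SWAROVSKI ELEMENTS", "2300 Teardrop"]
query_text = build_query_text(phrases)
print(query_text)

# The text would then go into the same query_string block as before.
query = {
    "query_string": {
        "fields": ["text", "product_categories"],
        "query": query_text,
    }
}
```

Unquoted, a string like SWAROVSKI ELEMENTS OR Swarovski Rhinestones is parsed word by word rather than as phrases, which may be why the current query does not behave as a phrase match.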