Hi All,
I am trying to improve my ES query performance. The goal is to get the
combined response time for 3 related queries under a second. In my tests I
have seen a 90th percentile response time ('took' time) of ~1.8 seconds for
the 3 queries combined. Here are the details:
Cluster:
- 5 machines, 5 shards, currently on m3.2xlarge. (Started from m3.large
and moved up to more powerful boxes one step at a time.)
- 4 indexes.
- one index with ~90 million records (total 19.3 GB on all shards.)
- one with ~24 million records (total 6 GB on all shards.)
- the other two have ~780K and ~340K records (total 160 MB and 190 MB.)
- All fields in the larger indexes are integers.
- Record size is small-ish.
- indexes are compressed.
- I have given 15 GB to ES instances.
- Indexes are stored on EBS volumes. Each instance has a 250 GB volume
attached. (Keeping SSDs as a last resort.)
The indexes are not changing for now (in the future they will change once a
day), so no indexing takes place while we query. Therefore, I have tried
things like reducing the number of segments in the two larger indexes.
That helped up to a point.
Querying Technique:
- I use the Python ES client.
- 3 small instances, each forking 10 threads at the same time.
- Each thread fires 3 queries before reporting a time.
- At times there are ~100 concurrent queries on the machines; this settles
at around 50-60.
- I take the 'took' time from the ES response to measure times.
- I discard the first 100 users' results before measuring times.
- A total of 5000 unique users are used, and 3 ES queries are fired for
each. The times of 4900 users are measured.
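The measurement described above can be sketched roughly as follows. This is
a minimal sketch, not the actual test script: the function names and the
nearest-rank percentile method are assumptions made for illustration. The
only thing taken from ES is the 'took' value (in milliseconds) present in
each search response.

```python
import math

WARMUP = 100  # number of initial per-user measurements to discard


def combined_took(responses):
    """Sum the 'took' values (milliseconds) of one user's three responses."""
    return sum(r["took"] for r in responses)


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list; pct is in (0, 100]."""
    if not values:
        raise ValueError("no measurements")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def summarize(per_user_took_ms, warmup=WARMUP):
    """Drop the warm-up measurements, then return the 90th percentile (ms)."""
    return percentile(per_user_took_ms[warmup:], 90)
```

With 5000 users and a warm-up of 100, summarize() would report the 90th
percentile over the remaining 4900 combined times.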
Observations:
- RAM is never under stress. Well below 15 GB allotted.
- CPU comes under strain: it goes up to the 85-95% range on all instances
during the tests.
Queries:
1. On an index with ~24 Million records:
    res = es.search(index="index1",
        body={"query": {"bool": {"must": [{"term": {"cid": value}}]}}},
        sort=["source:desc", "cdate:desc"], size=100,
        fields=["wiid"], _source="true")
I parse the results of this query to extract certain fields and pass them on
to the 2nd query. Let's call those fields q1.field1 and q1.field2.
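For illustration, pulling two index-aligned lists out of the query 1
response could look something like the sketch below. The field name "wiid"
comes from the query above, but "sval" is a hypothetical name: I have not
said here which field feeds the "svalues" parameter of the 2nd query. The
helper assumes the fields are read from each hit's _source.

```python
def extract_parallel_lists(response, id_field="wiid", value_field="sval"):
    """Return two index-aligned lists taken from each hit's _source.

    value_field="sval" is a hypothetical name used only for this sketch.
    """
    ids, values = [], []
    for hit in response["hits"]["hits"]:
        source = hit.get("_source", {})
        # Skip hits missing either field so the two lists stay aligned.
        if id_field in source and value_field in source:
            ids.append(source[id_field])
            values.append(source[value_field])
    return ids, values
```

Keeping the two lists aligned matters because the script in the 2nd query
looks up a position in one list and reads the same position in the other.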
2. On an index with ~90 million records:
    res1 = es.search(index="index2", body={
        "query": {"filtered": {"filter": {"bool": {
            "must": {"terms": {"col_a": q1.field1}},
            "must_not": {"terms": {"col_b": q1.field1}}}}}},
        "aggs": {"i2B": {
            "terms": {"field": "col_b", "size": 1000, "shard_size": 10000,
                      "order": {"mss.sum": "desc"}},
            "aggs": {
                "mss": {"stats": {
                    "script": "ca = _source.col_a; index=wiids.indexOf(ca); "
                              "sval=0; if(index!=-1) sval=svalues.get(index); "
                              "else sval=-1; return _source.col_x*sval; ",
                    "params": {"wiids": q1.field1, "svalues": q1.field2}}},
                "simSum": {"stats": {"script": "return _source.col_x "}}}}}},
        size=1)
- it uses a filtered query.
- it uses 2 aggregations.
- it uses a script in the aggregation.
- it uses shard_size.
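To make clear what the "mss" script and the ordering on "mss.sum" are meant
to compute, here is the same logic restated in plain Python. It is only a
sketch of the intent, run over a list of dicts standing in for the matching
documents; it is not how ES executes the aggregation, and the function
names are made up for this example.

```python
def script_value(doc, wiids, svalues):
    """Mirror of the 'mss' script: col_x weighted by the matching svalue."""
    try:
        index = wiids.index(doc["col_a"])  # linear scan, like indexOf
        sval = svalues[index]
    except ValueError:
        sval = -1  # the script's fallback when col_a is not in wiids
    return doc["col_x"] * sval


def top_buckets(docs, wiids, svalues, size=1000):
    """Group by col_b, sum script_value per bucket, sort by sum descending."""
    sums = {}
    for doc in docs:
        key = doc["col_b"]
        sums[key] = sums.get(key, 0) + script_value(doc, wiids, svalues)
    return sorted(sums.items(), key=lambda kv: kv[1], reverse=True)[:size]
```

Since the filter already restricts col_a to the values in wiids, the -1
fallback should only be a safety net. The script is evaluated once for every
document that passes the filter.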
Again, I parse the results and extract a field. Let's call that field
q2.field1.
3. On an index with ~340K records:
    res2 = es.search(index="index3", body={
        "query": {"filtered": {
            "query": {"terms": {"wiid": q2.field1}},
            "filter": {"bool": {"must": [
                {"range": {"isInRange": {"gte": 10}}},
                {"term": {"isCondA": "false"}},
                {"term": {"isCondB": "false"}},
                {"term": {"isCondC": "false"}}]}}}}},
        size=1000)
Please let me know if any other information would help you help me.
Query 2 above is doing aggregations and using a custom script. This is
where times reach a few seconds, like 2-3 seconds or even 4+ seconds at
times.
I can move to a machine with a higher-end CPU and maybe the performance
would improve. I wanted to check whether there is anything else that I am
missing.
Thanks!
Ravi