Hi,
I am doing a performance evaluation with 2 queries. The purpose is to figure out how many docs one node can process. But I'm not sure if I'm testing in the correct way, and I'm not sure the conclusion I got is correct, because the numbers seem too small.
Below is the hardware/software environment I'm using:
CPU: 8 cores
Memory: 16 GB
OS: CentOS 7
Elasticsearch: 2.1
ES heap: 8 GB
Swap: disabled
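For reference, this is roughly how the heap size and swap behaviour are configured (a sketch; the exact file locations depend on how Elasticsearch was installed):

# /etc/sysconfig/elasticsearch (or exported in the shell that starts ES)
export ES_HEAP_SIZE=8g

# elasticsearch.yml: lock the heap in RAM so the OS never swaps it out
bootstrap.mlockall: true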
The requirement is that every query responds in less than 3 seconds.
The mapping I use is like below:
"@timestamp": { "index": "not_analyzed", "type": "date","format": "epoch_second"},
"ROUTER" : { "type" : "integer", "index" : "not_analyzed"},
"IN_IFACE" : { "type" : "integer", "index" : "not_analyzed"},
"OUT_IFACE" : { "type" : "integer", "index" : "not_analyzed"},
"SRC_MAC" : { "type" : "long", "index" : "not_analyzed"},
"DST_MAC" : { "type" : "long", "index" : "not_analyzed"},
"IP_PAIRE" : { "type" : "integer", "index" : "not_analyzed"},
"SRC_IP" : { "type" : "ip", "index" : "not_analyzed"},
"DST_IP" : { "type" : "ip", "index" : "not_analyzed"},
"SRC_PORT" : { "type" : "integer", "index" : "not_analyzed"},
"DST_PORT" : { "type" : "integer", "index" : "not_analyzed"},
"VLAN" : { "type" : "integer", "index" : "not_analyzed" },
"PROTOCOL" : { "type" : "string", "index" : "not_analyzed" },
"BYTES" : { "type" : "long", "index" : "not_analyzed" },
"PACKETS" : { "type" : "long", "index" : "not_analyzed" }
The queries are:
curl -XPOST "127.0.0.1:9200/sflow_1454256000/sflow/_search?pretty" -d '
{
  "size": 0,
  "query": {
    "filtered": {
      "filter": {
        "range": { "@timestamp": { "gte": 0, "lte": 9999999999 } }
      }
    }
  },
  "aggs": {
    "SRC_DST": {
      "terms": {
        "script": "[doc.SRC_IP.value, doc.DST_IP.value].join(\"-\")",
        "size": 2,
        "shard_size": 0,
        "order": { "sum_bits": "desc" }
      },
      "aggs": { "sum_bits": { "sum": { "field": "BYTES" } } }
    }
  }
}'
curl -XPOST "127.0.0.1:9200/sflow_1454256000/sflow/_search?pretty" -d '
{
  "size": 0,
  "query": {
    "filtered": {
      "filter": {
        "range": { "@timestamp": { "gte": 0, "lte": 9999999999 } }
      }
    }
  },
  "aggs": {
    "SRC_MAC": {
      "terms": {
        "field": "SRC_MAC",
        "size": 3,
        "shard_size": 0,
        "order": { "sum_bits": "desc" }
      },
      "aggs": { "sum_bits": { "sum": { "field": "BYTES" } } }
    }
  }
}'
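This is how I time the queries (a sketch; query.json stands for one of the request bodies above, saved to a file): the took field in the response is the time in milliseconds Elasticsearch spent on the search, and time gives the wall-clock round trip.

# Wall-clock time for the whole round trip
time curl -s -XPOST "127.0.0.1:9200/sflow_1454256000/sflow/_search" -d @query.json > /dev/null

# Or read the server-side time from the response itself
curl -s -XPOST "127.0.0.1:9200/sflow_1454256000/sflow/_search?pretty" -d @query.json | grep '"took"'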
Based on the official docs, you first find the maximum number of docs one shard can hold while still meeting your SLA, then work out how many such shards your machine can handle.
Below are my steps:
First, I use only one shard for one index on my node. When the shard holds 400 000 docs, the response time reaches 3s. From that I conclude that, in my environment, the limit for one shard is 400 000 docs if I want the response time to stay within 3s.
Based on this, I use multiple shards to run those two queries, with each shard containing 400 000 docs. The result:
When the number of shards is 2 (400 000 docs per shard), the response time is 3s.
So my conclusion is that the limit for my machine is 2 shards of 400 000 docs each (800 000 docs in total) if I want the response time to stay within 3s.
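To make the procedure concrete, this is roughly the loop I run for each configuration (a sketch: load_docs.sh stands for my bulk-loading script and is not shown, query.json is one of the request bodies above, and the mapping step is omitted for brevity):

for shards in 1 2 5; do
  # drop the previous test index and recreate it with a new shard count
  curl -XDELETE "127.0.0.1:9200/sflow_1454256000"
  curl -XPUT "127.0.0.1:9200/sflow_1454256000" -d "
  { \"settings\": { \"number_of_shards\": $shards, \"number_of_replicas\": 0 } }"
  ./load_docs.sh        # bulk-index the test docs (hypothetical helper)
  time curl -s -XPOST "127.0.0.1:9200/sflow_1454256000/sflow/_search" -d @query.json > /dev/null
done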
So, could you help point out whether I'm doing the test the wrong way?
Also, I do not have much experience with ES, so: is this a reasonable result given my hardware/software environment and the 3s response-time requirement?
------------------ Update ------------------
I actually did some other tests too. I used 5 shards containing 900 000 docs in total (i.e. 180 000 docs per shard). The response time was actually better than with the 2-shard, 800 000-doc setup. So I think this proves my previous conclusion wrong. But I ran the test following the official docs' advice. Can anyone tell me why?
This is the official documentation my test is based on: https://www.elastic.co/guide/en/elasticsearch/guide/current/capacity-planning.html