Good day,
I have ES 1.5.2 deployed on 15 nodes. Each node has 40 GB RAM, two 2.5 GHz CPU cores and a single SSD drive.
I want to store some "sensor data" for each machine m and its sensors s. In each doc I capture the machine id (m), the sensor id and its sub-ids (s, ss1, ss2), a timestamp (ts), and 4 values (v1..v4).
I've created 24 indexes with 15 shards each using the following mapping:
{
    "sample": {
        "_all": {"enabled": False},
        "properties": {
            "m": {
                "type": "string",
                "index": "not_analyzed",
                "doc_values": True,
            },
            "s": {
                "type": "string",
                "index": "not_analyzed",
                "doc_values": True,
            },
            "ts": {
                "type": "date",
                "doc_values": True,
            },
            "ss1": {
                "type": "string",
                "index": "not_analyzed",
                "doc_values": True,
            },
            "ss2": {
                "type": "string",
                "index": "not_analyzed",
                "doc_values": True,
            },
            "v1": {
                "type": "long",
                "doc_values": True,
                "index": "no",
            },
            "v2": {
                "type": "long",
                "doc_values": True,
                "index": "no",
            },
            "v3": {
                "type": "long",
                "doc_values": True,
                "index": "no",
            },
            "v4": {
                "type": "long",
                "doc_values": True,
                "index": "no",
            }
        }
    }
}
I've indexed 600 million docs into each index (routing by the m field). Indexing speed reaches up to 200,000 docs/sec, which is very satisfying, but aggregation performance is very slow.
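For scale, here is a quick back-of-the-envelope calculation from the numbers above. It counts primary shards only and assumes documents are spread evenly across shards and shards evenly across nodes, which routing by m may not actually guarantee; the variable names are just for illustration.

```python
# Cluster sizing derived from the figures stated in this post.
INDICES = 24
SHARDS_PER_INDEX = 15
DOCS_PER_INDEX = 600_000_000
NODES = 15

total_shards = INDICES * SHARDS_PER_INDEX   # primary shards only
total_docs = INDICES * DOCS_PER_INDEX

# Assumption: perfectly even distribution (routing by m may skew this).
docs_per_shard = DOCS_PER_INDEX // SHARDS_PER_INDEX
shards_per_node = total_shards // NODES

print("total shards:   ", total_shards)
print("total docs:     ", total_docs)
print("docs per shard: ", docs_per_shard)
print("shards per node:", shards_per_node)
```

So the aggregation below runs over roughly 14.4 billion docs in 360 shards, about 24 shards per node.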
I've created a single alias for all of these indices. Now consider this simple aggregation:
curl 'http://example.com:9200/myalias/_search?search_type=count' -d '{
  "aggs": {
    "1_min_date": {
      "min": {
        "field": "ts"
      }
    },
    "2_max_date": {
      "max": {
        "field": "ts"
      }
    }
  }
}'
I ran it against my idle cluster and it took about 40 seconds to execute. For roughly the first half of that time, the CPU on all nodes is 100% busy; for the rest of the time only a single node is at 100% CPU, probably aggregating the results.
Why is it so slow? The ts field is indexed, so from my understanding, finding its minimum value should be an O(1) operation on each segment of every shard, i.e. the overall complexity should be O(number_of_shards). I have 360 shards, so it's 360 lookups and then sorting an array with 360 members.
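To make my expectation concrete, here is a minimal sketch of the cost model I have in mind. It only models my assumption that each segment can report its min/max ts in constant time; I don't know that ES 1.5.2 actually executes the aggregation this way. The function names and the synthetic segment data are hypothetical.

```python
import random

SHARDS = 360  # 24 indices x 15 shards, as described above


def shard_min_max(segment_ranges):
    """Combine the (min_ts, max_ts) pair that each segment is assumed to know."""
    lo = min(seg_min for seg_min, _ in segment_ranges)
    hi = max(seg_max for _, seg_max in segment_ranges)
    return lo, hi


def coordinator_reduce(shard_results):
    """Reduce one (min_ts, max_ts) pair per shard into the final answer."""
    lo = min(shard_min for shard_min, _ in shard_results)
    hi = max(shard_max for _, shard_max in shard_results)
    return lo, hi


# Synthetic stand-in data: a few segments per shard, timestamps in epoch millis.
rng = random.Random(0)
shards = []
for _ in range(SHARDS):
    segments = []
    for _ in range(rng.randint(1, 30)):
        seg_min = rng.randint(1_400_000_000_000, 1_430_000_000_000)
        seg_max = seg_min + rng.randint(0, 86_400_000)
        segments.append((seg_min, seg_max))
    shards.append(segments)

shard_results = [shard_min_max(segments) for segments in shards]
overall_min, overall_max = coordinator_reduce(shard_results)
print(len(shard_results), "shard results ->", overall_min, overall_max)
```

Under this model the work is proportional to the number of segments and shards, not to the number of docs, which is why 40 seconds surprises me.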
What am I doing wrong?
Thank you.