Hi,
I have cluster that has 2 indexes that has 20B documents and 14B documents each.
85% of of the documents are from one type and other 15% are all other types. When I'm doing any query on the types form 15% the query is running very fast , in terms of milliseconds. But if I run query on the type that has 85% of the documents it's running for more than 2 minutes.
The cluster is 10 nodes, each node is EC2 instance D2.XLarge that has 4 cores cpu, 32gb ram, 3 Hard drive 2TB each and running CentOS with ES 1.7.1
Each index has 20 shards, average shard size is : 223.1GB. and average number of documents on each shard is 977,619,827.
During the query CPU is used on 97% and disk reads are :
The query is this :
{
"from": 0,
"size": 50,
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"taskid": "7746bd42-c92b-4ab7-9a48-ebc399990a4c"
}
},
{
"term": {
"userid": "..."
}
},
{
"term": {
"account": "..."
}
},
{
"term": {
"source": "mail"
}
},
{
"term": {
"snapdate": 20160124142833
}
},
{
"term": {
"folders": "02e34097b17659d74bccca3f18afa7579b8ce37b008d4809fa55fe31ffc1d65c"
}
}
],
"_cache": true
}
}
}
},
"explain": false,
"fields": "*",
"sort": [
{
"foldersname": {
"order": "desc"
}
}
]
}
Is there any way to speed up queries on the dominant type (one that has 85% of documents).
Thanks