Very slow query on index cluster with 34B documents


(Slava G ) #1

Hi,
I have cluster that has 2 indexes that has 20B documents and 14B documents each.
85% of of the documents are from one type and other 15% are all other types. When I'm doing any query on the types form 15% the query is running very fast , in terms of milliseconds. But if I run query on the type that has 85% of the documents it's running for more than 2 minutes.

The cluster is 10 nodes, each node is EC2 instance D2.XLarge that has 4 cores cpu, 32gb ram, 3 Hard drive 2TB each and running CentOS with ES 1.7.1

Each index has 20 shards, average shard size is : 223.1GB. and average number of documents on each shard is 977,619,827.

During the query CPU is used on 97% and disk reads are :

The query is this :

{
  "from": 0,
  "size": 50,
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            {
              "term": {
                "taskid": "7746bd42-c92b-4ab7-9a48-ebc399990a4c"
              }
            },
            {
              "term": {
                "userid": "..."
              }
            },
            {
              "term": {
                "account": "..."
              }
            },
            {
              "term": {
                "source": "mail"
              }
            },
            {
              "term": {
                "snapdate": 20160124142833
              }
            },
            {
              "term": {
                "folders": "02e34097b17659d74bccca3f18afa7579b8ce37b008d4809fa55fe31ffc1d65c"
              }
            }
          ],
          "_cache": true
        }
      }
    }
  },
  "explain": false,
  "fields": "*",
  "sort": [
    {
      "foldersname": {
        "order": "desc"
      }
    }
  ]
}

Is there any way to speed up queries on the dominant type (one that has 85% of documents).

Thanks


(Christian Dahlqvist) #2

How many shards do you have? What is the average shard size? What does CPU and disk I/O look like when you run the long running query?


(David Pilato) #3

And also which kind of query are you running?


(Slava G ) #4

20 shards each index , I updated the question with missing information.

Thanks


(Slava G ) #5

Hi,
I updated the question with missing information, please see.

Thanks.


(Christian Dahlqvist) #6

That is indeed very large shards and a lot of documents to read from disk and sift through per shard. We generally recommend keeping the shard size below 50GB to make recovery and transfer of shards easier, but query performance can also decrease when the shard gets too large. That is however not going to help you right away as it would require reindexing.

How many documents does one of these queries on average match? How many concurrent queries are you running? What does heap usage look like? Do you see anything in the logs, e.g. about long garbage collection? Can you also ensure you have got swap disabled?


(Slava G ) #7

This specific query match 11 documents, it could vary from 0 to 50 in average.
As for concurrent query - 90% of the queries are running alone , no other queries are running and sometimes 2 very rear 3-4. Swap is disabled on every node.
As for heap , it's configured 25% for index buffer, 20% for fields and 20% filter cache.
I don't see anything special in the logs.

But, other queries on other types are running pretty fast, even with this shards size, so is there any relation between type size ?

Thanks.


(system) #8