Very slow query on index cluster with 34B documents

Slava_G · January 26, 2016, 8:39am

Hi,
I have cluster that has 2 indexes that has 20B documents and 14B documents each.
85% of of the documents are from one type and other 15% are all other types. When I'm doing any query on the types form 15% the query is running very fast , in terms of milliseconds. But if I run query on the type that has 85% of the documents it's running for more than 2 minutes.

The cluster is 10 nodes, each node is EC2 instance D2.XLarge that has 4 cores cpu, 32gb ram, 3 Hard drive 2TB each and running CentOS with ES 1.7.1

Each index has 20 shards, average shard size is : 223.1GB. and average number of documents on each shard is 977,619,827.

During the query CPU is used on 97% and disk reads are :

The query is this :

{
  "from": 0,
  "size": 50,
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            {
              "term": {
                "taskid": "7746bd42-c92b-4ab7-9a48-ebc399990a4c"
              }
            },
            {
              "term": {
                "userid": "..."
              }
            },
            {
              "term": {
                "account": "..."
              }
            },
            {
              "term": {
                "source": "mail"
              }
            },
            {
              "term": {
                "snapdate": 20160124142833
              }
            },
            {
              "term": {
                "folders": "02e34097b17659d74bccca3f18afa7579b8ce37b008d4809fa55fe31ffc1d65c"
              }
            }
          ],
          "_cache": true
        }
      }
    }
  },
  "explain": false,
  "fields": "*",
  "sort": [
    {
      "foldersname": {
        "order": "desc"
      }
    }
  ]
}

Is there any way to speed up queries on the dominant type (one that has 85% of documents).

Thanks

Christian_Dahlqvist · January 26, 2016, 8:42am

How many shards do you have? What is the average shard size? What does CPU and disk I/O look like when you run the long running query?

dadoonet · January 26, 2016, 8:54am

And also which kind of query are you running?

Slava_G · January 26, 2016, 8:59am

20 shards each index , I updated the question with missing information.

Thanks

Slava_G · January 26, 2016, 9:00am

Hi,
I updated the question with missing information, please see.

Thanks.

Christian_Dahlqvist · January 26, 2016, 10:20am

That is indeed very large shards and a lot of documents to read from disk and sift through per shard. We generally recommend keeping the shard size below 50GB to make recovery and transfer of shards easier, but query performance can also decrease when the shard gets too large. That is however not going to help you right away as it would require reindexing.

How many documents does one of these queries on average match? How many concurrent queries are you running? What does heap usage look like? Do you see anything in the logs, e.g. about long garbage collection? Can you also ensure you have got swap disabled?

Slava_G · January 26, 2016, 10:44am

This specific query match 11 documents, it could vary from 0 to 50 in average.
As for concurrent query - 90% of the queries are running alone , no other queries are running and sometimes 2 very rear 3-4. Swap is disabled on every node.
As for heap , it's configured 25% for index buffer, 20% for fields and 20% filter cache.
I don't see anything special in the logs.

But, other queries on other types are running pretty fast, even with this shards size, so is there any relation between type size ?

Thanks.

Topic		Replies	Views
Random very slow queries Elasticsearch	8	2498	July 6, 2017
Cluster optimization(indexing/query performace) Elasticsearch	4	312	July 6, 2017
Performance problems Elasticsearch	12	586	July 6, 2017
ES slow query problem Elasticsearch	3	364	July 6, 2017
Slow Query Performance Elasticsearch	10	757	July 6, 2017

Very slow query on index cluster with 34B documents

Related topics