Sorting performance trouble

Hi,

Recently we moved to a new production server and upgraded from ES 1.4.1 to 1.7.1 in the process.

Since the upgrade, sorting the entire comments collection (4 million documents) to return the 5 newest documents has slowed from a 75ms average to a 175ms average. Queries on the images index (~1 million documents) have also slowed about 3x, from 50ms to 150ms.

To determine whether this was a regression in 1.7.1, I cloned the comments index to a local instance of ES 1.7.1 and ran the same sort query with the same config file as production; matching and sorting all documents and returning the top 5 took around 70ms.

Almost the only differences in the config file are naming (ElasticSearch => Elasticsearch) and indentation (# commented thing => #commented thing); the two seem functionally identical.

Any ideas or suggestions on what to poke at on production would be very helpful.

FWIW, the query is here:

{
  "index": "comments",
  "type": "comment",
  "body": {
    "query": {
      "filtered": {
        "query": {"match_all": {}},
        "filter": {
          "term": {"hidden_from_users": "false", "_cache": "true" }
        }
      }
    },
    "sort": [{"posted_at": "desc"}],
    "size": 5,
    "from": 0
  }
}
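
For anyone wanting to reproduce the timings, here is a rough Python sketch of how the averages could be measured from the `took` value that Elasticsearch reports in each search response. `run_query` is a hypothetical callable (not part of any client library) that sends the body to the cluster and returns the parsed JSON response; the warmup and run counts are arbitrary assumptions.

```python
from statistics import mean

# The "body" part of the query above, as a plain dict.
QUERY_BODY = {
    "query": {
        "filtered": {
            "query": {"match_all": {}},
            "filter": {"term": {"hidden_from_users": "false", "_cache": "true"}},
        }
    },
    "sort": [{"posted_at": "desc"}],
    "size": 5,
    "from": 0,
}


def average_took(run_query, body, runs=20, warmup=3):
    """Average the server-reported `took` (milliseconds) over several runs.

    `run_query` is a hypothetical callable: it takes the query body and
    returns the parsed search response as a dict.
    """
    for _ in range(warmup):
        run_query(body)  # discarded, so caches and fielddata are loaded first
    return mean(run_query(body)["took"] for _ in range(runs))
```

Because `took` is measured on the server, it leaves out network and client overhead, which makes the local and production numbers easier to compare.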

Hi Liam,
Everything else is the same between your prod env and your local instance?
(I/O subsystem, CPU, RAM, number of queries...)

CPU and RAM are different:
Local is an Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz with 24GB of RAM
Production is an Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz with 128GB of RAM

Both have more than enough memory to fit the entire comments index (645 MB, no _source or _all) into memory.

I didn't measure by number of queries, simply average execution time. I haven't tested this, but it is possible that on production Elasticsearch is busy doing other things when it receives the comments load+sort query. I suppose I can schedule a short site downtime to test this, if you think that might be the issue.

I brought the site down temporarily to test whether the issue was concurrent requests; once the site was off, query performance did not change.

How much memory are you dedicating to ES?

On production, 31.5 gigabytes.

It's only using 18.

Compare the node stats (/_nodes/stats?pretty) on both environments to see if something stands out.
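
A minimal sketch of one way to do that comparison in Python, assuming each environment's stats for a single node have already been saved and loaded as dicts. The response is keyed by node ID under "nodes", and the IDs will differ between servers, so pass the per-node object rather than the whole response. The function names and the 2x threshold are illustrative assumptions.

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts into {"a.b.c": value}, keeping numeric leaves only."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        elif isinstance(value, (int, float)) and not isinstance(value, bool):
            flat[path] = value
    return flat


def compare_stats(old, new, min_ratio=2.0):
    """Return (path, old_value, new_value) for stats differing by min_ratio or more."""
    old_flat, new_flat = flatten(old), flatten(new)
    rows = []
    for path in sorted(old_flat.keys() & new_flat.keys()):
        a, b = old_flat[path], new_flat[path]
        lo, hi = min(a, b), max(a, b)
        # Report when one side is zero and the other is not, or the ratio is large.
        if hi > 0 and (lo <= 0 or hi / lo >= min_ratio):
            rows.append((path, a, b))
    return rows
```

Cumulative counters grow with uptime and traffic, so large ratios on those are expected between an idle and a busy node; gauges such as heap usage or cache sizes are probably more telling.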

Here are the node stats on the old (idle) production server and the new (busy) production server, as well as a diff between them:

Maybe try bringing that down to 30.5GB. I think that is the recommended maximum, so that the JVM can keep using compressed object pointers instead of full 64-bit addresses.