Elasticsearch not scaling beyond ~400 requests per second

Hi,

We have the following query, and it doesn't seem to scale for us. Using the REST API, we're only able to reach about 400 requests/second. It's a 3-node cluster with about 200k documents, 5 shards, and 5 GB of memory per node, running ES 1.5.2. The query matches 45 of those documents. Neither CPU nor memory appears taxed, and we're currently using the cached thread pool type (though we get similar results with fixed sizing). We are seeing linear growth in response times, from single-digit milliseconds up to tenths of a second.

Now, I can see some obviously bad moves in the query, especially what looks like a duplicate terms filter. I would imagine ES optimizes that away, but I can't be sure.

POST myindex/_search
{
  "aggregations" : {
    "filter1" : {
      "filter" : {
        "and" : {
          "filters" : [ {
            "terms" : {
              "fromId" : [ 2,31 ]
            }
          }, {
            "term" : {
              "closed" : false
            }
          }, {
            "and" : {
              "filters" : [ {
                "terms" : {
                  "toId" : [ 2,31 ]
                }
              }, {
                "terms" : {
                  "fromId" : [ 2,31 ]
                }
              } ]
            }
          } ]
        }
      },
      "aggregations" : {
        "range" : {
          "range" : {
            "field" : "dueDateNumber",
            "ranges" : [ {
              "key" : "overdue",
              "to" : 201510260000
            }, {
              "key" : "comingsoon",
              "from" : 201510260000,
              "to" : 201511240000
            }, {
              "key" : "thefuture",
              "from" : 201511240000
            } ]
          }
        }
      }
    }
  }
}
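
For reference, the duplicate `fromId` filter and the nested `and` can be removed on the client side without changing what the filter matches. Below is a minimal Python sketch of that rewrite. The helper name `flatten_and` is hypothetical, and it only handles the `"and": {"filters": [...]}` form used in the query above. Whether ES already performs an equivalent optimization internally is not something this thread confirms.

```python
import json

def flatten_and(filters):
    """Flatten nested "and" filters and drop exact duplicates, keeping order.

    Hypothetical helper: only understands the {"and": {"filters": [...]}} form.
    """
    seen, flat = set(), []
    for f in filters:
        if "and" in f:
            # AND is associative, so children of a nested "and" can be inlined.
            children = flatten_and(f["and"]["filters"])
        else:
            children = [f]
        for child in children:
            # Canonical JSON string lets us compare filters for exact equality.
            key = json.dumps(child, sort_keys=True)
            if key not in seen:
                seen.add(key)
                flat.append(child)
    return flat

# The filter list from the query above.
original = [
    {"terms": {"fromId": [2, 31]}},
    {"term": {"closed": False}},
    {"and": {"filters": [
        {"terms": {"toId": [2, 31]}},
        {"terms": {"fromId": [2, 31]}},
    ]}},
]

simplified = {"and": {"filters": flatten_and(original)}}
print(json.dumps(simplified, indent=2))
```

The result is a single `and` with three filters (`fromId`, `closed`, `toId`), which could replace the `filter` body of `filter1` in the request.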

It looks like, regardless of the query used, we're maxing out at around 400 connections.

Just wondering if any community members have any thoughts about this?

I also have ES 1.5.2 with 3 nodes in production. Using the transport client Java API, I can send a mix of ~5000 term queries per second and ~15000 documents per second (bulk indexing) from a 4th server for ~3 hours (until the job completes), and it's the 4th server that limits those numbers.

Wow, I wish I was seeing that problem :smile:
Any chance you can share some of your tuning or configuration options? What OS do you use?

The servers are HP DL165 G7, 32 cores, 64GB RAM, ~1.8TB RAID0, data dir mounted with noatime

OS: RHEL 6.6

java version "1.8.0_45"
Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)

java -Xms16g -Xmx16g -XX:+UseG1GC -XX:MaxGCPauseMillis=1000 -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintAdaptiveSizePolicy -XX:+PrintReferenceGC -XX:+ParallelRefProcEnabled ...

Current ES configuration: https://gist.github.com/jprante/08b061171f4ee1876378

Don't forget that the weight of the query is also very important. If you
are executing long queries, perhaps with aggregations, you will obviously
see a reduced number of queries per second. The query thread pool is finite
after all.
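
To make the point about a finite pool concrete, here is a rough back-of-the-envelope sketch in Python. It assumes a simplified model in which each request occupies exactly one worker thread for its whole service time, and it ignores queueing and the per-shard fan-out of a search. The function name and all numbers are illustrative assumptions, not measurements from this cluster.

```python
def max_throughput(threads, service_time_s):
    """Upper bound on requests/second for a pool of `threads` workers
    when each request holds one worker for `service_time_s` seconds."""
    if threads <= 0 or service_time_s <= 0:
        raise ValueError("threads and service_time_s must be positive")
    return threads / service_time_s

# Hypothetical pool of 16 worker threads.
light = max_throughput(16, 0.004)  # cheap query, about 4 ms each
heavy = max_throughput(16, 0.050)  # heavier query with aggregations, about 50 ms each
print(light, heavy)
```

Under this model the heavier query caps out at less than a tenth of the rate of the cheap one on the same pool, which is the effect described above.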

Ivan

Yes. I find 400 aggregations per second an extremely good rate.

Yes, but even when we're not using aggregations we're seeing this issue.