Why does scaling Elasticsearch to 2 nodes give no performance gain?

I installed ES on 2 nodes and configured them to run in the same cluster, but got no performance gain (in my test cases the average search duration actually increased by 6%).

The queries I used take from 10 ms to 3.5 s on average.
The slow queries run aggregations and, from what I can see, are CPU-bound.

I've checked everything with Marvel, and the distribution of data across the nodes in the cluster seems fine (I tried both 1 and 0 replicas).

I'm running ES 2.3.1 on Ubuntu 14.04.

What are the possible pitfalls when running a cluster? Is this to be expected?

Mapping

{
  "locations" : {
    "mappings" : {
      "doc_type_1" : {
        "properties" : {
          "coordsField" : {
            "type" : "geo_point",
            "lat_lon" : true
          },
          "field8" : {
            "properties" : {
              "prop1" : {
                "type" : "string",
                "index" : "not_analyzed"
              },
              "prop2" : {
                "type" : "date",
                "format" : "yyyy-MM-dd"
              },
              "prop3" : {
                "type" : "boolean"
              },
              "prop4" : {
                "type" : "string"
              },
              "prop5" : {
                "type" : "date",
                "format" : "yyyy-MM-dd"
              },
              "prop6" : {
                "type" : "string"
              }
            }
          },
          "field7" : {
            "properties" : {
              "prop1" : {
                "type" : "string",
                "index" : "not_analyzed"
              },
              "prop2" : {
                "type" : "boolean"
              },
              "prop3" : {
                "type" : "string"
              },
              "prop4" : {
                "type" : "string"
              }
            }
          },
          "field6" : {
            "type" : "string",
            "index" : "not_analyzed"
          },
          "field5" : {
            "properties" : {
              "market" : {
                "type" : "string"
              },
              "niche_market" : {
                "type" : "string"
              },
              "sub_market" : {
                "type" : "string"
              }
            }
          },
          "field4" : {
            "type" : "boolean"
          },
          "field3" : {
            "properties" : {
              "prop1" : {
                "type" : "string"
              }
            }
          },
          "field2" : {
            "type" : "boolean"
          },
          "field1" : {
            "properties" : {
              "prop_1" : {
                "properties" : {
                  "prop_1_1" : {
                    "properties" : {
                      "prop_1_1_1" : {
                        "type" : "double"
                      }
                    }
                  },
                  "prop_1_2" : {
                    "properties" : {
                      "prop_1_2_1" : {
                        "type" : "string"
                      },
                      "prop_1_2_2" : {
                        "type" : "string"
                      }
                    }
                  }
                }
              }
            }
          },
          // ... 1000 more lines of mappings
        }
      }
    }
  }
}

Node stats

Memory: 2GB / 5GB
Documents: 22,996,755
Data: 13GB
Version: 2.3.1

Index settings

8 shards + replica
Documents: 18.8m
Data: 11.5GB

There's not really enough data to go on here.

What do your documents, queries, mappings, node sizes, etc. look like?

I was lazily hoping those details wouldn't be needed...
The post is updated with the data now.

I've added the query here because I could not fit the whole query in the initial post due to size restrictions.

Query

{
   "fields": ["id", "coordsField"],
   "query": {
       "filtered": {
           "query": {
               "bool": {
                   "must": [],
                   "must_not": [],
                   "should": [],
                   "filter": [{
                       "query_string": {
                           "default_field": "type",
                           "query": "type1 type2 type3 type4 type5"
                       }
                   }, {
                       "or": [{
                           "exists": {
                               "field": "field0"
                           }
                       },
...
                       {
                           "exists": {
                               "field": "field6"
                           }
                       }]
                   }, {
                       "geo_bounding_box": {
                           "type": "indexed",
                           "coordsField": {
                               "top_left": {
                                   "lat": 123.471723,
                                   "lon": -123.173828
                               },
                               "bottom_right": {
                                   "lat": 123.937079,
                                   "lon": 123.82373
                               }
                           }
                       }
                   }, {
                       "exists": {
                           "field": "field7"
                       }
                   }, {
                       "exists": {
                           "field": "field8.prop"
                       }
                   }],
                   "minimum_should_match": 1
               }
           },
           "filter": {
               "geo_bounding_box": {
                   "coords.current.geometry.coordinates": {
                       "bottom_left": [-0.13623046875, 51.50600814450517],
                       "top_right": [-0.08349609375, 51.53881991608289]
                   },
                   "type": "indexed"
               }
           }
       }
   },
   "size": 0,
   "aggregations": {
       "zoom1": {
           "geohash_grid": {
               "field": "coordsField",
               "size": 5000,
               "precision": 7
           },
           "aggs": {
               "geohash": {
                   "top_hits": {
                       "sort": {
                           "id": {
                               "order": "desc",
                               "ignore_unmapped": true
                           }
                       },
                       "_source": false,
                       "fielddata_fields": ["id", "coordsField"],
                       "size": 1
                   }
               }
           }
       }
   }
}

Duration 1 node : 142 ms
Duration 2 nodes : 146 ms

Any ideas on where to look or what the possible bottlenecks could be?

  • configuration
  • query
  • network (I've tried one 2-node cluster on DigitalOcean and one on our LAN)
  • CPU (shouldn't 2 CPUs > 1?)
  • RAM (shouldn't 10 GB of RAM > 5?)
  • IOPS (shouldn't 2 SSDs > 1?)
  • ES internals overhead when in cluster mode

The query latency is at least partially driven by the shard size. Although the query is distributed across all shards and executed in parallel, the processing against a single shard is single-threaded per query. Query latency may therefore be limited by the shard size and by how long it takes to execute the query against each shard. Adding resources will not necessarily reduce the latency, but the added resources should allow you to have more queries running in parallel without degrading performance.
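To make the point above concrete, here is a rough back-of-the-envelope model in Python. It is only a sketch, not how ES schedules work internally: the function name `estimated_latency_ms`, the round-robin shard placement, the greedy thread assignment, and the 140 ms per-shard time are all made-up assumptions for illustration.

```python
def estimated_latency_ms(shard_times_ms, nodes, threads_per_node, overhead_ms=0.0):
    """Rough latency of ONE query in a toy model (not ES's real scheduler).

    Assumptions: shards are spread round-robin over the nodes, each shard is
    searched by exactly one thread, and a node works through its shards with
    at most `threads_per_node` threads. `overhead_ms` stands in for network
    and coordination cost when more than one node is involved.
    """
    per_node = [[] for _ in range(nodes)]
    for i, shard_time in enumerate(shard_times_ms):
        per_node[i % nodes].append(shard_time)

    node_times = []
    for times in per_node:
        busy = [0.0] * threads_per_node
        # Greedy: give the next-longest shard to the least busy thread.
        for shard_time in sorted(times, reverse=True):
            busy[busy.index(min(busy))] += shard_time
        node_times.append(max(busy))

    # The query is done only when the slowest node has answered.
    return max(node_times) + overhead_ms


shards = [140.0] * 8  # hypothetical: 8 shards, 140 ms each

# Enough threads on one node: latency is already just one shard's time.
print(estimated_latency_ms(shards, nodes=1, threads_per_node=8))  # 140.0
# A second node cannot go below one shard's time; it only adds overhead.
print(estimated_latency_ms(shards, nodes=2, threads_per_node=8, overhead_ms=5.0))  # 145.0
# Only when one node is short on threads does the second node help latency.
print(estimated_latency_ms(shards, nodes=1, threads_per_node=2))  # 560.0
print(estimated_latency_ms(shards, nodes=2, threads_per_node=2))  # 280.0
```

In this toy model, a single node that already has a free thread for every shard gains nothing in latency from a second node, which is at least consistent with seeing roughly equal durations on 1 and 2 nodes.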

I've also tested running more queries in parallel (10 threads) and didn't get much gain there either.
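For anyone wanting to repeat the parallel test, a minimal harness could look like the sketch below. It is an assumption about how such a test might be set up, not the script actually used here: `benchmark` and `run_query` are hypothetical names, and `run_query` is any callable that sends one search request and blocks until the response arrives.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def benchmark(run_query, total_queries=100, concurrency=10):
    """Run `total_queries` calls of `run_query` on `concurrency` threads.

    Returns mean per-query latency and overall throughput, so latency and
    throughput can be compared separately between 1 and 2 nodes.
    """
    def timed(_):
        start = time.perf_counter()
        run_query()
        return time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(timed, range(total_queries)))
    wall = time.perf_counter() - wall_start

    return {
        "mean_latency_s": sum(latencies) / len(latencies),
        "throughput_qps": total_queries / wall,
    }


if __name__ == "__main__":
    # Stand-in for a real search call; replace with an actual HTTP request.
    fake_query = lambda: time.sleep(0.01)
    print(benchmark(fake_query, total_queries=50, concurrency=10))
```

If the second node helps at all, it would be expected to show up in `throughput_qps` at higher concurrency rather than in `mean_latency_s` for a single query.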

Also, I'm using 8 shards on the tested index (16 in total when the node holding the replicas is online). Should I increase the number of shards?
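As a quick sanity check on the shard numbers, the arithmetic can be written out as below. The function name `shard_layout` is hypothetical, and it assumes the 11.5GB and 18.8m document figures from the index settings refer to primary data only, which the post does not state.

```python
def shard_layout(primary_shards, replicas, nodes, docs, data_gb):
    """Basic shard arithmetic for one index spread evenly over the nodes."""
    shard_copies = primary_shards * (1 + replicas)
    return {
        "shard_copies": shard_copies,
        "copies_per_node": shard_copies / nodes,
        # A search hits one copy (primary or replica) of every shard.
        "shards_searched_per_query": primary_shards,
        "docs_per_shard": docs / primary_shards,
        "gb_per_shard": data_gb / primary_shards,
    }


layout = shard_layout(primary_shards=8, replicas=1, nodes=2,
                      docs=18_800_000, data_gb=11.5)
print(layout["shard_copies"])      # 16
print(layout["copies_per_node"])   # 8.0
print(layout["docs_per_shard"])    # 2350000.0
print(layout["gb_per_shard"])      # 1.4375
```

Under that assumption, each query still has to search 8 shards of roughly 2.35 million documents each, whether the cluster has one node or two.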

I've looked at the load on both systems, and it seems to be lower with 2 nodes than when running with just one. I'm still curious how ES handles things when multiple nodes are involved: what are the steps compared to single-node operation?