Shards not distributed amongst nodes


(Yehosef) #1

We have an ES cluster - started with 3 machines and we recently added 2 more. We are processing user aggregation data. We had an index called users-v6 and when we added the new machines it distributed across them. We wanted to rerun the aggregations so we wrote to a new index - users-v7.

The interesting part is that v7 is only running on the initial machines and is not putting shards/replicas on the other nodes.

Any ideas why and how I can fix this?


(Jimferenczi) #2

What's the spec of the new machines? Same as the old ones? Did you check the disk space on the new machines?
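
A quick way to check that is the _cat allocation API (available in the 1.x series this cluster appears to be running), which lists per-node shard counts and disk usage in one view:

GET _cat/allocation?v

Each row shows the number of shards a node holds alongside disk.used, disk.avail, disk.total and disk.percent, which makes it easy to spot a node that the disk-based allocation decider might be avoiding.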


(Yehosef) #3

The machines are supposed to be the same, though the new machines have more cores/CPUs (I'm not sure if they have hyperthreading, etc. The old machines show 4 CPUs in htop, the new ones show 16). The old machines have about 1TB SSD and the new about 700GB. All have 32GB RAM, with heap set to 16GB.

We're doing about 20k inserts/updates a second at the time of this screenshot.


(Jimferenczi) #4

Can you share your settings? Did you change the default settings of the "cluster.routing.allocation.balance.shard.*" family? You can check this page https://www.elastic.co/guide/en/elasticsearch/reference/current/shards-allocation.html if you want to understand how allocation is done.
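
For reference, the balance factors are dynamic cluster settings and could have been changed at runtime; a sketch of how they would be set back to what I believe are the defaults (0.45 for shard balance, 0.55 for index balance) looks like this:

PUT _cluster/settings
{
    "transient": {
        "cluster.routing.allocation.balance.shard": 0.45,
        "cluster.routing.allocation.balance.index": 0.55
    }
}

Any non-default values would also show up in the GET _cluster/settings output.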


(Yehosef) #5

Do you mean this?

GET users-v6,users-v7/_settings
{
   "users-v6": {
      "settings": {
         "index": {
            "creation_date": "1449646264318",
            "number_of_shards": "3",
            "number_of_replicas": "1",
            "version": {
               "created": "1070199"
            },
            "uuid": "gCIErd5SQhKZSUzGAxwH6g"
         }
      }
   },
   "users-v7": {
      "settings": {
         "index": {
            "creation_date": "1450350011145",
            "number_of_shards": "3",
            "number_of_replicas": "1",
            "version": {
               "created": "1070199"
            },
            "uuid": "SwmtHce2RFCHysL2SAGTKA"
         }
      }
   }
}

I don't think I've changed those settings (though I was looking into them to handle migrating data off nodes that are going to be removed).

Is there somewhere else I should look?
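
To see exactly which nodes the users-v7 shards ended up on, the _cat shards API accepts an index filter, so something like this should show each shard, whether it is a primary or replica (p/r), its state, and the node it is assigned to:

GET _cat/shards/users-v7?v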


(Yehosef) #6

There is also

GET _cluster/settings
{
    "persistent": {
        "indices": {
            "store": {
                "throttle": {
                    "type": "merge",
                    "max_bytes_per_sec": "100mb"
                }
            }
        }
    },
    "transient": {
        "indices": {
            "cache": {
                "filter": {
                    "size": "1gb"
                }
            },
            "recovery": {
                "concurrent_streams": "5"
            }
        }
    }
}
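
One thing worth noting: there are no disk watermark overrides in that output, so the defaults apply (in the 1.7 line these are, as far as I recall, a low watermark of 85% and a high watermark of 90% disk usage). If the new 700GB machines are above the low watermark, Elasticsearch will refuse to allocate new shards to them even though the older 1TB machines still have room. The watermarks can be adjusted dynamically; a sketch with purely illustrative values:

PUT _cluster/settings
{
    "transient": {
        "cluster.routing.allocation.disk.watermark.low": "90%",
        "cluster.routing.allocation.disk.watermark.high": "95%"
    }
}

Comparing disk.percent per node from GET _cat/allocation?v against the watermarks would confirm or rule this out.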
