Clarifications regarding # of shards / shard replication

I was running an Elasticsearch cluster of one node (if that counts as a
cluster) that was running out of disk space.

node.master: true
node.data: true

Only 1 node.

I then added another node to the cluster:

node.master: false
node.data: true

I also disabled multicast discovery and switched to unicast discovery:

discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["10.138.150.61"]

Discovery succeeded (once port 9300 was also opened for communication
between the nodes).

Ironically, the disk usage on both machines is now the same (~47 GB),
which I assume is because replication to the second box has completed.
But I brought in the second box to handle future load, not to hold a
copy of the existing index.

Now, my questions:

  • How many shards does an index have, and is there an API call to find
    out? (I did not change the default on Elasticsearch 0.20.6, so I assume
    it is 5 per the documentation.) I want to verify this now that the
    index is live.

  • I do not want shard copies to be replicated between nodes.

    In other words, when I add a new node, I expect the disk usage to be
    split roughly in half between the old and new nodes, with the shards
    spread across them but not replicated.

How would I achieve this? (Which es.yml property keys should I use?)

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Ironically, the disk usage on both machines is now the same (~47 GB),
which I assume is because replication to the second box has completed.

Correct :)

Your index was created with 5 primary shards, each with 1 replica, but
the replicas weren't assigned because you only had one node.

But I brought in the second box to handle future load, not to hold a
copy of the existing index.

  • How many shards does an index have, and is there an API call to find
    out? (I did not change the default on Elasticsearch 0.20.6, so I assume
    it is 5 per the documentation.) I want to verify this now that the
    index is live.

Various ways, but this is probably the simplest:

curl -XGET 'http://127.0.0.1:9200/_cluster/health?level=indices&pretty=1'
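
If you would rather check the numbers from a script than read the output by eye, something along these lines should work. This is only a sketch: the response body below is an illustrative sample (the index name and values are made up, only the field names follow the cluster health API), and shard_counts is a hypothetical helper, not part of any client library.

```python
import json

# Illustrative response body for _cluster/health?level=indices.
# The index name and the numbers are invented for this sketch.
SAMPLE_HEALTH = """
{
  "cluster_name": "elasticsearch",
  "status": "green",
  "indices": {
    "my_index": {
      "status": "green",
      "number_of_shards": 5,
      "number_of_replicas": 1,
      "active_primary_shards": 5,
      "active_shards": 10,
      "unassigned_shards": 0
    }
  }
}
"""


def shard_counts(health_json):
    """Return {index_name: (primary_shards, replicas_per_primary)}."""
    health = json.loads(health_json)
    return {
        name: (info["number_of_shards"], info["number_of_replicas"])
        for name, info in health.get("indices", {}).items()
    }


counts = shard_counts(SAMPLE_HEALTH)
for name, (primaries, replicas) in counts.items():
    print(f"{name}: {primaries} primaries, {replicas} replica(s) each")
```

In practice you would feed it the body returned by the curl call above instead of the sample string.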

  • I do not want shard copies to be replicated between nodes.

    In other words, when I add a new node, I expect the disk usage to be
    split roughly in half between the old and new nodes, with the shards
    spread across them but not replicated.

If you were to add a third node, then your primary and replica shards
would be redistributed, so you'd have (5 * 2 = 10 shards) / 3 nodes = 3
or 4 shards per node.
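
To make that arithmetic concrete, here is a rough sketch. It assumes the shards end up spread as evenly as possible, which is an idealisation of what the allocator does, and shards_per_node is a hypothetical helper name.

```python
def shards_per_node(primaries, replicas, nodes):
    """Spread all shard copies over the nodes as evenly as possible.

    Idealised model: ignores shard sizes and any allocation settings.
    """
    total = primaries * (1 + replicas)   # primaries plus their replica copies
    base, extra = divmod(total, nodes)   # 'extra' nodes hold one more shard
    return [base + 1 if i < extra else base for i in range(nodes)]


print(shards_per_node(5, 1, 3))  # [4, 3, 3]
print(shards_per_node(5, 1, 2))  # [5, 5]
```

The second call is your current situation: with two nodes and one replica, each node holds a full set of 5 shards, which is why both disks show the same usage.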

However, if you don't want replicas (you don't care about data loss if a
machine dies), then you can just turn them off.

curl -XPUT 'http://127.0.0.1:9200/my_index/_settings?pretty=1' -d '
{
    "index" : {
        "number_of_replicas" : 0
    }
}
'
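
As a back-of-the-envelope check on what that does to disk usage, here is a small sketch using your ~47 GB figure. It assumes the shards are roughly equal in size and evenly spread, and disk_per_node_gb is a hypothetical helper, so treat the results as estimates only.

```python
def disk_per_node_gb(primary_data_gb, replicas, nodes):
    """Estimate per-node disk usage, assuming evenly sized, evenly spread shards."""
    total_gb = primary_data_gb * (1 + replicas)  # every replica is a full copy
    return total_gb / nodes


print(disk_per_node_gb(47, 1, 2))  # 47.0  (one replica: each node holds a full copy)
print(disk_per_node_gb(47, 0, 2))  # 23.5  (no replicas: data is split across nodes)
```

So with replicas turned off, two nodes should each end up with roughly half of the index, which is the halving you were expecting.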

clint
