Clarifications regarding # of shards / shard replication

Ironically, the disk usage on both machines is now the same (~47 GB),
which I assume is because replication to the second box has
completed.

Correct :)

Your index was created with 5 primary shards, each with 1 replica, but
the replicas weren't assigned because you only had one node.

But I brought in the second box to handle future load, not to hold a
copy of the existing index.

  • How many shards does an index have? Is there an API call to find
    out? (I did not change the default and I'm on Elasticsearch 0.20.6,
    so I assume it is 5 per the documentation.) But I do want to verify
    it now that the index is live.

There are various ways, but this is probably the simplest:

curl -XGET 'http://127.0.0.1:9200/_cluster/health?level=indices&pretty=1'
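If you want to check this from a script rather than by eye, a rough sketch follows. It assumes the level=indices response contains an "indices" object keyed by index name, with each entry carrying "number_of_shards" and "number_of_replicas"; compare against your own 0.20.6 output before relying on it. The helper names and the sample response are hypothetical, shaped like the single-node, 5-primary/1-replica index discussed above.

```python
import json
import urllib.request

# Assumed address of a local node, as in the curl example.
HEALTH_URL = "http://127.0.0.1:9200/_cluster/health?level=indices"


def fetch_health(url=HEALTH_URL):
    """Fetch cluster health as a dict (needs a running node at the assumed URL)."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def shard_counts(health):
    """Map each index name to (primary shards, replicas per primary)."""
    return {
        name: (info["number_of_shards"], info["number_of_replicas"])
        for name, info in health.get("indices", {}).items()
    }


# Hypothetical excerpt for a one-node cluster: the 5 replicas cannot be
# placed on the node that already holds their primaries, so they stay unassigned.
sample = {
    "status": "yellow",
    "indices": {
        "my_index": {
            "status": "yellow",
            "number_of_shards": 5,
            "number_of_replicas": 1,
            "active_primary_shards": 5,
            "active_shards": 5,
            "unassigned_shards": 5,
        }
    },
}

for name, (primaries, replicas) in shard_counts(sample).items():
    print(f"{name}: {primaries} primaries, {replicas} replica(s) each")
```

Against a live node you would call shard_counts(fetch_health()) instead of passing the sample.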

  • I do not want copies of the shards to be replicated between
    nodes.

    In other words, when I add a new node, I expect the disk usage to
    be halved across the old and new nodes, with the shards spread
    across them but not replicated.

If you were to add a third node, your primary and replica shards would
be redistributed, so you'd have (5 primaries * 2 copies = 10 shards) /
3 nodes = 3 or 4 shards per node.
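The same arithmetic works for any node count: each primary has 1 + number_of_replicas copies, no node ever holds two copies of the same shard, and the copies that can be placed are spread as evenly as possible. A minimal sketch, with a hypothetical function name; it only counts shards and ignores shard sizes and the allocator's other rules:

```python
def shard_layout(primaries, replicas, nodes):
    """Return (assigned copies, unassigned copies, min per node, max per node).

    Assumes an even spread and that two copies of one shard never share a node.
    """
    copies_per_shard = 1 + replicas
    # At most one copy of a given shard per node; any extra copies stay unassigned.
    assignable = min(copies_per_shard, nodes)
    assigned = primaries * assignable
    unassigned = primaries * copies_per_shard - assigned
    low, extra = divmod(assigned, nodes)
    return assigned, unassigned, low, low + (1 if extra else 0)


for replicas, nodes in [(1, 1), (1, 2), (1, 3), (0, 2)]:
    assigned, unassigned, low, high = shard_layout(5, replicas, nodes)
    print(f"replicas={replicas} nodes={nodes}: {assigned} assigned, "
          f"{unassigned} unassigned, {low}-{high} per node")
```

The first row is the original single box (5 replicas unassigned), the second is the two boxes each holding a full copy, the third is the three-node case above, and the last is replicas turned off on two boxes: each holds only 2 or 3 of the 5 primaries rather than the whole index.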

However, if you don't want replicas (i.e. you don't mind losing data if
a machine dies), you can just turn them off. Each node then holds only
the primaries allocated to it instead of a full copy of the index:

curl -XPUT 'http://127.0.0.1:9200/my_index/_settings?pretty=1' -d '
{
    "index" : {
        "number_of_replicas" : 0
    }
}
'
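The same settings update can be sent from Python's standard library; a sketch under the assumption that the node is at the same local address. The helper name is hypothetical, and the Content-Type header is an addition the curl call above doesn't send (old releases don't need it, newer ones reject bodies without it). Since number_of_replicas is a dynamic setting, the same call with 1 would turn replicas back on later.

```python
import json
import urllib.request


def replicas_request(index, replicas, host="http://127.0.0.1:9200"):
    """Build, but do not send, a PUT to /<index>/_settings setting number_of_replicas."""
    body = json.dumps({"index": {"number_of_replicas": replicas}}).encode("utf-8")
    return urllib.request.Request(
        f"{host}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


req = replicas_request("my_index", 0)
print(req.get_method(), req.full_url)

# To apply it against a running node:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```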

clint

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.