A shard represents a subpart of an index: it holds the group of documents that are routed to it.
A shard can live anywhere in the cluster; it is allocated to a node.
An index is a group of shards. As a developer, you index a document into an index, but the document is routed to a shard, where it is stored and indexed.
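To make the routing step concrete, here is a minimal sketch of the simplified rule `shard = hash(_routing) % number_of_primary_shards`, where the routing value defaults to the document `_id`. Assumptions: `zlib.crc32` stands in for the Murmur3 hash Elasticsearch actually uses, newer versions add a `routing_num_shards` factor that is ignored here, and `route_to_shard` is a hypothetical helper, so the shard numbers will not match a real cluster.

```python
import zlib
from collections import Counter


def route_to_shard(routing: str, num_primary_shards: int) -> int:
    """Pick the primary shard for a document (simplified model).

    Assumption: zlib.crc32 is a stand-in for Elasticsearch's Murmur3 hash,
    so results are illustrative only.
    """
    if num_primary_shards < 1:
        raise ValueError("an index needs at least one primary shard")
    # In Python 3 crc32 returns an unsigned 32-bit int, so the result is never negative.
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


if __name__ == "__main__":
    # By default the routing value is the document _id.
    doc_ids = [f"doc-{i}" for i in range(1000)]
    counts = Counter(route_to_shard(doc_id, 3) for doc_id in doc_ids)
    print(sorted(counts.items()))
    # The same _id always lands on the same shard.
    print(route_to_shard("doc-42", 3) == route_to_shard("doc-42", 3))
```

Because the shard is derived from the primary shard count, changing that count would send existing documents to different shards, which is why it is fixed when the index is created.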
You have primary shards and you may have replica shards. Replicas are perfect copies of primary shards.
A node is a JVM instance running Elasticsearch.
It can host multiple shards, coming from multiple indices.
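A toy model can show how nodes end up hosting several shards from several indices. This is a hypothetical greedy allocator, not Elasticsearch's real balancer. It puts each shard copy on the least-loaded node. It keeps one real constraint: two copies of the same shard, such as a primary and its replica, never share a node. `Node` and `allocate` are illustrative names.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    # Each entry is (index name, shard number, "p" for primary or "r" for replica).
    shards: list = field(default_factory=list)


def allocate(nodes, indices):
    """Greedy sketch: least-loaded node first, never two copies of the
    same shard on one node. Returns the copies that could not be placed."""
    unassigned = []
    for index, (primaries, replicas) in indices.items():
        for shard in range(primaries):
            for kind in ["p"] + ["r"] * replicas:
                # Skip nodes that already hold a copy of this shard.
                candidates = [
                    n for n in nodes
                    if not any(i == index and s == shard for i, s, _ in n.shards)
                ]
                if not candidates:
                    unassigned.append((index, shard, kind))
                    continue
                target = min(candidates, key=lambda n: len(n.shards))
                target.shards.append((index, shard, kind))
    return unassigned


if __name__ == "__main__":
    cluster = [Node("node-a"), Node("node-b"), Node("node-c")]
    # index name -> (primary shards, replicas per primary)
    left = allocate(cluster, {"logs": (4, 1), "users": (1, 1)})
    for node in cluster:
        print(node.name, node.shards)
    print("unassigned:", left)
```

Here 5 primaries plus 5 replicas fit on 3 nodes, each hosting 3 or 4 shards from both indices. On a single node, the replicas would stay unassigned, because they cannot sit next to their primaries.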
Thanks for replying @A_B and @dadoonet. I don't have any indices yet; I am just studying. While studying, I figured out that the number of shards I set must not be higher than the total number of nodes I have, otherwise I would be wasting resources. Is that correct?
It will depend on the use case. For some use cases, e.g. search use cases with high query throughput and a small data set, a single shard per node may be optimal, but most use cases have more than that. This talk may give you a better idea of how to determine the ideal number of shards for your use case.