Shadow replicas indices

"If you would like to use a shared filesystem, you can use the shadow replicas settings to choose where on disk the data for an index should be kept, as well as how Elasticsearch should replay operations on all the replica shards of an index."

Hi,
If I want to store one index in a specific data path rather than the default path, are shadow replica indices the only solution?

Hi,

Shadow replicas allow this kind of configuration, but beware that this is an expert feature. Shadow replica shards look like normal replica shards, but under the hood there are some operations they simply don't perform, such as replicating a document index operation; instead they rely on the shared filesystem to sync shard files.

Why would you like to store an index on a specific path? Note that you can allocate specific indices on nodes, and then have nodes configured to use different data paths locally.

Hi,
1. In fact, I want to store them on different disks. Important indices would be stored on the SSD, while the others would be stored on the HDD. So I want to specify a path on the SSD when the index is created.
2. What if I set path.shared_data in the configuration YAML and don't set "shadow_replicas": true in the index settings?

Maybe like this:

    {
        "index" : {
            "number_of_shards" : 1,
            "number_of_replicas" : 4,
            "data_path": "/opt/data/my_index"
        }
    }

By setting this, can I avoid using shadow replicas and still use a specific path?

Sorry, but I really don't know how to allocate specific indices to nodes.

Elasticsearch does not allow you to configure different data paths for indices. Only shadow replicas allow this, but it means that you must use a shared filesystem, with all your nodes accessing the same SSD disk.

I think you should allocate your important indices to the nodes that have SSDs and allocate your less important indices to nodes that have spinning disks for example.

You can do this using the Shard Allocation Filtering feature: https://www.elastic.co/guide/en/elasticsearch/reference/2.3/shard-allocation-filtering.html
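
As a rough sketch of what that could look like on 2.x: tag each node with a custom attribute in its elasticsearch.yml, then require that attribute in the index settings. The attribute name disk_type and the index name important_index are made up for illustration, and newer versions spell the node setting node.attr.disk_type instead.

On the nodes that have SSDs:

    # elasticsearch.yml ("disk_type" is a hypothetical attribute name)
    node.disk_type: ssd

On the nodes that have spinning disks:

    node.disk_type: hdd

Then, for an index that should live only on the SSD nodes:

    PUT /important_index/_settings
    {
        "index.routing.allocation.require.disk_type": "ssd"
    }

Each node keeps its own local path.data, so no shared filesystem is needed for this.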


Thank you, I got it. I just want to reach a compromise.
In fact, I have tried setting this up on 3 nodes.
node0:

    path.data: /opt/elastic/..
    path.shared_data: /disk1/..

node1:

    path.data: /opt/elastic/..
    path.shared_data: /disk2/..

node2:

    path.data: /opt/elastic/..
    path.shared_data: /disk1/...

Then I create a new index

    POST /test
    {
        "index" : {
            "number_of_shards" : 1,
            "number_of_replicas" : 4,
            "data_path": "/1"
        }
    }

I did not set "shadow_replicas": true, so it defaults to false. I think the shared filesystem path can still be used even when shadow replicas are not.

Then I found that the data of the index was allocated under the "path.shared_data" of these 3 nodes, like a separate data storage system alongside the default path.
I just wonder whether the replicas of the index "test" will behave differently from those of other indices stored in "path.data".

"Shard Allocation Filtering" may not help me customize the data path on a single node, only where the shards are placed.

OK, sorry, it looks like I'm wrong - one can set a custom data_path for indices without using shadow replicas.

According to https://www.elastic.co/guide/en/elasticsearch/reference/master/indices-shadow-replicas.html:

index.data_path (string)

Path to use for the index’s data. Note that by default Elasticsearch will
append the node ordinal by default to the path to ensure multiple instances
of Elasticsearch on the same machine do not share a data directory.

So in your case the "test" index will use a custom data path on the shared data path; each node will use the same shared filesystem but will append the node ordinal to the path.

Yes