The only way I can think of is to create the index, shut down
Elasticsearch, copy each shard's directory (e.g.
/var/lib/elasticsearch/elasticsearch/nodes/0/indices/test/$SHARD_NUMBER/)
to its own disk, and then mount each disk on the corresponding shard
directory. On restarting Elasticsearch, it should use the 4 disks.
I've never tried it and have no idea whether it works, but I guess it's
worth trying.
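To make the shard-to-disk pairing concrete, here is a rough Python sketch that only prints the mount commands for review. The index path is the one quoted above; the device names and the function name shard_mount_plan are hypothetical, and like the approach itself this is untested against a real node.

```python
from pathlib import PurePosixPath

# Index directory quoted in the post above; adjust for your own node and index.
INDEX_DIR = PurePosixPath("/var/lib/elasticsearch/elasticsearch/nodes/0/indices/test")


def shard_mount_plan(devices):
    """Pair shard N with the Nth device and return (device, mount_point) tuples."""
    return [(dev, str(INDEX_DIR / str(shard))) for shard, dev in enumerate(devices)]


# Hypothetical device names; substitute the disks attached to your instance.
plan = shard_mount_plan(["/dev/xvdf", "/dev/xvdg", "/dev/xvdh", "/dev/xvdi"])
for device, mount_point in plan:
    # Printed for review only; this script does not mount anything.
    print(f"mount {device} {mount_point}")
```

Elasticsearch would have to be stopped, and each shard's files copied onto its disk, before any of these mounts are actually run.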
Looking deeper at what Simon suggested: that is how you can have multiple
data directories, as mentioned here:
http://www.elasticsearch.org/guide/reference/setup/dir-layout/
The files get distributed across those directories according to the
index.store.distributor setting, which can be set to least_used (selects
the directory with the most available space) or random. But the
distribution happens per file, not per shard (Lucene index), so it is not
possible to control where each whole shard is stored.
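A toy model may help show why per-file distribution splits a shard. The Python sketch below is not Elasticsearch's code; it only mimics the two policies as described above, and the function name distribute, the directory names, and the file sizes are made up for illustration.

```python
import random


def distribute(files, free_space, policy, rng):
    """Place each (name, size) file in a directory; return {name: directory}."""
    free = dict(free_space)  # copy so the caller's dict is left untouched
    placement = {}
    for name, size in files:
        if policy == "least_used":
            # Pick the directory that currently has the most free space.
            target = max(free, key=free.get)
        else:
            # "random": pick any configured directory.
            target = rng.choice(sorted(free))
        free[target] -= size
        placement[name] = target
    return placement


# One shard with four equally sized segment files and two empty data directories.
shard_files = [(f"shard0/seg{i}", 10) for i in range(4)]
placement = distribute(
    shard_files, {"/data1": 100, "/data2": 100}, "least_used", random.Random(0)
)
dirs_used = set(placement.values())
print(sorted(dirs_used))  # → ['/data1', '/data2']
```

Because the choice is made again for every file, the files of a single shard end up in both directories, which is the RAID0-like spreading described above.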
It would be interesting to know more about the use case here. Could you
elaborate a bit on it? And one question: do you just want to put every
shard in a different directory, or would you also like to control where
each shard goes?
On Tuesday, August 20, 2013 5:35:15 PM UTC+2, simonw wrote:
You can specify multiple data directories like path.data=["path1", "path2"].
This will put shards on different disks if you configure it to point to
them. It will use the least used one.
In this particular case, I intend to use ES as a kind of data archive for a
few TB of text, and I want to run keyword queries from time to time to get
related info. I intend to use 8 shards. The server I set up (Amazon EC2)
has 8 CPU cores and 8 1 TB disks. I'd like to put one shard on each disk.
From what I've learned about ES, it seems it can distribute storage across
multiple disks, like RAID0, but it cannot control which shard goes where.
As a result, the data for each shard would be spread across all eight
disks. True?
On Tuesday, August 20, 2013 2:16:38 AM UTC+2, mfy...@wisewindow.com wrote:
I am trying to set up an ES server (one node only) with 4 shards. Is it
possible to put the 4 shards on 4 different disks? How?
Instead of fiddling with shards, paths, reallocation, etc., it is much
easier to use mdadm to create a single RAID0 array on EC2 and run
Elasticsearch with the default settings. As a bonus, with 8 disks you can
read up to 8x faster.
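For reference, a minimal Python sketch that assembles the mdadm invocation for such an array. The device names and the array name /dev/md0 are assumptions, and the function name mdadm_raid0_command is hypothetical; the script only prints the command so it can be checked before being run as root.

```python
def mdadm_raid0_command(devices, array="/dev/md0"):
    """Build the argv for creating a RAID0 array over the given block devices."""
    return [
        "mdadm",
        "--create", array,
        "--level=0",
        f"--raid-devices={len(devices)}",
        *devices,
    ]


# Hypothetical names for the 8 attached disks; check yours with lsblk first.
devices = [f"/dev/xvd{letter}" for letter in "fghijklm"]
command = mdadm_raid0_command(devices)
print(" ".join(command))
```

The resulting array would then be formatted, mounted, and used as the single Elasticsearch data directory.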