# # of shards vs. open files


(Javier Muniz) #1

Does the # of shards a node has impact the # of open files required? I am running a node that has more than 3000 shards because I have broken the data into a single-index-per-customer layout and I am finding myself running into the "too many open files" problem more and more. I just recently had to bump open files past 32000 (verified via -Des.max-open-files=true) in order to continue to add more customers to the node.

I guess my question is should I put these customers all into a single index to reduce the total # of shards, reduce the # of shards per index, or is my problem completely unrelated to sharding?

-javier


(Shay Banon) #2

Yes, each shard is a Lucene index, which requires its share of open file
handles (and memory, and so on). You can go with a single index and route
based on user. It's simpler to do that with 0.17, since you can associate an
alias with the username, and an alias can have a filter (to filter results
only for the relevant user) and a routing value (probably the username).
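A minimal sketch of the per-user alias setup described above. It assumes one shared index named `customers` whose documents carry a `customer_id` field indexed as a single unanalyzed token; the index name, field name, and `customer_<id>` alias naming are hypothetical. The code only builds the JSON body for the index aliases API. Sending it to the cluster is left out.

```python
import json


def customer_alias_action(index: str, customer_id: str) -> dict:
    """Build one 'add' action for the index aliases API.

    Index name, field name and alias naming scheme are hypothetical.
    """
    return {
        "add": {
            "index": index,
            "alias": f"customer_{customer_id}",
            # Searches through the alias only see this customer's documents.
            "filter": {"term": {"customer_id": customer_id}},
            # Requests through the alias are routed by the customer id,
            # so one customer's documents live on a single shard.
            "routing": customer_id,
        }
    }


def aliases_request_body(index: str, customer_ids: list[str]) -> str:
    """Serialize a single request that adds one alias per customer."""
    actions = [customer_alias_action(index, cid) for cid in customer_ids]
    return json.dumps({"actions": actions}, indent=2)


if __name__ == "__main__":
    print(aliases_request_body("customers", ["acme", "globex"]))
```

Searches through `customer_acme` are then limited to that customer's documents and hit a single shard. Documents indexed through the alias still need `customer_id` set, because the filter is applied when searching, not when indexing.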

On Mon, Aug 8, 2011 at 9:52 PM, Javier Muniz javier@granicus.com wrote:


(Javier Muniz) #3

Hmm. I think the problem I'll have then is that each customer has their own set of ids, so a document's id won't necessarily be unique across customers. Placing each customer in their own index was a simple way of dealing with this. Is there any out-of-the-box way to address this as well, or should I just start using a compound key?
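One way to handle the compound-key option is sketched below. Each customer's own id gets the customer id as a prefix. This assumes customer ids never contain the `:` separator. The separator and the function names are hypothetical.

```python
SEPARATOR = ":"  # hypothetical; must never appear in a customer id


def compound_id(customer_id: str, doc_id: str) -> str:
    """Combine a customer id and that customer's own document id into a
    value that is unique across all customers in one shared index."""
    if SEPARATOR in customer_id:
        raise ValueError(f"customer id must not contain {SEPARATOR!r}: {customer_id!r}")
    return f"{customer_id}{SEPARATOR}{doc_id}"


def split_compound_id(value: str) -> tuple[str, str]:
    """Recover (customer_id, doc_id); splits on the first separator only."""
    customer_id, doc_id = value.split(SEPARATOR, 1)
    return customer_id, doc_id


if __name__ == "__main__":
    key = compound_id("acme", "42")
    print(key, split_compound_id(key))
```

The split only breaks on the first separator, so a document id may contain `:` itself without two compound ids colliding.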

-javier


From: elasticsearch@googlegroups.com [elasticsearch@googlegroups.com] on behalf of Shay Banon [kimchy@gmail.com]
Sent: Monday, August 08, 2011 12:09 PM
To: elasticsearch@googlegroups.com
Subject: Re: # of shards vs. open files


(Shay Banon) #4

There are ways to reduce the number of open files for each Lucene index. One
option is to use the compound file format, which takes most of the files of
a segment and combines them into a single file. This is expensive IO-wise,
so whether you can afford it depends on the use case. Another option is to
reduce the number of segments by tweaking the merge policy
(http://www.elasticsearch.org/guide/reference/index-modules/merge.html). You
can see which segments make up an index using the segments API
(http://www.elasticsearch.org/guide/reference/api/admin-indices-segments.html).
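The following is a back-of-the-envelope way to see how shards, segments, and the compound format interact. Every segment of every shard keeps its files open, so the handle count grows roughly with their product. The 32000 and 3000 figures come from the thread. The segment and per-segment file counts are illustrative assumptions, not measured Lucene numbers. The estimate also ignores per-shard files outside segments, such as the commit point and the translog.

```python
def handles_per_shard(max_open_files: int, shards: int) -> float:
    """Average file-handle budget per shard on one node."""
    return max_open_files / shards


def estimate_open_files(shards: int, segments_per_shard: int, files_per_segment: int) -> int:
    """Rough count of segment files held open on a node."""
    return shards * segments_per_shard * files_per_segment


if __name__ == "__main__":
    # From the thread: a ~32000 open-file limit on a node with ~3000 shards.
    print(f"{handles_per_shard(32000, 3000):.1f} handles per shard")
    # Hypothetical: 3 segments per shard, ~8 files per segment without the
    # compound format vs. ~2 with it (illustrative, not measured).
    print(estimate_open_files(3000, 3, 8))
    print(estimate_open_files(3000, 3, 2))
```

Merging down to fewer segments and using the compound format both shrink one factor of the product. Fewer shards per node shrinks the other.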

Of course, another option that you have is to start more nodes, and have the
shards allocated on them to reduce the number of shards allocated per node.
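The sketch below shows how many shard copies the busiest node carries when the same shards are spread over more nodes, assuming even allocation. It also accounts for replicas. A replica is never allocated to the same node as its primary, so on a single node replicas stay unassigned. They only start using file handles once more nodes join. The function name and parameters are hypothetical.

```python
def shards_per_node(primaries: int, replicas: int, nodes: int) -> int:
    """Shard copies on the busiest node, assuming even allocation.

    Each copy of a shard must sit on a different node, so at most `nodes`
    copies per shard are assigned; any extra replicas stay unassigned.
    """
    if nodes < 1:
        raise ValueError("need at least one node")
    copies_per_shard = min(1 + replicas, nodes)
    assigned = primaries * copies_per_shard
    return -(-assigned // nodes)  # ceiling division


if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        print(n, shards_per_node(3000, 1, n))
```

With one replica per shard, going from one node to two does not lower the per-node count (3000 copies on each node). A third node brings it down to 2000. With no replicas, each added node divides the load directly.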

On Tue, Aug 9, 2011 at 2:27 AM, Javier Muniz javier@granicus.com wrote: