Way to limit # of documents or storage size per node

Hi Guys,

I have 2 nodes with different amounts of available disk space, one with
500 GB and the other with 200 GB, but on both nodes I can only use 50% of
the disk for Elasticsearch.

Is there a way to limit the number of documents or the storage size per node?

Cheers

Hi,

Let me see if I understand your question correctly: you are saying that
you actually have two servers, and you can use up to 250 GB on the first
and up to 100 GB on the second, correct?

ES does not currently take CPU/RAM/disk differences among cluster nodes
into account, and it is highly recommended to use identical nodes for
optimal performance. However, if disk space is your only concern, you can
run two ES nodes on the first server. That gives you a three-node cluster,
and given that shards grow at more or less the same speed, you can end up
with three shards of up to 100 GB each. This is not optimal, though: if you
run more than one ES node on a physical server and that server fails, you
can also lose the shard replicas that were allocated to the second ES node
running on the same machine.
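
If you decide to try that, a rough sketch of the startup (the node names,
cluster name, and data paths are my own examples, and the -Des.* setting
overrides and the -f foreground flag assume a 0.x-era launcher, so adjust
to your installation):

    # on the 500 GB server: two ES instances with separate data directories
    bin/elasticsearch -f -Des.cluster.name=logs -Des.node.name=node1 -Des.path.data=/var/data/es1
    bin/elasticsearch -f -Des.cluster.name=logs -Des.node.name=node2 -Des.path.data=/var/data/es2
    # the second instance auto-increments its HTTP/transport ports
    # (9200 -> 9201), so the two instances do not clash

    # on the 200 GB server: a single instance
    bin/elasticsearch -f -Des.cluster.name=logs -Des.node.name=node3 -Des.path.data=/var/data/es3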

Regards,
Lukas

Hi Lukas,

You are correct. My use case for ES is loading lots of log data into it as
searchable documents; we currently generate almost 5 GB of logs per day, so
I'm trying to use all the available disk space on our non-production servers.

Performance is not our primary concern at the moment, since we don't search
the log data all that often. So, would you recommend a different approach
for this?

Cheers,
Emerson

Hi,

If you really want (or have) to use 250 GB on one server and 100 GB on the
other, then this approach (i.e. starting more ES nodes on one machine) is
probably the only way to go about it for now. But as you can see, it is not
recommended for production: it can actually work pretty well until you lose
one of those servers, and of course more nodes on one machine will also
require more RAM.

You could theoretically experiment with the routing feature when indexing
your data, though I haven't tried it myself. Routing basically lets you
control which documents go to the same shard, so given that you want to
split the index size in a 1:2 ratio, you could assign one routing value to
every first document and a second routing value to every second and third
document, as sketched below. But that is just a dirty idea (I'm not sure
you can control whether the "bigger" shard ends up on the "bigger" machine
in this case), and the routing value was never meant to be used for index
size control.
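
For illustration only (the index and type names are made up, and note that
different routing values are just hashed, so two values can still land on
the same shard):

    # every first document gets routing value r1 (about one third of the volume)
    curl -XPUT 'localhost:9200/logs/entry/1?routing=r1' -d '{"message":"first log line"}'
    # every second and third document gets routing value r2 (about two thirds)
    curl -XPUT 'localhost:9200/logs/entry/2?routing=r2' -d '{"message":"second log line"}'
    curl -XPUT 'localhost:9200/logs/entry/3?routing=r2' -d '{"message":"third log line"}'

Searches would then have to pass the same routing parameter to hit only the
relevant shard.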

Regards,
Lukas
