Okay to have really unbalanced nodes?

Hi! I just got started with Elasticsearch, and I've been setting up
nodes on virtual machines that use spare CPU around my existing
setup.

I have about 20 gigs of indexes with 2 replicas, so it uses about 60
gigs of space total. I have 6 nodes, but they have very different
capacities: 2 have 8 gigs of memory and 4 vCPUs, 2 have 14 gigs of
memory and 4 vCPUs, and 2 have 24 gigs of memory with 14 vCPUs! I
have the Java heap (ES_MIN_MEM and ES_MAX_MEM) set to half of the
physical memory on each box.

I noticed while running some tests that Elasticsearch tends to put
more data on the systems that have more memory. Is this a coincidence,
or is it actually looking at the amount of memory available, or the
Java VM size, when deciding where to relocate data?

It is a coincidence, though taking node capacity into account is a planned future feature :). Currently, when allocating shards, ES treats all nodes as equals.
(In reply to jalano's post of Tuesday, January 11, 2011 at 2:39 AM.)
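
To put numbers on the setup described in the question, here is a minimal sketch of the arithmetic. The sizes come from the original post; the even per-node split is only an illustrative assumption (equal-sized shards that divide evenly), meant to show what "treats all nodes as equals" would imply, not how ES actually computes placement.

```python
# Figures taken from the original post.
INDEX_GB = 20                          # size of one full copy of the data
REPLICAS = 2                           # extra copies of every shard
NODE_RAM_GB = [8, 8, 14, 14, 24, 24]   # physical memory of the 6 boxes

# One primary copy plus the replicas.
total_gb = INDEX_GB * (1 + REPLICAS)

# Heap is set to half of physical memory on every box.
heap_gb = [ram / 2 for ram in NODE_RAM_GB]

# Assumption: nodes are treated as equals and shards divide evenly.
even_share_gb = total_gb / len(NODE_RAM_GB)

print(f"total on disk: {total_gb} GB")
for ram, heap in zip(NODE_RAM_GB, heap_gb):
    ratio = even_share_gb / heap
    print(f"{ram:>2} GB box: heap {heap:.0f} GB, "
          f"data {even_share_gb:.0f} GB, {ratio:.2f} GB data per GB heap")
```

Under that assumption every node holds the same amount of data, so the 8 gig boxes carry three times as much data per gig of heap as the 24 gig boxes.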

If the servers are not equal, would it be a good strategy to take the
weakest one, size an ES node to fit it, and then divide the resources of
the other machines into nodes of that same size? For example, if you have
one machine with 2GB of RAM and one with 8GB, how about running one node
on the 2GB machine and 4 similar nodes on the 8GB machine? This would not
make the nodes equal in terms of CPU, but at least in terms of RAM. Does
such a deployment strategy make sense?

Regards,

Lukas
(In reply to the message from "Shay Banon" shay.banon@elasticsearch.com of 11 January 2011, 20:39.)
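
The sizing idea in Lukas's question can be sketched in a few lines. This is only an illustration of the arithmetic; `plan_nodes` and the machine names are hypothetical and not part of ES.

```python
def plan_nodes(machine_ram_gb):
    """Size every node to the weakest machine and count how many fit per box.

    Returns {machine: (node_count, leftover_ram_gb)}.
    """
    unit = min(machine_ram_gb.values())   # RAM given to each node
    return {
        name: (ram // unit, ram % unit)
        for name, ram in machine_ram_gb.items()
    }


# The example from the question: one 2GB machine and one 8GB machine.
plan = plan_nodes({"small": 2, "large": 8})
print(plan)

# The three machine sizes from the original post.
mixed = plan_nodes({"a": 8, "b": 14, "c": 24})
print(mixed)
```

For the 2GB/8GB example this gives one node and four nodes with nothing left over. With the 8, 14, and 24 gig boxes from the original post, the 14 gig box fits only one 8 gig node and leaves 6 gigs unused, so the approach works best when the larger machines are close to multiples of the smallest.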

Yes, it does; you can certainly run more than one node on a machine. The only thing missing in that design is for ES to make sure that copies of the same shard are not allocated on the same machine, rather than just not on the same node.
(In reply to Lukáš Vlček's message of Wednesday, January 12, 2011 at 1:14 PM.)
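
The caveat about copies landing on the same machine can be checked from outside ES. The sketch below assumes you already have two hand-built mappings (node to machine, and shard to the nodes holding its copies); the function name and the sample data are hypothetical, and nothing here calls an ES API.

```python
from collections import defaultdict


def colocated_copies(node_host, shard_nodes):
    """Find shards that have two or more copies on the same machine.

    node_host:   {node_name: machine_name}
    shard_nodes: {shard_name: [node_name, ...]} for primary and replicas
    Returns {shard_name: {machine_name: [node_name, ...]}}.
    """
    problems = {}
    for shard, nodes in shard_nodes.items():
        by_host = defaultdict(list)
        for node in nodes:
            by_host[node_host[node]].append(node)
        clash = {host: ns for host, ns in by_host.items() if len(ns) > 1}
        if clash:
            problems[shard] = clash
    return problems


# One node on the small machine, four nodes on the large one.
node_host = {"n1": "small", "n2": "large", "n3": "large",
             "n4": "large", "n5": "large"}
shard_nodes = {
    "shard0": ["n1", "n2", "n3"],   # two copies share the large machine
    "shard1": ["n1", "n2"],         # copies are on different machines
}
result = colocated_copies(node_host, shard_nodes)
print(result)
```

Any shard reported here would lose more than one copy if that single machine went down, which is the risk of running several nodes per box without machine-aware allocation.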