Primary balancing


(jjasinek) #1

shay,

I know that shards are usually balanced amongst the nodes in an
Elasticsearch cluster, but is there a way to balance the primaries?
We have a three node cluster with 3 shards and 1 replica. We have
observed recently that all three of the primary shards are assigned to
one node. Based on the fact that indexing is routed to the primary,
this could mean that one machine is dedicated to indexing. Is there a
way to balance the primaries against nodes, or is it best to increase
shards to 6? I'm afraid if I do that, there is still no guarantee
that each node will have close to two primaries on it.

Jason


(Shay Banon) #2

Even if you have 3 primaries allocated on a single node, with replication,
other nodes will also be busy indexing. In general, there isn't a lot of
difference between a primary and a replica, so there isn't an effort to
balance primaries (or force it). There are some cases where primaries might
work a bit more, for example, when doing searches / get and explicitly
asking them to be executed on the primary shard, but that's not the common
case.
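
To check how primaries are spread, one can tally shard copies per node. The sketch below works on a hand-written list that mirrors the layout in the question (3 shards, 1 replica, all primaries on one node); the node names and the tuple layout are hypothetical and are not an Elasticsearch API.

```python
from collections import Counter

# Hypothetical shard layout: (shard id, is_primary, node name).
# Mirrors the question: 3 shards, 1 replica, all primaries on node-a.
shards = [
    (0, True, "node-a"), (0, False, "node-b"),
    (1, True, "node-a"), (1, False, "node-c"),
    (2, True, "node-a"), (2, False, "node-b"),
]

def copies_per_node(layout, primaries_only=False):
    """Count shard copies on each node, optionally counting primaries only."""
    return Counter(node for _, is_primary, node in layout
                   if is_primary or not primaries_only)

primary_counts = copies_per_node(shards, primaries_only=True)
total_counts = copies_per_node(shards)
print("primaries:", dict(primary_counts))
print("all copies:", dict(total_counts))
```

Since every indexed document is also written to the replica, each node that holds a copy does indexing work, which is why the second count matters more than the first.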



(Gustavo Maia) #3

For better performance, is it better to use 4 small instances or one large instance in the Amazon cloud?
A small instance is 32-bit with one instance hard drive; a large instance is 64-bit with two instance hard drives.


(Shay Banon) #4

Large instances are preferable, but, do you mean 1 large instance Vs. 4
small instances?



(Gustavo Maia) #5

First, thank you for your attention.

On Amazon, 4 small instances cost the same as one large instance.
A small instance is 32-bit and has only one hard drive. A large instance
is 64-bit and has two hard drives.

Today I have a 300GB index distributed across three machines,
each with six 15k rpm hard drives.
I am doing this study in order to migrate to Amazon, so I am unsure
whether 4 small or 1 large is best.

My question is whether to build a cluster of 40 large or 15 small
instances. I need searches to return in less than 200ms on
average.
Is it possible to do this using Elasticsearch on Amazon?

Thanks for everything. Elasticsearch is a great project.



(Gustavo Maia) #6

Correction:
My question is whether to build a cluster of 40 SMALL or 15 LARGE
instances. I need searches to return in less than 200ms on
average.
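
Using only the price ratio stated earlier in the thread (4 small instances cost about as much as 1 large), the two options can be compared in a common unit. This is a rough sketch; the function name is illustrative and real pricing should be checked against Amazon's current rates.

```python
# Price ratio taken from the thread: 4 small instances cost about as much as 1 large.
SMALL_PER_LARGE = 4

def cost_in_large_units(small=0, large=0):
    """Express a cluster's cost in units of one large instance."""
    return small / SMALL_PER_LARGE + large

option_small = cost_in_large_units(small=40)
option_large = cost_in_large_units(large=15)
print(option_small, option_large)
```

Under that ratio, 40 small instances cost about as much as 10 large ones, so the 15-large option is roughly 1.5 times the price of the 40-small option.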



(Pavel Penchev) #7

Hi,

We have similar requirements and we decided to go for the large
instances. The search times were OK on the small instances (90% below
200ms) but the indexing suffered significantly (only 30% below 200ms; we
have requirements for indexing as well). In comparison, the large
instances handle both search and indexing with 95% below 200ms.

Bear in mind this is specific to the type of documents you have and the
searches you perform. I'd suggest running a 24h test.

Regards,
Pavel
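
Figures such as "90% below 200ms" can be computed from request timings collected during a test run. A minimal sketch follows; the sample timings are made up for illustration and are not measurements from this thread.

```python
def fraction_below(latencies_ms, threshold_ms=200):
    """Return the share of requests that finished under the threshold."""
    if not latencies_ms:
        raise ValueError("no latency samples")
    return sum(1 for t in latencies_ms if t < threshold_ms) / len(latencies_ms)

# Made-up sample timings in milliseconds, for illustration only.
search_ms = [35, 80, 120, 150, 90, 60, 210, 45, 110, 170]
index_ms = [250, 180, 400, 320, 150, 500, 190, 260, 310, 280]

print(f"search under 200ms: {fraction_below(search_ms):.0%}")
print(f"index under 200ms: {fraction_below(index_ms):.0%}")
```

Tracking this share, rather than only the average, shows how often the 200ms target is missed.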



(Gustavo Maia) #8

Hi Pavel,
Thanks for everything.

How many large instances do you have?



(Shay Banon) #9

In general, I suggest using the xlarge instances in Amazon, simply because
of the higher IO they provide and better performance consistency (at least
based on what users have seen).



(Gustavo Maia) #10

Thank you.

I'm thinking of using 10 c1.xlarge instances. Each c1.xlarge
instance has 4 hard drives.

So I would use the appropriate v0.18 version of Elasticsearch,
which allows configuring 4 hard drives in the same Elasticsearch process. Is
that best?

Would there be any performance problem if each shard is 15GB,
with 4 shards on each instance, one per hard drive?
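
The numbers in this plan can be checked with a little arithmetic. The sketch below uses the 300GB index size mentioned earlier in the thread; treating the drive slots left over after primaries as replica copies is an assumption, since the thread does not state a replica count for this cluster.

```python
import math

index_size_gb = 300      # total index size, from the thread
shard_size_gb = 15       # target shard size, from the thread
instances = 10           # planned c1.xlarge instances
drives_per_instance = 4  # drives per instance, as stated in the thread

primary_shards = math.ceil(index_size_gb / shard_size_gb)
drive_slots = instances * drives_per_instance
# Assumption: slots left over after placing primaries hold replica copies.
replicas_per_primary = drive_slots // primary_shards - 1
print(primary_shards, drive_slots, replicas_per_primary)
```

With these figures, 20 primary shards and one replica of each would fill all 40 drives at one shard copy per drive.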



(Shay Banon) #11

Heya,

It's tricky to choose between c1.xlarge (more CPU) and m1.xlarge (more
memory). I suggest going with the m1.xlarge, as more memory tends
to outweigh faster CPU.

Regarding the drives: the new option to specify multiple data locations
does not depend on the number of shards. In other words, even a single shard
allocated on a node will make use of all the data locations.

-shay.banon
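
As a rough illustration of the multiple data locations option, the sketch below renders a configuration line for four drives. It assumes the setting is `path.data` in `elasticsearch.yml` and that it accepts a comma-separated list of paths, as documented for released versions; the mount points are hypothetical, and the exact syntax should be checked against the documentation for the version in use.

```python
# Hypothetical mount points, one per instance drive; adjust to the real machine.
mounts = ["/mnt/disk1", "/mnt/disk2", "/mnt/disk3", "/mnt/disk4"]

def path_data_line(paths):
    """Render a path.data line listing several data locations, comma-separated."""
    if not paths:
        raise ValueError("at least one data path is required")
    return "path.data: " + ",".join(paths)

line = path_data_line(mounts)
print(line)
```

A single node configured this way uses all four drives, so there is no need to tie one shard to one drive.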



(Gustavo Maia) #12

So it would be better to install 4 ES processes on the m1.xlarge machine, each one
pointing its data to a different hard drive. It should be better because searches
would then run in parallel across 4 hard drives, with different processors.
I would set up each ES instance with 3GB of RAM.

If I use the m1.large machine, I would install only 2 ES processes, each
allocated its own hard drive in the same way, with 3GB of RAM for each ES.

Is that right?

****** m1.xlarge Config

15 GB memory
8 EC2 Compute Units (4 virtual cores with 2 EC2 Compute Units each)
1,690 GB instance storage
64-bit platform
I/O Performance: High
API name: m1.xlarge

****** m1.large Config:

7.5 GB memory
4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each)
850 GB instance storage
64-bit platform
I/O Performance: High
API name: m1.large
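
The memory side of this plan can be checked against the instance specs listed above. A minimal sketch, using the 3GB per ES process proposed in this post; the dictionary layout and function name are illustrative only.

```python
# Figures from the thread: instance memory in GB and the planned number of ES processes.
plans = {
    "m1.xlarge": {"memory_gb": 15.0, "es_processes": 4},
    "m1.large": {"memory_gb": 7.5, "es_processes": 2},
}
HEAP_PER_PROCESS_GB = 3.0  # 3GB of RAM per ES process, as proposed in the post

def memory_left(plan, heap_gb=HEAP_PER_PROCESS_GB):
    """Memory not claimed by ES processes, left for the OS and its file cache."""
    return plan["memory_gb"] - plan["es_processes"] * heap_gb

leftover = {name: memory_left(plan) for name, plan in plans.items()}
print(leftover)
```

On these figures, the ES processes would claim 12GB of the 15GB on an m1.xlarge and 6GB of the 7.5GB on an m1.large, leaving 3GB and 1.5GB respectively for everything else.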



(Shay Banon) #13

You mean start 3 ES processes on the same machine? Why?



(Gustavo Maia) #14

Yes. Isn't it better?

In my experiments with Lucene, it is better to distribute the load between
the drives. Using one ES per hard drive, I guarantee a better
distribution between drives, e.g. one 15GB shard per drive. During a search I
will have better parallelism, since I have one drive and one processor for a
specific search.

For example, when a user does a search we get parallel processing across 4 drives and 4
processors, ensuring a faster response, since only one shard is set up
per ES.



(Shay Banon) #15

In the master version, you can specify several data locations so a single
instance can use several drives. I thought you were referring to that in
your previous mail.

On Tue, Oct 18, 2011 at 11:22 PM, Gustavo Maia gustavobbmaia@gmail.comwrote:

Yes, Is not it better?

For my experimenting with the lucene, is better distribute the load between
the drives. Using an ES for each hard drive, I guarantee a better
distribution between HD. Ex: One shard of 15GB per HD. During the seach i
will have better parallelism since I have one HD and one processor for a
specific search.

ex: When the User do a search we have the parallel processing of 4 hds and
4 processors, ensuring a faster response, since it was set up only one shard
by ES.


(Gustavo Maia) #16

Using the master version, if I configure 4 shards, is it guaranteed that each
HD will hold one shard?
For example, with four shards of 15GB each, will there be exactly one 15GB
shard on each HD?


(Shay Banon) #17

No. As I explained before, using multiple drives does not mean that each shard
will live on a single drive. It means that the files composing the Lucene
index will be spread across the drives, so a single shard can potentially
span all of them.
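
To make the difference from "one shard per HD" concrete, here is a minimal sketch in Python. It is an illustration only and not Elasticsearch's actual allocation code: the mount points, the file names, the 8 equal-sized files, and the "put the next file on the least-filled path" rule are all assumptions chosen for the example.

```python
# Illustrative only: this is NOT Elasticsearch's real file distribution logic.
from collections import defaultdict


def place_files(files, data_paths):
    """Assign each (name, size_gb) file to the data path holding the least data so far."""
    used = {path: 0.0 for path in data_paths}
    placement = defaultdict(list)
    for name, size_gb in files:
        # Ties are broken by list order, so the first empty path wins.
        target = min(data_paths, key=lambda p: used[p])
        used[target] += size_gb
        placement[target].append(name)
    return dict(placement), used


# Hypothetical mount points and a hypothetical 15 GB shard made of 8 index files.
data_paths = ["/mnt/hd1", "/mnt/hd2", "/mnt/hd3", "/mnt/hd4"]
shard_files = [(f"shard0/_seg{i}.cfs", 1.875) for i in range(8)]

placement, used = place_files(shard_files, data_paths)
for path in data_paths:
    print(path, placement[path], f"{used[path]:.2f} GB")
```

In this toy model the single 15GB shard ends up with files on all four drives, which is the behaviour described above: the unit being distributed is the index file, not the shard.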


(system) #18