I am about to index about 200 million records from a tab-delimited
file with 23-40 properties per line (most of them indexed). The data
will probably come to around 150GB as JSON.
Before I start, does anyone have a feel for what instance types, and
how many of them, they would guess at (single-client throughput only right now)?
Would 3 large instances (7.5GB memory) do me, or would I be better off with
a bunch of smaller ones?
I think even two should be enough. Since you have a single client indexing,
the question is how you can parallelize it (even if it is in a single process,
consider using threads; a sketch follows below). I have a feeling that you might
bottleneck on the client side before you bottleneck on the elasticsearch side.
If you see that your client can push more than elasticsearch can handle, then
it makes sense to add another machine.
If you are using a large instance, make sure that you set the -Xmx parameter
to a higher value (by default it is -Xmx1g) so elasticsearch can make use
of more of the memory available on the machine.
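A minimal sketch of one way to do that from the client side, assuming the Python elasticsearch client and its bulk helpers; the endpoint, index name, field-naming scheme, and file path are made up for illustration, and older servers may also expect a _type on each action:

from elasticsearch import Elasticsearch, helpers

es = Elasticsearch(["http://localhost:9200"])  # assumed endpoint

def actions(tsv_path):
    """Turn each tab-delimited line into a bulk index action."""
    with open(tsv_path) as f:
        for line in f:
            values = line.rstrip("\n").split("\t")
            # Field names are hypothetical; map them to the real 23-40 properties.
            doc = {"field_%d" % i: v for i, v in enumerate(values)}
            yield {"_index": "records", "_source": doc}

# parallel_bulk batches documents and sends them from a pool of threads,
# so a single client process keeps several bulk requests in flight.
for ok, item in helpers.parallel_bulk(es, actions("records.tsv"),
                                      thread_count=4, chunk_size=1000):
    if not ok:
        print("failed:", item)

Raising thread_count or chunk_size until the cluster stops keeping up is a quick way to see whether the bottleneck is the client or elasticsearch.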
Do you mean storing the index in memory? It really depends on the FS
performance of Amazon, I guess, but on local disks (not virtualized) you
will be surprised at the performance. If you get to compare the two, it would be
interesting to hear the results...
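As a point of reference, a hedged sketch of how a store type can be chosen per index with the Python client; the index name is illustrative, and which store types are available (including whether a pure in-memory store exists at all) depends on the elasticsearch version:

from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])  # assumed endpoint

# index.store.type controls how the Lucene index is stored; "niofs" and
# "mmapfs" are filesystem-backed choices, and only very old releases also
# offered an in-memory store. The value below is illustrative, not a recommendation.
es.indices.create(
    index="records",
    body={"settings": {"index": {"store": {"type": "niofs"}}}},
)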
I am very interested to learn how your experiment went/is going. I'm leading
the development of an internal middleware solution which must work both in a
traditional hosted environment and AWS/EC2. Being able to run Elastic Search
on EC2 will help my tech selection efforts.
The short answer is I got sidetracked... it is on the list of things
to do, and I will share all my findings.
For sure it will work; I am just curious how expensive it gets for decent throughput.