I think even two should be enough. Since you have a single client indexing,
the question is how you can parallelize it (even if it's in a single process,
consider using threads). I have a feeling that you might bottleneck on the
client side before you bottleneck on the elasticsearch side. If you see that
your client can push more than elasticsearch can handle, then it makes sense
to add another machine.
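
To make the threading suggestion concrete, here is a minimal sketch in Python of a single-process client that parses the tab-delimited file and hands batches of JSON documents to a pool of worker threads. Everything named here is an assumption for illustration: `send_batch` is a hypothetical callable standing in for whatever indexing call your client actually makes, and the worker count and batch size are arbitrary starting points, not recommendations.

```python
import csv
import json
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from itertools import islice


def batches(rows, size):
    """Yield lists of up to `size` items from any iterable."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def index_file(lines, field_names, send_batch, workers=4, batch_size=1000):
    """Parse tab-delimited lines and index them from several threads.

    `send_batch` is a hypothetical callable (list of JSON strings -> None);
    replace it with the real indexing call of your client.
    Returns the number of documents handed to `send_batch`.
    """
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    docs = (json.dumps(dict(zip(field_names, row))) for row in reader)

    pending = set()
    total = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for batch in batches(docs, batch_size):
            # Keep only a few batches in flight so a huge file is
            # never read into memory all at once.
            if len(pending) >= workers * 2:
                done, pending = wait(pending, return_when=FIRST_COMPLETED)
                for future in done:
                    future.result()  # re-raise any error from a worker
            pending.add(pool.submit(send_batch, batch))
            total += len(batch)
        for future in pending:
            future.result()
    return total
```

Timing `index_file` with an increasing `workers` value is one way to see where the bottleneck is: if throughput stops improving while the elasticsearch nodes are still mostly idle, the client is the limit, not the cluster.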
If you are using a large instance, make sure that you set the -Xmx parameter
to a higher value (by default it is -Xmx1g) so elasticsearch will make use
of more of the memory available on the machine.
On Fri, Mar 26, 2010 at 5:09 PM, timrobertson100 wrote:
I am about to index about 200 million records from a tab delimited
file of 23-40 properties per line (most of them indexed). Probably the
data will be 150GB in JSON.
Before I start, does anyone have a feel for which instance types I'd need,
and how many (single client throughput only right now)?
Would 3 large instances (7.5GB memory) do me or would I be better with
a bunch of smaller ones?