Performance 1 shard / 1 node vs 5 shards / 5 nodes

Heya

I'm playing with ES to tune some parameters for my needs, and here is what
I see.

I'm running on 64-bit Ubuntu with OpenJDK 1.7.0_03 and Elasticsearch 0.19.4.

Here is my scenario:

I inject "small documents" (fewer than 10 fields).

I inject 1 000 000 docs.

I run one search query (with a facet).

First test: a single node and one shard, no replicas

· Injection time: 155 s

· Search time: 972 ms

Second test: five nodes (all on the same hardware) and 5 shards, no
replicas

· Injection time: 444 s

· Search time: 346 ms

As you can see, search time is better with multiple shards (nodes). That's
what I expected.

But indexing time is worse (almost 3 times slower) with multiple shards
(nodes). I was expecting at least the same result.
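
To put the two runs side by side, here is a small arithmetic sketch in Python. It only uses the figures reported above; the variable and function names are illustrative and not part of the original tests.

```python
# Figures reported in the two tests above; names are illustrative only.
DOCS = 1_000_000

index_seconds = {"1 node / 1 shard": 155, "5 nodes / 5 shards": 444}
search_millis = {"1 node / 1 shard": 972, "5 nodes / 5 shards": 346}


def docs_per_second(seconds: float, docs: int = DOCS) -> float:
    """Average indexing throughput over the whole run."""
    return docs / seconds


def millis_per_doc(seconds: float, docs: int = DOCS) -> float:
    """Average wall-clock time spent per document, in milliseconds."""
    return seconds * 1000 / docs


for setup, seconds in index_seconds.items():
    print(f"{setup}: {docs_per_second(seconds):.0f} docs/s, "
          f"{millis_per_doc(seconds):.3f} ms per doc")

# Ratios between the two setups.
index_slowdown = index_seconds["5 nodes / 5 shards"] / index_seconds["1 node / 1 shard"]
search_speedup = search_millis["1 node / 1 shard"] / search_millis["5 nodes / 5 shards"]
print(f"indexing is {index_slowdown:.2f}x slower, search is {search_speedup:.2f}x faster")
```

So the five-node run indexes at roughly 2,250 docs/s against roughly 6,450 docs/s on a single node, while the search query gets faster by a similar factor.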

What do you think?

Is it because I'm running everything on the same hardware (same file
system), so I get some I/O waits?

Should I expect better results when running the five nodes on different
boxes, or at least on different hard disks?

That's not an issue. I can wait 8 minutes to inject 1 million docs ;-)

Thanks

David.

How are you indexing the data? Is it using bulk requests, or one doc at a
time? If you are using Java, is it a Node Client or a Transport Client?
Also, is it a single indexing process/thread?

(In reply to David Pilato's message of Mon, Jul 2, 2012 at 11:48 PM.)

One doc at a time.
Using NodeJS in a single thread.

--
David

(In reply to Shay Banon's message of Jul 3, 2012 at 00:59.)

So, what happens is that when you move from 1 node to 5 nodes, you pay the
price of the network between nodes. With 5 nodes, you hit one node, and it
needs to send the document over to another node (the one holding the
document's shard). In the single-node case, there is no extra network hop.
On the other hand, if you had more processes indexing data, 5 nodes would
speed up indexing.
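
The extra hop described above can be illustrated with a rough Python simulation. It is only a sketch under stated assumptions: the client always sends to the same node, shard i lives on node i, and `md5` stands in for the hash Elasticsearch actually applies to the routing value (the real hash function is different; only the "hash modulo number of primary shards" idea is kept). `shard_for` and `forwarded_fraction` are hypothetical helper names.

```python
import hashlib

NUM_SHARDS = 5   # one primary shard per node, as in the second test
ENTRY_NODE = 0   # assumption: the client always talks to the same node


def shard_for(doc_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Pick a shard from the document id.

    Stand-in for Elasticsearch routing (hash of the routing value modulo
    the number of primary shards). md5 is NOT the hash Elasticsearch uses.
    """
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def forwarded_fraction(num_docs: int) -> float:
    """Fraction of documents whose shard is not on ENTRY_NODE.

    Assumption: shard i lives on node i, so any document routed to a shard
    other than ENTRY_NODE has to be forwarded over the network.
    """
    forwarded = sum(
        1 for i in range(num_docs) if shard_for(f"doc-{i}") != ENTRY_NODE
    )
    return forwarded / num_docs


fraction = forwarded_fraction(100_000)
print(f"{fraction:.1%} of index requests need an extra hop")
```

With five shards spread over five nodes, about four documents out of five are forwarded. A single-threaded client that waits for each response before sending the next document pays that extra latency on every forwarded request, which is why more concurrent indexing processes would help.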

(In reply to David Pilato's message of Tue, Jul 3, 2012 at 7:03 AM.)

Additionally, if you use bulk indexing, you would spread the load as well,
even from a single client.
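
As a rough illustration of what a bulk request looks like, here is a minimal Python sketch that only builds the request bodies for the `_bulk` endpoint (newline-delimited JSON: one action line followed by one source line per document). It does not send anything. The index name `test_index`, the type name `doc`, the sample documents, and the batch size of 1000 are all assumptions made for the example; the `_type` field matches the 0.19-era API discussed in this thread and is not used by recent Elasticsearch versions.

```python
import json
from typing import Dict, Iterable, Iterator, List


def bulk_body(docs: Iterable[Dict], index: str, doc_type: str) -> str:
    """Serialize docs as a _bulk payload: an action line, then a source line."""
    lines: List[str] = []
    for doc in docs:
        # Action/metadata line; no _id is given, so ids would be auto-generated.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        # Source line: the document itself, on a single line.
        lines.append(json.dumps(doc))
    # The bulk format requires a newline after the last line as well.
    return "\n".join(lines) + "\n"


def batches(docs: List[Dict], size: int) -> Iterator[List[Dict]]:
    """Yield consecutive slices of at most `size` documents."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


# Hypothetical sample data standing in for the "small documents".
docs = [{"name": f"doc {i}", "value": i} for i in range(2500)]
payloads = [bulk_body(batch, "test_index", "doc") for batch in batches(docs, 1000)]
print(len(payloads), "bulk requests instead of", len(docs), "single-document requests")
```

Each payload would be sent as the body of one POST to `/_bulk`. The receiving node can then split the batch by shard and forward each group in one go, instead of paying one round trip per document.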

(In reply to Shay Banon's message of Tue, Jul 3, 2012 at 11:55 AM.)