How are you indexing the data? Is it using bulk requests, or one doc at a
time? If you are using Java, is it a Node Client or a Transport Client?
Also, is it a single indexing process/thread?
On Mon, Jul 2, 2012 at 11:48 PM, David Pilato david@pilato.fr wrote:
Heya
I’m playing with ES to tune some parameters for my needs, and here is
what I see.
I’m running on 64-bit Ubuntu, OpenJDK 1.7.0_03, with elasticsearch 0.19.4.
Here is my scenario:
I inject “small documents” (fewer than 10 fields).
I inject 1 000 000 docs.
I run one search query (with a facet).
First test: only one node and one shard, with no replica
· Injection time: 155 s
· Search time: 972 ms
Second test: five nodes (but on the same hardware) and 5 shards, with
no replica
· Injection time: 444 s
· Search time: 346 ms
As I can see, search time is better with multiple shards (nodes). That’s
what I expected.
But indexing time is worse (about 3 times) with multiple shards (nodes). At
the very least, I thought I would get the same result…
What do you think?
Is it because I’m running on the same hardware (file system), so I have
some I/O waits?
Should I expect better results when running five nodes on different boxes,
or at least on different hard disks?
That’s not an issue. I can wait for 8 minutes to inject 1 million docs ;-)
So, what happens is that when you move from 1 node to 5 nodes, you pay the
price of the network between nodes. With 5 nodes, you hit one node, and it
usually needs to send the document over to another node. In the single-node
case, there is no extra network hop. On the other hand, if you had more
processes indexing data, 5 nodes would reduce the time to index.
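To make the extra hop concrete, here is a rough sketch of how often a request sent to one node has to be forwarded. It assumes one shard per node and that a document's shard is picked as hash(id) modulo the number of shards. The CRC32 hash, the `doc-N` IDs, and the function names are stand-ins for illustration only, not what elasticsearch uses internally.

```python
import zlib


def shard_for(doc_id: str, num_shards: int) -> int:
    # Stand-in hash: CRC32 is NOT the hash elasticsearch uses. Any
    # reasonably uniform hash shows the same effect.
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def forwarded_fraction(num_docs: int, num_shards: int, receiving_node: int = 0) -> float:
    """Fraction of docs whose shard is not on the node the client talks to.

    Assumes shard i lives on node i (one shard per node).
    """
    forwarded = sum(
        1
        for i in range(num_docs)
        if shard_for(f"doc-{i}", num_shards) != receiving_node
    )
    return forwarded / num_docs


single = forwarded_fraction(100_000, 1)
five = forwarded_fraction(100_000, 5)
print(f"1 node : {single:.0%} of requests forwarded")
print(f"5 nodes: {five:.0%} of requests forwarded")
```

With a single shard nothing is ever forwarded. With five shards and a uniform hash, roughly four out of five documents should land on a node other than the one that received the request, and a single client sending one document at a time waits for that hop on every such request.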
On Tue, Jul 3, 2012 at 7:03 AM, David Pilato david@pilato.fr wrote:
One doc at a time.
Using NodeJS in a single Thread.
Additionally, if you use bulk indexing, you would spread the load as well,
even from a single client.
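For reference, a minimal sketch of what a bulk request body looks like: one action line followed by one source line per document, newline-delimited, with a trailing newline. The index name `test`, the type `doc`, the batch size of 1000, and the helper names are assumptions made for this example. The sketch only builds the bodies. Each one would then be sent with an HTTP POST to the `_bulk` endpoint using whatever client you prefer.

```python
import json
from typing import Dict, Iterator, List, Tuple

Doc = Tuple[str, Dict[str, object]]  # (document id, document source)


def bulk_body(docs: List[Doc], index: str, doc_type: str) -> str:
    """Build a newline-delimited bulk body that indexes the given docs."""
    lines = []
    for doc_id, source in docs:
        # Action line: says what to do and where the document goes.
        lines.append(json.dumps(
            {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        ))
        # Source line: the document itself, on a single line.
        lines.append(json.dumps(source))
    # The bulk format requires the body to end with a newline.
    return "\n".join(lines) + "\n"


def chunks(items: List[Doc], size: int) -> Iterator[List[Doc]]:
    """Yield consecutive batches of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical small documents, similar in spirit to the test above.
docs = [(str(i), {"name": f"doc {i}", "value": i}) for i in range(2500)]
bodies = [bulk_body(batch, "test", "doc") for batch in chunks(docs, 1000)]
print(len(bodies), "bulk requests instead of", len(docs), "single-document requests")
```

The node that receives a bulk request groups the documents by shard and forwards each group, so the per-document round trip is paid once per batch rather than once per document. The right batch size depends on document size and hardware, so treat 1000 as a starting point to measure from.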