Thank you David,
Regarding your "Don't expect any ETA on the mailing list...": yes, you are
definitely right, I'm a little bit under pressure...
That's exactly the approach I've taken, and these are my findings from running some
tests on a machine with this config (32 GB RAM, 8 cores, and 143 MB/s HDD speed):
16 GB RAM for ES (Xmx = Xms), bootstrap.mlockall: true, and the added JVM options
-server and -XX:+AggressiveOpts.
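As a quick sanity check that the heap size and mlockall setting were actually picked
up, I use something like the following against the nodes info API (just a sketch; the
endpoint and response field names are from recent ES versions and may differ on the
version I'm running):

import json
import urllib.request

# Sketch: check that the configured heap and mlockall were picked up.
# Assumes a node on localhost:9200; the field names are from recent ES
# versions and may differ on older releases.
ES = "http://localhost:9200"

with urllib.request.urlopen(ES + "/_nodes") as resp:
    info = json.load(resp)

for node_id, node in info["nodes"].items():
    heap = node.get("jvm", {}).get("mem", {}).get("heap_max_in_bytes")
    mlock = node.get("process", {}).get("mlockall")
    print(node_id, "heap_max_in_bytes:", heap, "mlockall:", mlock)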
For adding 500,000 docs the results were:
shards: 5, 1,684 docs/s
shards: 50, 2,336 docs/s
shards: 100, 2,558 docs/s
I ran the tests with more shards, but the best result for me was at 100, so I decided
to set the shard count to 100 and try other parameters.
What happened with more data: for 2,400,000 records the average speed dropped to
1,600 docs/s, so I started creating a new index after every 500,000 records, and the
indexing speed went back up to 2,558 docs/s (roughly along the lines of the sketch below).
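The rollover logic is basically something like this (just a sketch of the idea, not my
actual code; it assumes a single node on localhost:9200, uses the _bulk API, and the
index name pattern is made up):

import json
import urllib.request

# Sketch of the "new index every 500,000 docs" approach (illustrative only;
# assumes a node on localhost:9200; older ES versions also require a _type
# in the bulk action line and may not need the x-ndjson content type).
ES = "http://localhost:9200"
DOCS_PER_INDEX = 500_000

def bulk_index(index_name, docs):
    # One action line plus one source line per document.
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(doc))
    body = ("\n".join(lines) + "\n").encode("utf-8")
    req = urllib.request.Request(ES + "/_bulk", data=body,
                                 headers={"Content-Type": "application/x-ndjson"})
    urllib.request.urlopen(req).read()

def index_all(doc_stream, batch_size=1000):
    count, index_no, batch = 0, 0, []
    for doc in doc_stream:
        batch.append(doc)
        if len(batch) == batch_size:
            bulk_index("records-%03d" % index_no, batch)
            count += len(batch)
            batch = []
            if count >= DOCS_PER_INDEX:
                # Roll over to a fresh index once the current one has 500k docs.
                count, index_no = 0, index_no + 1
    if batch:
        bulk_index("records-%03d" % index_no, batch)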
Now the problem is that the final data set is much larger than these numbers, and
after creating about 20 indexes (say 2,000 shards) ES slows down so much that in
some cases creating a new index takes 10 minutes.
So I started thinking that this is the maximum speed and capacity of ES on this
machine and that I must add more hardware, which is why I asked here to make sure.
Best,
Vahid
On Mon, Sep 24, 2012 at 2:57 PM, David Pilato david@pilato.fr wrote:
Hey Vahid
Don't expect any ETA on the mailing list...
Yes, that's what I meant.
What you can probably do is inject into a single node (1 shard, 0
replicas) and see how many docs one of your single nodes can handle.
Then perform the same test with 2 shards, 0 replicas, on the same node.
Then add a second node and perform the same test with 2 shards, 0
replicas.
Then perform the same test with 4 shards, 0 replicas.
I think you will be able to find the right numbers for your hardware.
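Just to illustrate what I mean, something along these lines (a rough sketch only; the
index name and documents are placeholders, it assumes a node on localhost:9200, and
the per-document endpoint shown is the modern form, which differs on older versions):

import json
import time
import urllib.request

# Sketch: create a test index with an explicit shard/replica count and
# measure indexing throughput (placeholder index name and docs; assumes
# a node on localhost:9200; "_doc" is the modern document endpoint).
ES = "http://localhost:9200"

def create_index(name, shards, replicas):
    settings = {"settings": {"number_of_shards": shards,
                             "number_of_replicas": replicas}}
    req = urllib.request.Request(ES + "/" + name,
                                 data=json.dumps(settings).encode("utf-8"),
                                 headers={"Content-Type": "application/json"},
                                 method="PUT")
    urllib.request.urlopen(req).read()

def measure(name, docs):
    start = time.time()
    n = 0
    for doc in docs:
        req = urllib.request.Request("%s/%s/_doc/%d" % (ES, name, n),
                                     data=json.dumps(doc).encode("utf-8"),
                                     headers={"Content-Type": "application/json"},
                                     method="PUT")
        urllib.request.urlopen(req).read()
        n += 1
    return n / (time.time() - start)  # docs per second

# 1 shard / 0 replicas first; then repeat with 2, 4, ... shards and more nodes.
create_index("bench-1", shards=1, replicas=0)
print("docs/s:", measure("bench-1", ({"field": "value %d" % i} for i in range(10_000))))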
BTW, give the wonderful Bigdesk plugin a try. It will help you find
some clues about IO, memory, ...
David.
On 24 September 2012 at 13:32, Vahid Hasani vhasani57@gmail.com wrote:
No reply?
On Mon, Sep 24, 2012 at 10:44 AM, Vahid Hasani vhasani57@gmail.com wrote:
Thank you David,
By 10 nodes, do you mean 10 ES instances on 10 machines? If so, the result would
certainly be much better (more hardware resources). For me the problem is that I
have no baseline measurement of ES performance.
At first we were running the tests on the cluster, but in a cluster there are lots
of factors that affect performance (like networking...), the performance was not
acceptable, and finding the bottlenecks was difficult. So I decided to take some
measurements on a single node to validate ES and our indexing approach, and then
run the tests on the cluster (anyway, I do agree with you that it does not scale
linearly).
I want to find a way to keep the indexing performance from degrading, so I need to
know how many docs each shard can store without performance problems, and what I
should do once that maximum capacity is reached (something like the rough check
sketched below).
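What I have in mind is a rough check like this (only a sketch; it assumes a node on
localhost:9200 and a made-up index name, and it simply divides the total doc count by
the configured shard count; the _settings response format differs on older ES versions):

import json
import urllib.request

# Rough estimate of how many docs each shard of an index is holding
# (sketch only; assumes a node on localhost:9200 and a made-up index name).
ES = "http://localhost:9200"

def docs_per_shard(index):
    with urllib.request.urlopen("%s/%s/_count" % (ES, index)) as resp:
        total = json.load(resp)["count"]
    with urllib.request.urlopen("%s/%s/_settings" % (ES, index)) as resp:
        settings = json.load(resp)
    shards = int(settings[index]["settings"]["index"]["number_of_shards"])
    return total / shards

print(docs_per_shard("records-000"))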
Vahid.
On Mon, Sep 24, 2012 at 9:59 AM, David Pilato david@pilato.fr wrote:
The problem is that 100 shards on a single node (100 Lucene instances
per node) will give you high IO load. As your index grows (more and more
docs), read and write operations will cost you more.
I'm pretty sure that if you run the same test on 10 nodes (10 shards per
node, with replicas=0), you will get better results.
What I want to say here is that it's really hard to make assumptions
based on what you can see on a single node. To tune ES, I recommend doing
it on the target platform.
Scaling is not linear.
David.
On 24 September 2012 at 09:28, Vahid <vhasani57@gmail.com> wrote:
Hi Jaideep, thanks for your reply.
I'm running one ES instance on a single node; first I want to validate the approach
for maximum performance, then I will apply the configuration to the cluster.
With an index of only one shard, indexing performance soon starts to decrease. In
addition, index creation in ES gets very slow after about 2,000 (single-shard)
indexes have been created.
In the last tests I ran, one index with 100 shards gave me the best performance for
indexing 2.4 M records (doc size ~22 KB, at a speed of 2,550 docs/s), but with more
data the performance keeps decreasing (the final record count is 1 billion).
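Just to give an idea of the scale, a back-of-the-envelope estimate from the figures
above (nothing measured, only arithmetic):

# Back-of-the-envelope estimate from the numbers above (not a measurement).
total_docs = 1_000_000_000   # final record count
doc_size_kb = 22             # approximate source doc size
rate = 2_550                 # best observed indexing speed, docs/s

raw_size_tb = total_docs * doc_size_kb / 1024 ** 3   # ~20 TB of raw source
days = total_docs / rate / 86_400                    # ~4.5 days at the best speed
print("raw source ~%.1f TB, ~%.1f days at %d docs/s" % (raw_size_tb, days, rate))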
Thanks,
Vahid
On Monday, September 24, 2012 3:05:50 AM UTC+2, jaideep dhok wrote:
Vahid,
100 shards per index is too many. How about trying one shard per node in
the cluster?
Thanks,
Jaideep
On Mon, Sep 24, 2012 at 2:55 AM, Vahid vhas...@gmail.com wrote:
1,200,000
--
Jaideep Dhok
--
David Pilato
http://www.scrutmydocs.org/
http://dev.david.pilato.fr/
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
--
David Pilato
http://www.scrutmydocs.org/
http://dev.david.pilato.fr/
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs