Hi Shawn,
On Thu, Jan 31, 2013 at 4:58 PM, Shawn Ritchie xritchie@gmail.com wrote:
> Hi Radu,
> Thanks for the reply, this was extremely interesting. Regarding the slow
> indexing, I'm running this locally on my development machine, which has 4GB
> of RAM, with 1GB allocated to Elasticsearch, and as you said I can see a
> high amount of I/O and CPU usage. I was just testing stuff before I try
> it out on the actual server.
If you run the indexing tests on an empty index, 1GB of RAM should be OK.
Otherwise, I'd increase the heap to 2GB. And of course your indexing
performance will degrade badly when there's not enough memory.
> So the server has 128GB of RAM.
> Should I allocate 64GB to Elasticsearch, and how much should I allocate
> for index_buffer_size?
> Also, would it be ideal to allocate, let's say, min_index_buffer_size 10%
> and max_index_buffer_size 50%?
> Or would it be ideal to set index_buffer_size to something like 50%?
> Or, at the other extreme, set indices.memory.min_shard_index_buffer_size to
> 10% (which would imply a total usage of roughly 50%)?
I can't suggest exact figures, but I think you'd get close to the sweet spot
by running some performance tests on the production hardware. I'd start
with a 30GB heap for ES and an index_buffer_size of 20%, run some tests, and
see the performance impact when changing those settings up and down.
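To make those starting values concrete, here's a sketch of where they would go; the heap is typically set through the ES_HEAP_SIZE environment variable, and the buffer settings live in elasticsearch.yml. Treat the exact numbers as placeholders to be tuned by testing:

```yaml
# elasticsearch.yml -- illustrative starting point, not tuned values
indices.memory.index_buffer_size: 20%            # share of heap used for in-memory indexing buffers
indices.memory.min_shard_index_buffer_size: 4mb  # per-shard floor; adjust after testing
```

And in the environment of the Elasticsearch process, something like `ES_HEAP_SIZE=30g` before starting the node.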
Also as regards to bulk updating do you suggest i turn off
the refresh_interval bulk insert and turn it back on then run an optimize
with segment = 5?
I'd turn off refresh_interval if I weren't interested in making the new
content available for search until the whole insert operation finishes.
I'd use the optimize API only if you won't index again until the next large
indexing operation (after which you'd optimize again). That's because if
you index afterwards and some merging occurs, your caches will get
invalidated, which has a big impact on query performance.
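For reference, the workflow above would look roughly like this against a local cluster; the index name "myindex" and the endpoint are assumptions, and the commands obviously need a running node:

```sh
# disable refreshes for the duration of the bulk load
curl -XPUT 'localhost:9200/myindex/_settings' -d '{"index": {"refresh_interval": "-1"}}'

# ... run the bulk inserts here ...

# restore the default refresh interval, then optimize down to 5 segments
curl -XPUT 'localhost:9200/myindex/_settings' -d '{"index": {"refresh_interval": "1s"}}'
curl -XPOST 'localhost:9200/myindex/_optimize?max_num_segments=5'
```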
> As regards segments_per_tier, what is being referred to by "tier"?
Basically, tiers are categories of segments by size. Here's my
understanding of the "tiered" merge policy:
- you have a number of very small segments that are naturally created
during indexing. Actually, some merging is done here to ensure segments are
bigger than "floor_segment"
- when that number hits segments_per_tier, ES will merge some of them into
bigger ones, which creates segments in the next "tier"
- the process repeats until that next tier hits segments_per_tier as well.
Then merging happens on that tier too, which creates another tier, and so on
- it stops creating new tiers when merging on the last tier would create
segments larger than max_merged_segment
So basically, the higher segments_per_tier, the less merging you get,
because you'll end up with more small segments, since lower "tiers" hit
the limit later.
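The cascade above can be sketched with a toy simulation. This is hypothetical code, not the real Lucene policy (the actual TieredMergePolicy also weighs deleted docs, segment size skew, and max_merged_segment), but it shows why a larger segments_per_tier means fewer merges and more, smaller segments:

```python
def simulate_indexing(num_flushes, segments_per_tier, flush_size=1):
    """Toy tiered merging: each flush adds one small segment to tier 0;
    whenever a tier holds segments_per_tier segments, they merge into one
    larger segment in the next tier. Returns (final segment sizes, merges)."""
    tiers = {}   # tier level -> list of segment sizes
    merges = 0
    for _ in range(num_flushes):
        tiers.setdefault(0, []).append(flush_size)
        level = 0
        # cascade merges up the tiers while any tier is full
        while len(tiers.get(level, [])) >= segments_per_tier:
            merged = sum(tiers[level][:segments_per_tier])
            del tiers[level][:segments_per_tier]
            tiers.setdefault(level + 1, []).append(merged)
            merges += 1
            level += 1
    return sorted(s for segs in tiers.values() for s in segs), merges

print(simulate_indexing(100, segments_per_tier=10))  # ([100], 11): fewer, larger segments, more merges
print(simulate_indexing(100, segments_per_tier=20))  # ([20, 20, 20, 20, 20], 5): more small segments, less merging
```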
> And also, what would be the ideal number to maximise insert speeds?
Again, unfortunately I can't recommend hard numbers, but you can get
to them through testing.
> Also, once bulk inserting has been completed, can I then re-tweak these
> settings to increase search speed instead of insert speed?
Yes, you can change merge settings on the fly via the Update Settings API.
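For example, something along these lines; the index name "myindex" and the specific value are assumptions, and you'd want to verify which merge settings are dynamically updatable on your version:

```sh
# bump segments_per_tier at runtime on an existing index
curl -XPUT 'localhost:9200/myindex/_settings' -d '
{"index": {"merge.policy.segments_per_tier": 5}}'
```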
Best regards,
Radu
http://sematext.com/ -- Elasticsearch -- Solr -- Lucene
--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.