Elasticsearch (0.90): How to create big segments in the first place

Hi,
We are using Elasticsearch (ES) to index a large number of documents every
day.
By default, Elasticsearch (Lucene) creates small segments, and its
background threads merge them into bigger ones according to the merge
policy.
To reduce merge cycles and have efficient segments in the first place, I
would like ES to create bigger segments in memory and write them to disk
(no problem if it uses more memory and searches are available only after
the segments are flushed to disk).
I know that older Lucene versions (3.0) had the setting setMaxBufferedDocs
(http://lucene.apache.org/core/2_9_4/api/all/org/apache/lucene/index/IndexWriter.html#setMaxBufferedDocs(int)),
which determines the minimal number of documents required before the
buffered in-memory documents are flushed as a new segment.

Can I do the same in ES, i.e. create bigger segments (maybe 100 MB or
larger) and reduce merge cycles (at most one merge, or avoid merging during
the day and run a force merge at the end of the day, since I am creating a
new index every day) to reduce IO substantially? A sketch of the workflow I
have in mind is below.
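
For illustration (only a sketch, not working code; I am assuming the
official Python elasticsearch client, and the index name and settings
values are made up):

    # Hypothetical daily workflow; the client calls are my assumption.
    from elasticsearch import Elasticsearch

    es = Elasticsearch()               # local node, default settings
    index = "docs-2013-08-03"          # one index per day

    # Disable automatic refresh so more documents are buffered before a
    # segment is written.
    es.indices.put_settings(index=index,
                            body={"index": {"refresh_interval": "-1"}})

    # ... bulk indexing runs during the day ...

    # End of day: merge everything down to one big segment
    # (the _optimize API in 0.90, later renamed _forcemerge).
    es.indices.optimize(index=index, max_num_segments=1)

    # Re-enable refresh so searches see the final index.
    es.indices.put_settings(index=index,
                            body={"index": {"refresh_interval": "1s"}})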

Regards,
Prakash


Raising your refresh interval should help here. By default we refresh every
second, which creates lots of small segments. If you set
index.refresh_interval to -1, you can control it yourself by calling flush
or refresh via the API. You should also look at
indices.memory.index_buffer_size (see the index module settings in the
Elasticsearch reference) to control how much RAM is used for document
buffering. Note that Lucene 4 works differently and doesn't merge
everything in memory; if you use fewer indexing threads, it will create
fewer segments. That said, for throughput it might actually be better to
write more but smaller segments.
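
For example (a rough, untested sketch; I'm assuming the Python client here,
and "myindex" and the 30% buffer are placeholders):

    # Manual refresh control; the client calls are an assumption.
    from elasticsearch import Elasticsearch

    es = Elasticsearch()

    # Dynamic per-index setting: turn off automatic refresh.
    es.indices.put_settings(index="myindex",
                            body={"index": {"refresh_interval": "-1"}})

    # ... index documents ...

    # Cut a segment and make the buffered documents searchable on your
    # own schedule instead of on every refresh interval.
    es.indices.refresh(index="myindex")

    # indices.memory.index_buffer_size is a node-level setting and goes
    # into elasticsearch.yml rather than through the settings API, e.g.:
    #     indices.memory.index_buffer_size: 30%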

simon
