Hi,
We are using Elasticsearch (ES) to index a large number of documents every day.
By default, Elasticsearch (Lucene) creates small segments, and its background
threads merge them into bigger ones according to the merge policy.
To reduce merge cycles and produce efficient segments in the first place, I would
like ES to build bigger segments in memory and then write them to disk (no
problem if it uses more memory and searches see the documents only after the
segments are flushed to disk).
I know older versions of Lucene (3.0) had the setting setMaxBufferedDocs
(http://lucene.apache.org/core/2_9_4/api/all/org/apache/lucene/index/IndexWriter.html#setMaxBufferedDocs(int)),
which determines the minimal number of documents required before the buffered
in-memory documents are flushed as a new segment.
Can I do the same in ES, i.e. create bigger segments (maybe 100 MB or larger)
and reduce merge cycles (at most one merge, or avoid merging entirely and run a
force merge at the end of the day, since I am creating a new index every day),
to reduce I/O substantially?
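Concretely, the end-of-day step I have in mind would be something like the
following sketch (the daily index name is hypothetical; `_optimize` is the
merge-forcing endpoint in ES versions of this era, later renamed `_forcemerge`):

```shell
# Once writes to yesterday's daily index have stopped, force-merge it
# down to a single segment so reads hit as few segments as possible.
curl -XPOST 'http://localhost:9200/logs-2013.08.02/_optimize?max_num_segments=1'
```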
Raising your refresh interval should help here. We flush every 3 sec by
default, which creates lots of segments. If you set it to -1, you can control
it yourself by calling flush or refresh via the API. You should also look at
indices.memory.index_buffer_size to control how much RAM is used for document
buffering. Note that Lucene 4 works differently and doesn't merge everything
in memory; you can use fewer threads and it will create fewer segments. For
throughput, though, it might actually be better to write more but smaller
segments.
simon
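A minimal sketch of the settings mentioned above, assuming a hypothetical daily
index named logs-2013.08.03 and a node on localhost:9200:

```shell
# Disable automatic refresh so new segments are created only when you ask.
curl -XPUT 'http://localhost:9200/logs-2013.08.03/_settings' -d '{
  "index": { "refresh_interval": "-1" }
}'

# Trigger a refresh yourself whenever recently indexed documents
# should become visible to search.
curl -XPOST 'http://localhost:9200/logs-2013.08.03/_refresh'
```

indices.memory.index_buffer_size is a node-level setting, so it goes in
elasticsearch.yml rather than through the index settings API, e.g.:
indices.memory.index_buffer_size: 30%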
On Saturday, August 3, 2013 3:59:30 AM UTC+2, Prakash Patidar wrote: