Bulk indexing: single segment per shard

John16 · January 29, 2019, 11:28am

Hello,

I need to bulk index a large number of documents, starting with an empty index. I know I can run a (long-running) _forcemerge afterwards to reduce the number of segments to 1, but I wonder whether there is a setting that would give me an index with a single segment without this step?

I am indexing on a separate server and I want the index to be fully optimized for fast searches.

Thanks!

DavidTurner · January 29, 2019, 1:03pm

Elasticsearch assumes that you're going to continue to index documents and therefore does not merge segments this enthusiastically. Over-merging (e.g. merging to a single segment on a shard that is still indexing documents) can cause performance issues, so Elasticsearch won't do it unless you specifically ask it to.

Why do you want to avoid a final _forcemerge after your indexing has finished?

John16 · January 29, 2019, 1:17pm

Because it takes a very long time, and I know from the very beginning that I am starting with an empty index, adding ~100 million docs, and then using the index in a read-only manner.

I want to get a single-segment index as fast as possible.

DavidTurner · January 29, 2019, 1:36pm

I know of no easy way around having Elasticsearch write its data in multiple segments and then merge them together later.

If your documents do not fit into memory (specifically the indexing buffer) then Elasticsearch needs to write a segment each time this buffer fills up.

If your indexing generates a translog larger than index.translog.flush_threshold_size then Elasticsearch will perform a flush each time this threshold is reached.
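To get a feel for the numbers, here is a rough back-of-the-envelope sketch in Python. It assumes the translog grows by roughly the raw size of the indexed documents and uses a hypothetical average document size of 1 KiB (neither figure comes from this thread). The 512mb value is the documented default for index.translog.flush_threshold_size. Refreshes and a full indexing buffer create further segments, and background merges remove some, so this only illustrates the order of magnitude.

```python
# Default value of index.translog.flush_threshold_size (512mb), in bytes.
DEFAULT_FLUSH_THRESHOLD_BYTES = 512 * 1024**2


def estimated_flushes(doc_count, avg_doc_bytes,
                      flush_threshold_bytes=DEFAULT_FLUSH_THRESHOLD_BYTES):
    """Approximate number of threshold-triggered flushes during a bulk load.

    Assumption: translog growth is about doc_count * avg_doc_bytes.
    """
    total_translog_bytes = doc_count * avg_doc_bytes
    # One flush each time the translog crosses the threshold.
    return total_translog_bytes // flush_threshold_bytes


if __name__ == "__main__":
    # Hypothetical workload: 100 million documents of about 1 KiB each.
    print(estimated_flushes(100_000_000, 1024))
```

Under these assumptions, doubling the threshold roughly halves the number of threshold-triggered flushes, but it does not remove the need for the final merge.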

Be sure to follow the instructions on tuning for indexing speed, since this will help to generate fewer segments. In particular, ensure that you are not refreshing too frequently.
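As a minimal sketch of what such a bulk-load workflow might look like, the Python below only builds the REST requests (method, path, JSON body) without sending them. The index name "docs", the 1gb flush threshold, and the replica count restored afterwards are illustrative assumptions, not recommendations from this thread. The settings and the _forcemerge endpoint with max_num_segments are standard Elasticsearch APIs, but check them against the documentation for your version.

```python
import json


def load_plan(index, flush_threshold="1gb", replicas_after=1):
    """Return the requests to issue around a one-off bulk load.

    Each entry is (HTTP method, path, JSON body or None).
    """
    # Before indexing: disable periodic refreshes and replicas, and
    # raise the translog flush threshold (value here is an assumption).
    before = ("PUT", f"/{index}/_settings", {
        "index.refresh_interval": "-1",
        "index.number_of_replicas": 0,
        "index.translog.flush_threshold_size": flush_threshold,
    })
    # After indexing: null resets refresh_interval to its default.
    after = ("PUT", f"/{index}/_settings", {
        "index.refresh_interval": None,
        "index.number_of_replicas": replicas_after,
    })
    # Final step: the explicit force merge down to a single segment.
    merge = ("POST", f"/{index}/_forcemerge?max_num_segments=1", None)
    return [before, after, merge]


if __name__ == "__main__":
    for method, path, body in load_plan("docs"):
        print(method, path)
        if body is not None:
            print(json.dumps(body, indent=2))
```

The requests can be sent with any HTTP client. The bulk indexing itself happens between the first and second request, and the force merge is still required to reach exactly one segment per shard.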

Are you sure that the final force merge is actually worth it? You seem to be trying to optimise the process of building a shard from scratch, which implies that you will be doing it quite often. How much extra search performance does it buy you?

system · February 26, 2019, 1:36pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
