Cannot reduce number of segments during indexing

I have followed the advice in the approximate kNN (aKNN) search tuning guide.

But no matter the settings, the indexing process still creates a huge tail of tiny segments.

Setup:

  • New dev deployment
  • Zero search traffic
  • 64GB RAM, CPU-optimized node
  • "indices.memory.index_buffer_size": "10%"
  • "index.translog.flush_threshold_size": "10gb"
  • "index.refresh_interval": "-1"

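For reference, here's a minimal sketch of how the index-level settings are applied with the Python client (the index name and the local URL are placeholders; indices.memory.index_buffer_size is a static node setting, so that one lives in elasticsearch.yml rather than going through an API call):

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder: dev cluster URL

# indices.memory.index_buffer_size: 10%   <- static, set in elasticsearch.yml on each data node

# The per-index settings can be applied dynamically:
es.indices.put_settings(
    index="my-knn-index",  # placeholder index name
    settings={
        "index.translog.flush_threshold_size": "10gb",
        "index.refresh_interval": "-1",
    },
)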
On a 64GB node, I understand roughly half the RAM is allocated to the JVM heap, so that's about 32GB of heap. If index_buffer_size is 10%, that should give around 3.2GB of indexing buffer.
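Spelled out (assuming the default of roughly half the RAM going to heap; note the buffer is shared by every shard actively indexing on the node):

ram_gb = 64
heap_gb = ram_gb * 0.5        # default: roughly half of RAM for the JVM heap
buffer_gb = heap_gb * 0.10    # indices.memory.index_buffer_size = 10%
print(buffer_gb)              # 3.2 (GB), shared across all actively indexing shards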

With these settings, how in the world do I end up with dozens of segments smaller than 100MB?

Here's an example of one shard's segments:

s segment     size docs.count
0 _co2     510.6kb         31
0 _cou     716.3kb         44
0 _coq       1.7mb        111
0 _cot       3.1mb        199
0 _cop       5.5mb        354
0 _cnx       5.8mb        376
0 _co7       7.3mb        471
0 _co6       8.4mb        542
0 _cor      11.7mb       1986
0 _co1      12.2mb       1972
0 _cos      13.7mb        881
0 _cog      14.1mb        907
0 _cnv      15.3mb        981
0 _coo      18.3mb       1175
0 _cow      20.2mb       1296
0 _com      31.6mb       2030
0 _cov      40.1mb       2577
0 _cod      81.5mb       5234
0 _coe      84.8mb       5453
0 _col      87.7mb       5641
0 _cmv     106.6mb       6852
0 _cok     109.5mb       7040
0 _cja       131mb       8419
0 _coj     150.7mb      30500
0 _cob     335.3mb      54380
0 _cjr       584mb      48689
0 _cmn     707.9mb      59709
0 _cgn     732.6mb     198037
0 _ckb     868.9mb     241883
0 _c8x     962.4mb      90502
0 _c60    1021.7mb      97412
0 _cn6       1.3gb     377820
0 _2ut       1.4gb     393405
0 _a3v       1.5gb     438436
0 _bcu       1.6gb     469321
0 _7e4       1.6gb     483486
0 _2oa       1.7gb     492613
0 _8rc       1.7gb     500500
0 _32p       1.8gb     499522
0 _a7z       1.9gb     546017
0 _bzy       1.9gb     556732
0 _73j         2gb     580960
0 _4o6         2gb     592797
0 _6k3       2.3gb     660575
0 _9t0       2.4gb     676770
0 _5lu       2.5gb     723636
0 _7oi       2.6gb     752804
0 _57c       2.6gb     756930
0 _aq8       2.6gb     762371
0 _96j       2.7gb     776362
0 _3k1       2.8gb     795040
0 _ccf       2.8gb     816245
0 _wp        3.1gb     889646
0 _axy       3.1gb     892622
0 _bng       3.3gb     921377
0 _xc        3.4gb     972174
0 _9ai       3.8gb    1103511
0 _628         4gb    1142051
0 _310       4.2gb    1199113
0 _agl       4.5gb    1284416
0 _9td       4.5gb    1299733
0 _67z       4.6gb    1317589
0 _86m       4.6gb    1316734
0 _4xf       4.7gb    1353413
0 _c12       4.7gb    1351662
0 _75i       4.8gb    1362717
0 _bcn       4.8gb    1372132
0 _428       4.9gb    1394784
0 _83p       4.9gb    1404350
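
(For context, the listing above is _cat/segments output; a sketch like the following reproduces that kind of view with the Python client — the index name, column list, and sort order are assumptions:)

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder: dev cluster URL

# shard, segment name, on-disk size, live doc count, sorted by size
print(es.cat.segments(index="my-knn-index", h="shard,segment,size,docs.count", s="size", v=True))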

How are you indexing into Elasticsearch? It's a long shot, but verify you are not passing a refresh parameter when indexing, as that is one way you could end up with a lot of small segments.
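
For illustration, this is the pattern to watch out for (a sketch; the client setup and index name are assumptions): refreshing on every write forces a new searchable segment per request, which produces exactly this kind of long tail of tiny segments.

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumption: dev cluster URL

# refresh on every write -> a tiny new segment per request
es.index(index="my-knn-index", document={"field": "value"}, refresh=True)

# no refresh parameter -> segments are cut by refresh_interval / the indexing buffer instead
es.index(index="my-knn-index", document={"field": "value"})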

Great question! We're using the streaming_bulk helper from the python client:

from elasticsearch.helpers import streaming_bulk

# streaming_bulk returns a generator, so it has to be consumed for the bulk
# requests to actually be sent; with yield_ok=False only failed items come back.
for ok, item in streaming_bulk(
    client=es,
    actions=index_actions,
    chunk_size=150,
    max_retries=3,
    initial_backoff=1,
    yield_ok=False,
    raise_on_error=False,
    raise_on_exception=False,
):
    ...  # log/handle the failed item

And the index actions are constructed as:

{
    "_index": index_name,
    "_source": document_dict,
}
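
A sketch of the surrounding generator, for completeness (generate_actions, the index name, and the sample document are placeholders, not our actual code):

def generate_actions(index_name, documents):
    # one bulk "index" action per document, in the shape shown above
    for document_dict in documents:
        yield {
            "_index": index_name,
            "_source": document_dict,
        }

index_actions = generate_actions("my-knn-index", [{"title": "example"}])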

Looking back at this, our chunk_size is quite small (IIRC this was due to a throughput issue, but it's worth reassessing). Regardless, with a large index_buffer_size, I don't think the chunk_size of our requests is relevant.

What do you think?
