ES creating thousands of segments with 1 document each

Herve_Bry · October 8, 2015, 1:52pm

Hello Community,

I am asking for help because we are struggling with a strange problem: often, when bulk indexing a batch of data, thousands of tiny segments holding only one doc each are created. This brings the cluster to its knees by consuming all the CPU on the impacted nodes, often disrupting the service.

Here is the situation:

We have 3 servers (16 cores/1.2TB SSD/128GB RAM each) with 3 instances of ES 1.7.1 each (24G RAM per instance)
Our index holds 1.6 billion records in 10 shards with 1 replica, totaling 1.2TB on disk
This amounts to about 700 segments under normal conditions
It is queried at about 150-200 queries per second
Every hour, a few million records are added or updated, in bulk mode, using 8 parallel connections. This takes between 5 and 20 min.

The problem arises every hour during the bulk indexing. During the first few minutes, a huge number of segments are created (I have seen up to 6000) that hold only one document each. After that, the ES instance holding those segments uses so much CPU that it is almost unresponsive, which slows down the other instances on the same server and sometimes even disrupts the service.

After a few minutes of very heavy CPU usage, the segments are finally merged (the count goes down to a normal ~ 700) and everything goes back to normal.

Is this a bug? It appears to me that a segment should rarely hold only one doc...
Do you have advice on what settings to tune to avoid this problem? We have already tried different refresh intervals (-1, 1s, 10s, 30s) and different merge throttling throughputs (from unlimited down to 10MBps) but the problem still occurs almost every time.

Thanks for your wisdom!
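To put numbers on the symptom, here is a minimal Python sketch that counts single-document segments per node from the JSON returned by the indices segments API (`GET /<index>/_segments`). The function name `single_doc_segments` is hypothetical, and the response layout it walks (`indices` > `shards` > shard copies > `segments`, with `num_docs` per segment and `routing.node` per copy) is an assumption worth checking against the actual output of a 1.7.1 cluster.

```python
from collections import Counter


def single_doc_segments(segments_response):
    """Return (total_segments, Counter of one-doc segments per node id).

    `segments_response` is the parsed JSON body of GET /<index>/_segments.
    The layout walked here is an assumption; verify it on your cluster.
    """
    per_node = Counter()
    total = 0
    for index in segments_response.get("indices", {}).values():
        # Each shard id maps to a list of shard copies (primary and replicas).
        for copies in index.get("shards", {}).values():
            for copy in copies:
                node = copy.get("routing", {}).get("node", "unknown")
                for segment in copy.get("segments", {}).values():
                    total += 1
                    if segment.get("num_docs") == 1:
                        per_node[node] += 1
    return total, per_node
```

Polling this every few seconds during the hourly bulk load would show which instances accumulate the tiny segments and how quickly they are merged away.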

Hervé BRY

Christian_Dahlqvist · October 8, 2015, 2:07pm

That sounds odd. Can you check the index settings? Have you by any chance set index.translog.flush_threshold_ops or index.translog.flush_threshold_size to an inappropriate value?
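One way to run that check is sketched below in Python, assuming the parsed JSON of `GET /<index>/_settings` as input. The helper name `translog_overrides` is hypothetical; it simply lists every explicitly set key that has `translog` in its dotted path, whether the settings come back nested or flattened.

```python
def translog_overrides(settings_response):
    """Map each index name to its explicitly set translog settings.

    `settings_response` is the parsed JSON body of GET /<index>/_settings.
    An empty dict for an index means no translog setting is overridden.
    """
    result = {}
    for index_name, body in settings_response.items():
        hits = {}
        # Walk the settings tree iteratively, building dotted setting names.
        stack = [("", body.get("settings", {}))]
        while stack:
            prefix, node = stack.pop()
            for key, value in node.items():
                name = f"{prefix}.{key}" if prefix else key
                if isinstance(value, dict):
                    stack.append((name, value))
                elif "translog" in name.split("."):
                    hits[name] = value
        result[index_name] = hits
    return result
```

Any non-empty result, for example a very low `flush_threshold_ops`, would be the first thing to look at.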

nik9000 · October 8, 2015, 2:10pm

In addition to @Christian_Dahlqvist's ideas, have a look at this:

It might be what you are seeing. Maybe.

You can test this by pushing a single document into the index before you do the bulk index and then waiting 45 seconds or so and then doing the bulk load.
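The test described above can be sketched as a small Python wrapper. Both callables, `index_single_doc` and `run_bulk_load`, are hypothetical placeholders for whatever client code already does the indexing; the 45-second default comes from the suggestion itself, and `sleep` is injectable only so the sequence can be exercised without actually waiting.

```python
import time


def warm_up_then_bulk(index_single_doc, run_bulk_load,
                      wait_seconds=45, sleep=time.sleep):
    """Index one document, wait, then start the bulk load.

    index_single_doc: callable that pushes a single document (hypothetical).
    run_bulk_load: callable that performs the hourly bulk load (hypothetical).
    """
    index_single_doc()           # wake the index up with one document
    sleep(wait_seconds)          # give the node time to react before the bulk
    return run_bulk_load()       # hand back whatever the bulk load returns
```

If the burst of one-document segments disappears when the bulk load is preceded by this warm-up, that points toward the issue linked above.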

Herve_Bry · October 8, 2015, 2:30pm

Thanks for your suggestions.

@Christian_Dahlqvist: there are no specific translog options set for the index. Here is the config we use:

https://gist.github.com/setaou/1871dc5a8a7a6f23c633

_settings

{
   "fonds-bibliotheque-20150921": {
      "settings": {
         "index": {
            "refresh_interval": "10s",
            "number_of_shards": "10",
            "creation_date": "1442823809491",
            "analysis": {
               "filter": {
                  "french_stop": {

(The excerpt is truncated here; the full settings are in the gist.)

@nik9000: We are going to try your suggestion. It does indeed seem like it could be the cause of our problem. Is there any chance this PR might be merged into ES 1.x?

nik9000 · October 8, 2015, 2:34pm

I doubt it. It's pretty deep in the 2.0 line so it'd be quite difficult.
