On ES 6.2.4+ (v6.2.4, v6.3.0) file descriptors continue to grow as the index grows

We are moving from a very old version of ES (1.4.5) to a more modern one. For most of the effort we focused on ES 5.5.0 while rewriting the settings and mappings for our 4 indices. About a month ago we decided to move to the latest 6.x release, which was v6.2.4. After some minor updates to the script that populates the indices from MySQL and to the code our application uses to query them, everything looked fine while testing.

Last week we began populating our entire dataset into ES 6.2.4, and for 3 of the 4 indices this went without any issues. On the last index, however, we ran into odd behavior: as we indexed documents we saw a proportional increase in open file descriptors, which quickly (within 30 minutes of starting) blew through our 65536 file descriptor limit and turned the cluster red with "too many open files" errors. If I stopped the indexing script before exhausting the file descriptor limit, the open file descriptor count remained unchanged.

Our setup runs each Elasticsearch cluster member (data node with shared master role) in Docker on a dedicated host, launched via Terraform and Ansible. For v6.2.4 we used docker.elastic.co/elasticsearch/elasticsearch:6.2.4 with the repository-s3 and discovery-ec2 plugins installed. Since we use Terraform, I experimented with instance sizing (m4.large -> c4.xlarge -> m4.2xlarge) and saw no change in the number of file descriptors used when the indexing script reached the same point, destroying and bringing up a new cluster each time I changed the size. I also varied the number of cluster members (3 -> 5) and saw a proportional drop, as the total file descriptors per member were spread across 5 nodes instead of 3.

I came across this: https://github.com/elastic/elasticsearch/pull/29125. This PR is associated with an issue describing a problem where file descriptors are held by translog files because of an endless flushing loop. I was able to confirm via lsof -p <es process> that the open file descriptors are associated with translog files, but I am not sure how to confirm an endless flushing loop via logs or an API call. I did use the elasticsearch-py flush argument wait_if_ongoing, which should block a flush if one is already active; however, when I ran it against all indices it returned immediately. Since my issue seemed at least close to PR 29125, and since it was merged into v6.3.0 on Wednesday (June 13), I decided to give that a try using docker.elastic.co/elasticsearch/elasticsearch-oss:6.3.0. Unfortunately, the same issue presented itself, but I did notice that after approximately 12 hours the number of open file descriptors recovered to the level from right after terraforming a new cluster, before I set up and populated the index. I tested this twice, destroying the cluster in between, and tracked open_file_descriptors and max_file_descriptors from the /_nodes/stats/process API endpoint for each member over the 12-hour period; almost to the minute, I saw the count drop once 12 hours had elapsed.
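For reference, here is roughly how I tracked the descriptor counts (a sketch using elasticsearch-py; extract_fd_counts and watch are just names I'm using here, not part of any API):

```python
# Sketch: track open vs. max file descriptors per node via /_nodes/stats/process.
import time


def extract_fd_counts(stats):
    """Map node name -> (open_file_descriptors, max_file_descriptors)
    from a /_nodes/stats/process response body."""
    counts = {}
    for node in stats.get("nodes", {}).values():
        proc = node.get("process", {})
        counts[node.get("name", "?")] = (
            proc.get("open_file_descriptors"),
            proc.get("max_file_descriptors"),
        )
    return counts


def watch(es, interval=60):
    """Print per-node descriptor usage once per interval (needs elasticsearch-py)."""
    while True:
        stats = es.nodes.stats(metric="process")
        for name, (open_fds, max_fds) in sorted(extract_fd_counts(stats).items()):
            print(f"{name}: {open_fds}/{max_fds}")
        time.sleep(interval)
```

Called as watch(Elasticsearch(["http://localhost:9200"])), with the cluster address adjusted for your environment.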

Since we do nightly full re-indexes of 40+ million documents across all our indices, accommodating the file descriptors needed to index the full dataset would force us to significantly oversize our cluster members, either in instance size or in count, to work around this issue. To rule out a config or version issue, I deployed Amazon's hosted Elasticsearch v6.2 (which is actually v6.2.3) and saw the same open file descriptor behavior. However, when I terraformed ES v5.6.10 I saw a steady 400-410 open file descriptors across all cluster members at the points where the v6.x cluster typically showed the increase.

We would like to get this working on the latest v6.x release, but due to the open file descriptor usage on this version we are not able to do so without over-provisioning. I would appreciate any suggestions or things to investigate so we can get v6.2.4 or v6.3.0 working.

@gnilchee Can you share the shard-level stats? You can get these stats via GET /_stats?level=shards. Thank you!

@nhat I couldn't attach or post output as reply so I posted output to a gist: https://gist.github.com/gnilchee/74de4ee1706cdf40d72e1539c7e5d85f

Thank you for looking!

@gnilchee Thank you for the stats.

The problem with your cluster is that you called _flush with force=true many times, even on an empty index.

"commit": {
    "generation": 4133,
    "id": "6Boi3gMCZpv0OUCsIFMoBQ==",
    "num_docs": 0,
    "user_data": {
        "history_uuid": "99yunNI_QIyNOM4PG6fBjQ",
        "local_checkpoint": "-1",
        "max_seq_no": "-1",
        "max_unsafe_auto_id_timestamp": "-1",
        "translog_generation": "4131",
        "translog_uuid": "L8XJxOXoT6S8ShxBAwiCKw"
    }
}

We keep the translog around (up to 12h and 512MB) to increase the chance of operation-based recovery (introduced in https://github.com/elastic/elasticsearch/pull/25294).
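If you ever need to tune that retention, these are the index settings involved (the values shown are the defaults):

```
PUT /my_index/_settings
{
  "index.translog.retention.age": "12h",
  "index.translog.retention.size": "512mb"
}
```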

I think you need to remove the manual _flush calls or remove the force parameter.


@nhat thanks for the feedback! I will make the adjustments to the script and report back the results.

Thanks again!


@nhat I made adjustments and it works as expected! Thank you so much for the clarification on the issue.

If possible, I would appreciate you pointing me to documentation on when a flush should be called vs. when it is built in or unnecessary for certain operations. My understanding of this operation has been limited until now, so I just left the call in since it appeared to be working before.

Closing the loop: based on the feedback from @nhat above, I noticed a manual flush call, carried over from our old "working" script, being made inside the for loop that called a function performing a bulk index. In essence it was being called thousands of times while inserting up to 70K documents per bulk operation.
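In rough outline (names and structure are illustrative, not our actual script), the pattern looked like this:

```python
# Sketch of the bug: a leftover forced flush ran once per bulk batch,
# i.e. thousands of times over a full re-index.

def chunked(docs, size):
    """Yield lists of at most `size` docs from any iterable."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def reindex(es, index, docs, batch_size=70000):
    from elasticsearch import helpers  # elasticsearch-py

    for batch in chunked(docs, batch_size):
        helpers.bulk(
            es,
            ({"_index": index, "_type": "_doc", "_source": d} for d in batch),
        )
        # BUG (now removed): es.indices.flush(index=index, force=True)
        # Forcing a flush on every iteration created a new commit/translog
        # generation per batch, and ES retains those translogs (up to
        # 12h/512MB), so file descriptors climbed with every batch.
    # No manual flush needed here: Elasticsearch flushes periodically on its own.
```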


I am glad that you fixed the issue.

You don't have to call _flush manually because Elasticsearch flushes periodically. You might need to call flush (or synced-flush) before shutting down the cluster for maintenance. This makes sure that the cluster will recover quickly.
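For example, before a planned restart you could check the synced-flush result like this (a sketch using elasticsearch-py; synced_flush_ok is just a helper name used here, not an API):

```python
# Sketch: check a POST /_flush/synced response before taking nodes down
# for maintenance.

def synced_flush_ok(resp):
    """True if a _flush/synced response body reports no failed shard copies."""
    shards = resp.get("_shards", {})
    return shards.get("failed", 0) == 0

# Usage with elasticsearch-py (assumes a reachable cluster). A synced flush
# returns HTTP 409 when some shards have ongoing operations, so we tell the
# client not to raise on that status and inspect the body instead:
#
#   resp = es.indices.flush_synced(index="_all", ignore=409)
#   if synced_flush_ok(resp):
#       ...safe to proceed with the rolling restart...
```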

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.