PostgreSQL: infinite DECLARE CURSOR when rebuilding one Elasticsearch index

Drept · September 12, 2021, 10:51am

I have a number of Elasticsearch indices covering millions of objects. The problem arises with only one of them, which contains around 5 million objects.

When I try to rebuild the Elasticsearch index (I am using django-elasticsearch-dsl), the following happens:

The document count in the index remains 0.
Disk usage by my PostgreSQL database grows rapidly; specifically, the Temp folder keeps growing.
If I do nothing, the rebuild stops once the disk is 100% full.
If I kill the rebuild command manually, disk usage keeps growing until I restart PostgreSQL.
ps aux shows that the DECLARE CURSOR statement stays there (i.e. never completes) until the crash.
In the end, the Elasticsearch index exists but contains 0 documents.
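The stuck DECLARE CURSOR seen in ps aux can also be observed from inside PostgreSQL. Below is a minimal sketch, assuming access to the standard pg_stat_activity and pg_stat_database views; the SQL can be run with any PostgreSQL driver, and the helper name stuck_cursors is hypothetical. The filter works on plain tuples so it can be tried without a database.

```python
from datetime import timedelta

# Long-running statements that open a cursor (standard pg_stat_activity columns).
ACTIVITY_SQL = """
SELECT pid, state, now() - query_start AS runtime, query
FROM pg_stat_activity
WHERE query ILIKE 'DECLARE%'
"""

# Cumulative temporary-file usage for the current database.
TEMP_USAGE_SQL = """
SELECT datname, temp_files, temp_bytes
FROM pg_stat_database
WHERE datname = current_database()
"""


def stuck_cursors(rows, min_runtime=timedelta(minutes=5)):
    """Return (pid, runtime) for active DECLARE statements running at least min_runtime.

    Each row is (pid, state, runtime, query), matching ACTIVITY_SQL;
    runtime is a timedelta (how drivers such as psycopg2 return intervals).
    """
    return [
        (pid, runtime)
        for pid, state, runtime, query in rows
        if state == "active"
        and query.lstrip().upper().startswith("DECLARE")
        and runtime >= min_runtime
    ]


if __name__ == "__main__":
    # Made-up sample rows, only to show the shape of the input.
    sample = [
        (101, "active", timedelta(hours=2), 'DECLARE "c1" CURSOR FOR SELECT ...'),
        (102, "idle", timedelta(seconds=3), "SELECT 1"),
    ]
    print(stuck_cursors(sample))
```

Running TEMP_USAGE_SQL a few times during a rebuild would show whether temp_bytes climbs in step with the growing Temp folder.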

This index existed before and I never had any problems with it. I do not think I have changed any of its settings, which makes the behaviour all the more strange. Disk space is otherwise not an issue, and I have no problems rebuilding an index that is 20 times larger.

The settings of the problem index are the following:

number_of_shards: 1
number_of_replicas: 0
"mappings": {"properties":{"keyword":{"type":"text"}}}

So I have only one field, called 'keyword'. I'm using the standard analyzer and do no extra processing of the 'keyword' content when building the index.
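For reference, the settings and mapping above correspond roughly to the following index-creation body (as sent with PUT /doc_keys), written as a Python dict. The explicit "analyzer": "standard" is added here only for clarity; it is the default for "text" fields, so it should not change behaviour.

```python
import json

# Name taken from the Document's Index class posted later in the thread.
INDEX_NAME = "doc_keys"

index_body = {
    "settings": {"number_of_shards": 1, "number_of_replicas": 0},
    "mappings": {
        "properties": {
            # Single text field; "standard" is the default analyzer for text.
            "keyword": {"type": "text", "analyzer": "standard"},
        }
    },
}

if __name__ == "__main__":
    print(f"PUT /{INDEX_NAME}")
    print(json.dumps(index_body, indent=2))
```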

I have tried renaming the index; blocking all signals between Elasticsearch and PostgreSQL; disabling auto_refresh on Elasticsearch; deleting, creating, and populating the index as separate steps instead of just rebuilding it; restarting the website as a whole; etc.
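The delete / create / populate sequence can be scripted as separate calls to django-elasticsearch-dsl's search_index management command, so it is obvious which step hangs. A minimal sketch, assuming the command's --delete, --create, --populate, -f and --models options; the label "myapp.DocKey" and the helper search_index_steps are hypothetical and should be adjusted to the real app.

```python
import subprocess
import sys

# Hypothetical app label; replace with the app that defines the DocKey model.
MODEL_LABEL = "myapp.DocKey"


def search_index_steps(model_label):
    """Return argv lists that delete, create and populate one model's index."""
    base = [sys.executable, "manage.py", "search_index", "--models", model_label]
    return [
        base + ["--delete", "-f"],  # -f skips the confirmation prompt
        base + ["--create"],
        base + ["--populate"],
    ]


if __name__ == "__main__":
    for argv in search_index_steps(MODEL_LABEL):
        print("running:", " ".join(argv[1:]))
        # check=True raises CalledProcessError at the first failing step.
        subprocess.run(argv, check=True)
```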

What might be causing this behaviour and how can I deal with it?

warkolm · September 13, 2021, 6:02am

I'm not familiar with the Django client, but what do your Elasticsearch logs show?
Have you enabled logging from your code to see what it's doing? What does your code look like?
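A minimal sketch of enabling that logging, assuming a standard Django settings module: the django.db.backends logger records each SQL statement (only when DEBUG = True), and "elasticsearch" is the logger used by the elasticsearch-py client. The handler setup is an illustrative choice, not a requirement.

```python
import logging
import logging.config

# Django-style LOGGING dict; in a project it would live in settings.py.
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        "console": {"class": "logging.StreamHandler"},
    },
    "loggers": {
        # SQL statements issued by Django (only emitted when DEBUG = True).
        "django.db.backends": {"handlers": ["console"], "level": "DEBUG"},
        # Requests made by the elasticsearch-py client.
        "elasticsearch": {"handlers": ["console"], "level": "DEBUG"},
    },
}

if __name__ == "__main__":
    # Django applies LOGGING itself at startup; dictConfig is called directly
    # here only to check that the dict is valid outside a Django project.
    logging.config.dictConfig(LOGGING)
    logging.getLogger("elasticsearch").debug("logging configured")
```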

Drept · September 13, 2021, 2:27pm

My logs show nothing at all. I have slow logs enabled for indexing, but they are empty. The general Elasticsearch log shows nothing specific, and there are no health issues either.
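The indexing slow log is configured per index. A sketch of what enabling it through the Document's Index.settings might look like, assuming the standard index.indexing.slowlog.* settings; the threshold values are arbitrary. This log only records index operations that actually reach a shard, so an empty slow log is consistent with no documents having been sent to Elasticsearch at all.

```python
# Illustrative thresholds for the per-index indexing slow log.
SLOWLOG_SETTINGS = {
    "index.indexing.slowlog.threshold.index.warn": "10s",
    "index.indexing.slowlog.threshold.index.info": "5s",
    "index.indexing.slowlog.threshold.index.debug": "2s",
    "index.indexing.slowlog.threshold.index.trace": "500ms",
    # Number of characters of the document _source to include in each entry.
    "index.indexing.slowlog.source": "1000",
}

# Combined with the shard/replica settings from the Document's Index class.
index_settings = {"number_of_shards": 1, "number_of_replicas": 0, **SLOWLOG_SETTINGS}
```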

My code is exactly as described above: the mapping consists of a single field, and its data is indexed as is, without any preprocessing.

Here is the Python document definition for Django:

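from django_elasticsearch_dsl import Document
from django_elasticsearch_dsl.registries import registry

from .models import DocKey  # assumed location of the DocKey model


@registry.register_document
class DocKeyDocument(Document):  # renamed so it no longer shadows the DocKey model
    class Index:
        name = 'doc_keys'
        settings = {'number_of_shards': 1,
                    'number_of_replicas': 0}

    class Django:
        model = DocKey
        fields = [
            'keyword',
        ]

        # Don't update the index on model save/delete signals
        ignore_signals = True
        # Don't force an index refresh after each update
        auto_refresh = False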
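To illustrate why the document count can stay at 0 while the database is busy, here is a rough stdlib-only sketch of the general populate pattern: stream rows from a database cursor in chunks, turn each row into a bulk action, and group actions into bulk requests. This is not django-elasticsearch-dsl's actual code. As far as I know it reads the model's queryset with QuerySet.iterator(), which on PostgreSQL uses a server-side cursor (DECLARE ... CURSOR), and sends documents via the Elasticsearch bulk helper. sqlite3 only stands in for PostgreSQL here, and the helper names are hypothetical. The point is that no bulk request can be built until rows come back, so if the statement opening the cursor never finishes, nothing reaches Elasticsearch.

```python
import sqlite3
from itertools import islice


def fetch_in_chunks(cursor, size):
    """Yield rows from an executed cursor, size rows per fetchmany() call."""
    while True:
        rows = cursor.fetchmany(size)
        if not rows:
            return
        yield from rows


def bulk_actions(rows, index="doc_keys"):
    """Convert (pk, keyword) rows into bulk-helper style action dicts."""
    for pk, keyword in rows:
        yield {"_index": index, "_id": pk, "_source": {"keyword": keyword}}


def batches(actions, size):
    """Group actions into lists; each list would become one bulk request."""
    it = iter(actions)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dockey (id INTEGER PRIMARY KEY, keyword TEXT)")
    conn.executemany("INSERT INTO dockey (keyword) VALUES (?)",
                     [(f"kw{i}",) for i in range(7)])
    cur = conn.execute("SELECT id, keyword FROM dockey ORDER BY id")
    for batch in batches(bulk_actions(fetch_in_chunks(cur, 3)), size=5):
        print(len(batch), "actions in this bulk request")
```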

Again, everything had been working, but somehow it broke, and only for this one index.

system · October 11, 2021, 2:28pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.