Elasticsearch documents are dropped on bulk insert

I have the following setup: Logstash with a JDBC input connector and an Elasticsearch output connector. The data is an initial load into ES, meaning the indices being filled do not exist before the Logstash run; Logstash creates them from mapping templates. Logstash and ES are both version 7.17.0 and run in Kubernetes.

This scenario worked perfectly fine until recently.

Issue

Indexing works perfectly fine up to around 1 million documents. Then indexing of new documents slows down rapidly and the number of deleted documents oscillates heavily: it climbs a lot and then drops again, over and over.

I am aware that documents are marked as deleted as part of an update operation, and that was already the case before. But the deleted documents count used to mostly increase and did not oscillate to this extent. Before the issue occurred, the load would end at around 8 million docs.count and around 3 million docs.deleted. At the moment docs.count stays stuck at around 1.2 million.

The scenario is deployed on two environments and only one of them shows this behavior. Both have the same source and the same Logstash configuration; they differ only in their ES instance, and both instances are configured identically.

Disk space is fine and there are no errors in the ES log. I can also confirm that Logstash is running fine and is constantly sending bulk requests to ES.
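
For reference, the checks behind that statement can be reproduced roughly like this (a sketch, assuming direct access to the ES HTTP port on 9200; host and port are placeholders):

    # Disk usage and shard allocation per node
    curl -s "http://localhost:9200/_cat/allocation?v"

    # Cluster health (also shows unassigned shards)
    curl -s "http://localhost:9200/_cluster/health?pretty"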

Time series of the metrics

TIME 1230:
health status index                     uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   myindex                   LLC8qxSWTWyO1U25Olljlg   1   1    1182676       166056    642.3mb        642.3mb

TIME 1240:
health status index                     uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   myindex                   LLC8qxSWTWyO1U25Olljlg   1   1    1182676       533339    946.9mb        946.9mb

TIME 1300:
health status index                     uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   myindex                   LLC8qxSWTWyO1U25Olljlg   1   1    1182676       349747    701.9mb        701.9mb

TIME 1400:
health status index                     uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   myindex                   LLC8qxSWTWyO1U25Olljlg   1   1    1182678       467651        1gb            1gb

TIME 1430:
health status index                     uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   myindex                   LLC8qxSWTWyO1U25Olljlg   1   1    1182678       693906        1gb            1gb
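
The rows above are taken from the cat indices API, roughly this call (assuming the default HTTP port):

    curl -s "http://localhost:9200/_cat/indices/myindex?v"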

Hey,

can you share any error messages about the drops you mentioned? Or is everything reported as working?

In general, deleted documents are cleaned up after some time, so they are nothing to worry about. At some point a segment merge kicks in; you can think of it as a process that reduces data size and stores the data more efficiently and compactly, and as a side effect it also removes deleted documents.

You could trigger this yourself with a force merge (a flush alone only commits in-memory data to disk), but that is not necessary.
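
If you do want to trigger it manually, it would look roughly like this sketch (only_expunge_deletes limits the merge to segments with many deleted documents):

    curl -s -X POST "http://localhost:9200/myindex/_forcemerge?only_expunge_deletes=true"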

Configuring stack monitoring would help you get some more insight into this, as with the current data this is hard to debug.
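
On 7.17 the quickest way to get monitoring data is to enable the built-in collection (deprecated in favour of Metricbeat, but fine for a quick look); a sketch:

    curl -s -X PUT "http://localhost:9200/_cluster/settings" \
      -H "Content-Type: application/json" \
      -d '{"persistent": {"xpack.monitoring.collection.enabled": true}}'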

Thanks for the fast reply.
Actually there are no error messages in the ES log. The log mostly contains GC messages and some messages regarding deletion of expired data:

{"type": "server", "timestamp": "2022-04-07T01:38:00,001Z", "level": "INFO", "component": "o.e.x.m.a.TransportDeleteExpiredDataAction", "cluster.name": "docker-cluster", "node.name": "elasticsearch-master-0", "message": "Deleting expired data", "cluster.uuid": "t9wykwleSjar_sVBoBzF-A", "node.id": "X3QXJKtsTG6L5M0Ym8hnTg"  }
{"type": "server", "timestamp": "2022-04-07T01:38:00,004Z", "level": "INFO", "component": "o.e.x.m.j.r.UnusedStatsRemover", "cluster.name": "docker-cluster", "node.name": "elasticsearch-master-0", "message": "Successfully deleted [0] unused stats documents", "cluster.uuid": "t9wykwleSjar_sVBoBzF-A", "node.id": "X3QXJKtsTG6L5M0Ym8hnTg"  }
{"type": "server", "timestamp": "2022-04-07T01:38:00,005Z", "level": "INFO", "component": "o.e.x.m.a.TransportDeleteExpiredDataAction", "cluster.name": "docker-cluster", "node.name": "elasticsearch-master-0", "message": "Completed deletion of expired ML data", "cluster.uuid": "t9wykwleSjar_sVBoBzF-A", "node.id": "X3QXJKtsTG6L5M0Ym8hnTg"  }
{"type": "server", "timestamp": "2022-04-07T01:38:00,005Z", "level": "INFO", "component": "o.e.x.m.MlDailyMaintenanceService", "cluster.name": "docker-cluster", "node.name": "elasticsearch-master-0", "message": "Successfully completed [ML] maintenance task: triggerDeleteExpiredDataTask", "cluster.uuid": "t9wykwleSjar_sVBoBzF-A", "node.id": "X3QXJKtsTG6L5M0Ym8hnTg"  }
{"type": "server", "timestamp": "2022-04-07T02:03:50,357Z", "level": "INFO", "component": "o.e.c.m.MetadataMappingService", "cluster.name": "docker-cluster", "node.name": "elasticsearch-master-0", "message": "[myindex/wgLY7ko8QjagcOF7l2MIkA] update_mapping [_doc]", "cluster.uuid": "t9wykwleSjar_sVBoBzF-A", "node.id": "X3QXJKtsTG6L5M0Ym8hnTg"  }

Logstash is sending bulk inserts frequently. At the moment it has already sent around 15 million successful output events:

                    {
                        "id": "output_myindex",
                        "events": {
                            "out": 15970174,
                            "duration_in_millis": 27895129,
                            "in": 15970279
                        },
                        "documents": {
                            "successes": 15970174
                        },
                        "name": "elasticsearch",
                        "bulk_requests": {
                            "successes": 160026,
                            "responses": {
                                "200": 160026
                            }
                        }
                    },
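
That snippet comes from the Logstash node stats API, roughly this call (assuming the default Logstash API port 9600):

    curl -s "http://localhost:9600/_node/stats/pipelines?pretty"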

What metrics can I provide to give you more insight?

Thanks a lot again for the reply.

Actually, after analyzing Logstash in detail we realized that Logstash is not updating its metadata for some reason, and hence it sends the same data in an endless loop.
ES works perfectly fine, and this also explains the oscillating deleted documents count.

Seems to be a network issue which we need to investigate further.
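
For anyone who runs into the same thing: the metadata the JDBC input keeps between runs is its tracking state (:sql_last_value), persisted to last_run_metadata_path. A minimal sketch of the relevant options; the connection string, query and column names are made up, and in Kubernetes the metadata path has to live on persistent storage, otherwise the last run value is lost and the same rows get sent over and over:

    input {
      jdbc {
        jdbc_driver_class => "org.postgresql.Driver"                  # hypothetical driver
        jdbc_connection_string => "jdbc:postgresql://db:5432/mydb"    # hypothetical source
        jdbc_user => "logstash"
        statement => "SELECT * FROM my_table WHERE updated_at > :sql_last_value"   # hypothetical query
        use_column_value => true
        tracking_column => "updated_at"                               # hypothetical tracking column
        tracking_column_type => "timestamp"
        schedule => "*/5 * * * *"
        # Must point to persistent storage, otherwise every restart re-reads everything
        last_run_metadata_path => "/usr/share/logstash/data/.jdbc_last_run"
      }
    }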

