Losing logs with logstash-transfer protocol

I recently realized that the replication between our clusters is broken: we lose logs every minute (not a lot, but we are losing data). We replicate from the master cluster to the slave cluster with Logstash-transfer. I tried to implement CCR as a solution, but I ended up with more problems, since there is no user creation in the cloud, and the whole thing became a rabbit hole. So for now I am just trying to fix the logs that we lose.
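For context, the transfer pipeline is roughly along these lines. This is only a simplified sketch: the hostname, port and index pattern are placeholders, and the input side varies per pipeline, so it is not our exact config:

input {
  # whatever feeds the transfer pods (omitted here)
}
output {
  elasticsearch {
    hosts => ["https://remote-cluster.example.com:9200"]   # placeholder for the receiving cluster
    index => "logs-%{+YYYY.MM.dd}"                         # placeholder index pattern
  }
}

The only thing I see on the transfer pods are these warnings: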

[2021-07-22T18:59:05,327][WARN ][logstash.monitoringextension.pipelineregisterhook] xpack.monitoring.enabled has not been defined, but found elasticsearch configuration. Please explicitly set `xpack.monitoring.enabled: true` in logstash.yml
[2021-07-22T18:59:05,331][WARN ][deprecation.logstash.monitoringextension.pipelineregisterhook] Internal collectors option for Logstash monitoring is deprecated and targeted for removal in the next major version.
[2021-07-22T18:59:06,038][WARN ][deprecation.logstash.outputs.elasticsearch] Relying on default value of `pipeline.ecs_compatibility`, which may change in a future major release of Logstash. To avoid unexpected changes when upgrading Logstash, please explicitly declare your desired ECS Compatibility mode.
[2021-07-22T18:59:08,633][WARN ][logstash.licensechecker.licensereader] Restored connection to ES instance {:url=>"http://elasticsearch:9200/"}
[2021-07-22T18:59:08,925][WARN ][logstash.licensechecker.licensereader] Detected a 6.x and above cluster: the `type` event field won't be used to determine the document _type {:es_version=>7}
[2021-07-22T18:59:14,635][WARN ][deprecation.logstash.outputs.elasticsearchmonitoring] Relying on default value of `pipeline.ecs_compatibility`, which may change in a future major release of Logstash. To avoid unexpected changes when upgrading Logstash, please explicitly declare your desired ECS Compatibility mode.
[2021-07-22T18:59:15,026][WARN ][logstash.outputs.elasticsearchmonitoring] Restored connection to ES instance {:url=>"http://elasticsearch:9200/"}
[2021-07-22T18:59:15,033][WARN ][logstash.outputs.elasticsearchmonitoring] Detected a 6.x and above cluster: the `type` event field won't be used to determine the document _type {:es_version=>7}
[2021-07-22T18:59:15,145][WARN ][logstash.outputs.elasticsearchmonitoring] Configuration is data stream compliant but due backwards compatibility Logstash 7.x will not assume writing to a data-stream, default behavior will change on Logstash 8.0 (set `data_stream => true/false` to disable this warning)
[2021-07-22T18:59:15,146][WARN ][logstash.javapipeline    ] 'pipeline.ordered' is enabled and is likely less efficient, consider disabling if preserving event order is not necessary
[2021-07-22T18:59:15,146][WARN ][logstash.outputs.elasticsearchmonitoring] Configuration is data stream compliant but due backwards compatibility Logstash 7.x will not assume writing to a data-stream, default behavior will change on Logstash 8.0 (set `data_stream => true/false` to disable this warning)
[2021-07-22T18:59:20,241][WARN ][logstash.javapipeline    ] 'pipeline.ordered' is enabled and is likely less efficient, consider disabling if preserving event order is not necessary
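As far as I can tell, those warnings just point at settings we never declared explicitly. Based only on the messages above (so an assumption, not our current config), I suppose they could be silenced with something like this in logstash.yml:

xpack.monitoring.enabled: true        # the first warning asks for this explicitly
pipeline.ecs_compatibility: disabled  # or v1; declaring it avoids the default changing on upgrade
pipeline.ordered: false               # only if preserving event order is not actually needed

and in the elasticsearch output block:

data_stream => false                  # silences the data-stream compatibility warning

None of that looks like it would explain the lost events, though.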

Here is the nginx log on the receiving side:

DEV_CL"
2021/07/22 00:06:39 [warn] 1763#1763: *84613 an upstream response is buffered to a temporary file /var/cache/nginx/proxy_temp/5/70/0000000705 while reading upstream, client: 172.20.5.1, server: logs.XXXX.com, request: "GET /40943/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js HTTP/1.1", upstream: "http://172.19.254.87:5601/40943/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js", host: "ntxmon-caw-kibana.XXXX.com", referrer: "https://ntxmon-caw-kibana.XXXX.com/app/management/data/remote_clusters/add?redirect=%2Fdata%2Fcross_cluster_replication%2Ffollower_indices%2Fadd"

And Elasticsearch on the receiving side:

[2021-07-21T23:43:10,757][WARN ][o.e.m.f.FsHealthService  ] [es-data-0] health check of [/data/data/nodes/0] took [15811ms] which is above the warn threshold of [5s]
[2021-07-21T23:45:24,047][WARN ][o.e.m.f.FsHealthService  ] [es-data-0] health check of [/data/data/nodes/0] took [13212ms] which is above the warn threshold of [5s]
[2021-07-21T23:59:09,271][WARN ][o.e.g.PersistedClusterStateService] [es-data-0] writing cluster state took [17211ms] which is above the warn threshold of [10s]; wrote global metadata [true] and metadata for [0] indices and skipped [506] unchanged indices
[2021-07-22T00:00:39,672][WARN ][o.e.g.PersistedClusterStateService] [es-data-0] writing cluster state took [17011ms] which is above the warn threshold of [10s]; wrote global metadata [false] and metadata for [1] indices and skipped [508] unchanged indices
[2021-07-22T00:13:08,541][WARN ][o.e.g.PersistedClusterStateService] [es-data-0] writing cluster state took [11807ms] which is above the warn threshold of [10s]; wrote global metadata [true] and metadata for [0] indices and skipped [509] unchanged indices
[2021-07-22T00:13:42,268][WARN ][o.e.m.f.FsHealthService  ] [es-data-0] health check of [/data/data/nodes/0] took [15810ms] which is above the warn threshold of [5s]
[2021-07-22T00:19:49,143][WARN ][o.e.m.f.FsHealthService  ] [es-data-0] health check of [/data/data/nodes/0] took [6239ms] which is above the warn threshold of [5s]
[2021-07-22T00:26:08,380][WARN ][o.e.m.f.FsHealthService  ] [es-data-0] health check of [/data/data/nodes/0] took [19270ms] which is above the warn threshold of [5s]
[2021-07-22T00:28:21,514][WARN ][o.e.m.f.FsHealthService  ] [es-data-0] health check of [/data/data/nodes/0] took [13008ms] which is above the warn threshold of [5s]
[2021-07-22T00:34:27,027][WARN ][o.e.m.f.FsHealthService  ] [es-data-0] health check of [/data/data/nodes/0] took [5637ms] which is above the warn threshold of [5s]
[2021-07-22T00:46:39,604][WARN ][o.e.m.f.FsHealthService  ] [es-data-0] health check of [/data/data/nodes/0] took [12423ms] which is above the warn threshold of [5s]
[2021-07-22T00:50:47,488][WARN ][o.e.m.f.FsHealthService  ] [es-data-0] health check of [/data/data/nodes/0] took [8005ms] which is above the warn threshold of [5s]
[2021-07-22T00:53:03,637][WARN ][o.e.m.f.FsHealthService  ] [es-data-0] health check of [/data/data/nodes/0] took [16110ms] which is above the warn threshold of [5s]
[2021-07-22T00:59:17,212][WARN ][o.e.m.f.FsHealthService  ] [es-data-0] health check of [/data/data/nodes/0] took [13333ms] which is above the warn threshold of [5s]
[2021-07-22T01:05:22,777][WARN ][o.e.m.f.FsHealthService  ] [es-data-0] health check of [/data/data/nodes/0] took [5016ms] which is above the warn threshold of [5s]
[2021-07-22T01:31:49,266][WARN ][o.e.m.f.FsHealthService  ] [es-data-0] health check of [/data/data/nodes/0] took [18812ms] which is above the warn threshold of [5s]
[2021-07-22T03:00:30,651][WARN ][o.e.m.f.FsHealthService  ] [es-data-0] health check of [/data/data/nodes/0] took [29254ms] which is above the warn threshold of [5s]
[2021-07-22T04:01:30,381][WARN ][o.e.h.AbstractHttpServerTransport] [es-data-0] handling request [null][GET][/_xpack][Netty4HttpChannel{localAddress=/172.20.4.65:9200, remoteAddress=/172.20.5.74:34422}] took [5311ms] which is above the warn threshold of [5000ms]
[2021-07-23T00:57:54,005][WARN ][o.e.m.f.FsHealthService  ] [es-data-0] health check of [/data/data/nodes/0] took [17410ms] which is above the warn threshold of [5s]
[2021-07-23T01:10:14,444][WARN ][o.e.m.f.FsHealthService  ] [es-data-0] health check of [/data/data/nodes/0] took [19011ms] which is above the warn threshold of [5s]
[2021-07-23T01:16:21,178][WARN ][o.e.m.f.FsHealthService  ] [es-data-0] health check of [/data/data/nodes/0] took [6204ms] which is above the warn threshold of [5s]
[2021-07-23T01:38:41,461][WARN ][o.e.m.f.FsHealthService  ] [es-data-0] health check of [/data/data/nodes/0] took [9948ms] which is above the warn threshold of [5s]
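Given all the slow-disk warnings above, my current guess is that the receiving cluster is struggling with I/O and possibly rejecting bulk requests from Logstash. These are the standard APIs I plan to check on the receiving side (nothing custom):

GET _cat/thread_pool/write?v&h=node_name,name,active,queue,rejected
GET _nodes/stats/fs

If the rejected counter keeps climbing there, I assume that would line up with the events we lose.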

Not sure if I am looking in the right place. Any help would be highly appreciated.
