Logstash TCP input pipeline performance issues

I’m having an issue with a particular pipeline and I’m not sure how to track it down or troubleshoot it further.

First, I have multiple cloud environments, configured the same, sending to the same endpoints. Two of the three are working with no issues. When I look at the difference between events in and events out on the working pipelines, it’s around 200.

The environment I'm having problems with will have a difference of 100,000 between in and out in the pipeline.

The Logstash server is an instance in AWS, an m5.xlarge (4 vCPU, 16 GB RAM). I’ve resized up to an m5.8xlarge and still had the same issue. Resources are sitting pretty idle.

I’m using Logstash to collect events locally and then ship them to a Splunk HEC endpoint.

I have four pipelines: a UDP pipeline, a TCP pipeline, and two S3 pipelines.

I can connect to the Splunk endpoint using curl.
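
Roughly like this (the URL and token are the same redacted placeholders as in the config below):

curl -k "https://<endpoint_url>/services/collector/event/1.0" \
  -H "Authorization: Splunk <HEC_token>" \
  -d '{"event": "connectivity test"}'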

Here are some details of my setup.

input {
  tcp {
    port => 1514
  }
  tcp {
    port => 10514
  }
}
filter {
  mutate {
    replace => { "message" => "%{message}: logstash_time:%{@timestamp}" }
  }
}
output {
  http {
    http_method => "post"
    url => "https://<endpoint_url>/services/collector/event/1.0"
#    url => "https://<endpoint_url>/services/collector/event?/index=nonprod" # non-prod index
#    headers => ["Authorization", "Splunk Endpoint_HEC_Token"] # non-prod HEC
    headers => ["Authorization", "Splunk Endpoint_HEC_Toke"] # ENV1 HEC
#    headers => ["Authorization", "Splunk Endpoint_HEC_Toke"] # ENV2 HEC
    mapping => {
      "event" => "%{message}"
    }
  }
#  stdout {}
}

jvm.options-server

-Xms8g
-Xmx8g
-XX:+UseShenandoahGC
-XX:+AlwaysPreTouch
-XX:+UseNUMA
-XX:-UseBiasedLocking
-Duser.language=en
-Duser.country=US
#-Djava.io.tmpdir=/opt/logstash/tmp
-Dfile.encoding=UTF-8
-Djruby.compile.invokedynamic=true
-Djruby.jit.threshold=0
-Djruby.regexp.interruptible=true
-XX:+HeapDumpOnOutOfMemoryError
-Djava.security.egd=file:/dev/urandom
-Dlog4j2.isThreadContextMapInheritable=true
11-:--add-opens=java.base/java.security=ALL-UNNAMED
11-:--add-opens=java.base/java.io=ALL-UNNAMED
11-:--add-opens=java.base/java.nio.channels=ALL-UNNAMED
11-:--add-opens=java.base/sun.nio.ch=ALL-UNNAMED
11-:--add-opens=java.management/sun.management=ALL-UNNAMED
-Dnetworkaddress.cache.ttl=60

logstash.yml

path.data: /var/lib/logstash
path.logs: /var/log/logstash
pipeline.ordered: false
pipeline.workers: 30
pipeline.batch.size: 1536
#pipeline.workers: 18
#pipeline.batch.size: 1500
log.level: info
pipeline.ecs_compatibility: disabled
dead_letter_queue.enable: true

pipelines.yml

- pipeline.id: env_tcp
  path.config: "/etc/logstash/conf.d/ENV_tcp.conf"
- pipeline.id: env_udp
  path.config: "/etc/logstash/conf.d/ENV_udp.conf"
- pipeline.id: cloudwatch_metrics_east
  path.config: "/etc/logstash/conf.d/cloudwatch_metrics_east.conf"
- pipeline.id: cloudwatch_metrics_west
  path.config: "/etc/logstash/conf.d/cloudwatch_metrics_west.conf"

stats

{
  "host": "ip-x-x-x-x.us-west-2.compute.internal",
  "version": "8.6.0",
  "http_address": "127.0.0.1:9600",
  "id": "f0ab31c1-e9d6-4a34-b86e-7c753256de9c",
  "name": "ip-x-x-x-x.us-west-2.compute.internal",
  "ephemeral_id": "4b0374bd-5560-4aac-8d74-88ec54f5c8b2",
  "status": "green",
  "snapshot": false,
  "pipeline": {
    "workers": 30,
    "batch_size": 1536,
    "batch_delay": 50
  },
  "jvm": {
    "threads": {
      "count": 303,
      "peak_count": 304
    },
    "mem": {
      "heap_used_percent": 71,
      "heap_committed_in_bytes": 8589934592,
      "heap_max_in_bytes": 8589934592,
      "heap_used_in_bytes": 6128615320,
      "non_heap_used_in_bytes": 234153560,
      "non_heap_committed_in_bytes": 253165568,
      "pools": {
        "old": {
          "max_in_bytes": 0,
          "peak_max_in_bytes": 0,
          "used_in_bytes": 0,
          "peak_used_in_bytes": 0,
          "committed_in_bytes": 0
        },
        "survivor": {
          "max_in_bytes": 0,
          "peak_max_in_bytes": 0,
          "used_in_bytes": 0,
          "peak_used_in_bytes": 0,
          "committed_in_bytes": 0
        },
        "young": {
          "max_in_bytes": 0,
          "peak_max_in_bytes": 0,
          "used_in_bytes": 0,
          "peak_used_in_bytes": 0,
          "committed_in_bytes": 0
        }
      }
    },
    "gc": {
      "collectors": {
        "old": {
          "collection_count": 168,
          "collection_time_in_millis": 16
        },
        "young": {
          "collection_count": 42,
          "collection_time_in_millis": 47492
        }
      }
    },
    "uptime_in_millis": 5208612
  },
  "process": {
    "open_file_descriptors": 354,
    "peak_open_file_descriptors": 360,
    "max_file_descriptors": 16384,
    "mem": {
      "total_virtual_in_bytes": 13429489664
    },
    "cpu": {
      "total_in_millis": 1232440,
      "percent": 3,
      "load_average": {
        "1m": 0.34,
        "5m": 0.28,
        "15m": 0.27
      }
    }
  },
  "events": {
    "in": 1983728,
    "filtered": 1891560,
    "out": 1891560,
    "duration_in_millis": 180627825,
    "queue_push_duration_in_millis": 40873953
  },
  "flow": {
    "input_throughput": {
      "current": 8.198,
      "last_1_minute": 75.84,
      "last_5_minutes": 338.8,
      "last_15_minutes": 382.9,
      "last_1_hour": 368.4,
      "lifetime": 381.8
    },
    "filter_throughput": {
      "current": 8.196,
      "last_1_minute": 80.85,
      "last_5_minutes": 338.8,
      "last_15_minutes": 382.9,
      "last_1_hour": 368.4,
      "lifetime": 364.1
    },
    "output_throughput": {
      "current": 8.195,
      "last_1_minute": 80.85,
      "last_5_minutes": 338.8,
      "last_15_minutes": 382.9,
      "last_1_hour": 368.4,
      "lifetime": 364.1
    },
    "queue_backpressure": {
      "current": 0,
      "last_1_minute": 4.631,
      "last_5_minutes": 7.094,
      "last_15_minutes": 8.021,
      "last_1_hour": 7.916,
      "lifetime": 7.867
    },
    "worker_concurrency": {
      "current": 0.6545,
      "last_1_minute": 8.732,
      "last_5_minutes": 30.18,
      "last_15_minutes": 35.6,
      "last_1_hour": 35.28,
      "lifetime": 34.77
    }
  },
  "pipelines": {
    "cloudwatch_metrics_west": {
      "events": {
        "queue_push_duration_in_millis": 5,
        "duration_in_millis": 6160547,
        "filtered": 78089,
        "out": 78089,
        "in": 78089
      },
      "flow": {
        "output_throughput": {
          "current": 0,
          "last_1_minute": 0,
          "last_5_minutes": 13.43,
          "last_15_minutes": 14.52,
          "last_1_hour": 14.3,
          "lifetime": 15.04
        },
        "queue_backpressure": {
          "current": 0,
          "last_1_minute": 0,
          "last_5_minutes": 0,
          "last_15_minutes": 0,
          "last_1_hour": 0,
          "lifetime": 9.629e-07
        },
        "filter_throughput": {
          "current": 0,
          "last_1_minute": 0,
          "last_5_minutes": 13.43,
          "last_15_minutes": 14.52,
          "last_1_hour": 14.3,
          "lifetime": 15.04
        },
        "worker_concurrency": {
          "current": 0,
          "last_1_minute": 0,
          "last_5_minutes": 1.07,
          "last_15_minutes": 1.165,
          "last_1_hour": 1.143,
          "lifetime": 1.186
        },
        "input_throughput": {
          "current": 0,
          "last_1_minute": 0,
          "last_5_minutes": 13.43,
          "last_15_minutes": 14.52,
          "last_1_hour": 14.3,
          "lifetime": 15.04
        }
      },
      "plugins": {
        "inputs": [
          {
            "id": "b579ed5749fc49dfc6e207503a393f5bc17ce412d508fcb6f5acbe7bc570a2a5",
            "name": "s3",
            "events": {
              "queue_push_duration_in_millis": 5,
              "out": 78089
            }
          }
        ],
        "codecs": [
          {
            "id": "plain_d9d7a569-afc6-4415-ac32-c0024e2c3c99",
            "decode": {
              "duration_in_millis": 0,
              "out": 0,
              "writes_in": 0
            },
            "encode": {
              "duration_in_millis": 0,
              "writes_in": 0
            },
            "name": "plain"
          },
          {
            "id": "plain_9e472360-c4ae-4acd-b7c5-3e986aab9c26",
            "decode": {
              "duration_in_millis": 163,
              "out": 78089,
              "writes_in": 78089
            },
            "encode": {
              "duration_in_millis": 0,
              "writes_in": 0
            },
            "name": "plain"
          }
        ],
        "filters": [],
        "outputs": [
          {
            "id": "9c1864878802c0b65f91a12d229dcd7b66153e748b2aeac18b2addcafb7d9568",
            "name": "http",
            "events": {
              "duration_in_millis": 6160517,
              "out": 78089,
              "in": 78089
            }
          }
        ]
      },
      "reloads": {
        "successes": 0,
        "last_success_timestamp": null,
        "failures": 0,
        "last_error": null,
        "last_failure_timestamp": null
      },
      "queue": {
        "type": "memory",
        "events_count": 0,
        "queue_size_in_bytes": 0,
        "max_queue_size_in_bytes": 0
      },
      "dead_letter_queue": {
        "max_queue_size_in_bytes": 1073741824,
        "dropped_events": 0,
        "expired_events": 0,
        "queue_size_in_bytes": 1,
        "last_error": "no errors",
        "storage_policy": "drop_newer"
      },
      "hash": "411bb89f24db2895feacaaaa16e8acbe8da077e026d02735ca7931196aaa199a",
      "ephemeral_id": "33591929-5123-46ab-b80a-c59146b9a08b"
    },
    "env_tcp": {
      "events": {
        "queue_push_duration_in_millis": 40873948,
        "duration_in_millis": 151879287,
        "filtered": 1713729,
        "out": 1713729,
        "in": 1805897
      },
      "flow": {
        "output_throughput": {
          "current": 0,
          "last_1_minute": 71.9,
          "last_5_minutes": 313,
          "last_15_minutes": 351.1,
          "last_1_hour": 336.7,
          "lifetime": 330.1
        },
        "queue_backpressure": {
          "current": 0,
          "last_1_minute": 4.619,
          "last_5_minutes": 7.586,
          "last_15_minutes": 7.685,
          "last_1_hour": 7.923,
          "lifetime": 7.872
        },
        "filter_throughput": {
          "current": 0,
          "last_1_minute": 71.9,
          "last_5_minutes": 313,
          "last_15_minutes": 351.1,
          "last_1_hour": 336.7,
          "lifetime": 330.1
        },
        "worker_concurrency": {
          "current": 0,
          "last_1_minute": 6.028,
          "last_5_minutes": 26.2,
          "last_15_minutes": 30.2,
          "last_1_hour": 29.93,
          "lifetime": 29.25
        },
        "input_throughput": {
          "current": 0,
          "last_1_minute": 71.9,
          "last_5_minutes": 313,
          "last_15_minutes": 351.1,
          "last_1_hour": 336.7,
          "lifetime": 347.8
        }
      },
      "plugins": {
        "inputs": [
          {
            "id": "66bd1e72ef73bb64aa052bbd54f40a037db491b0e21f05d4e399423c83741a9e",
            "name": "tcp",
            "events": {
              "queue_push_duration_in_millis": 40873948,
              "out": 1805897
            }
          },
          {
            "id": "6e5d5be1ff796ccadc6a00f60cdb226b14c764ddd9985cd4d6d7be2a9ec638ae",
            "name": "tcp",
            "events": {
              "queue_push_duration_in_millis": 0,
              "out": 0
            }
          }
        ],
        "codecs": [
          {
            "id": "line_b600f2af-c170-4835-a445-2cd7fbdaf69a",
            "decode": {
              "duration_in_millis": 0,
              "out": 0,
              "writes_in": 0
            },
            "encode": {
              "duration_in_millis": 0,
              "writes_in": 0
            },
            "name": "line"
          },
          {
            "id": "rubydebug_737b68bf-687d-4eff-967a-4e408ea29d3c",
            "decode": {
              "duration_in_millis": 0,
              "out": 0,
              "writes_in": 0
            },
            "encode": {
              "duration_in_millis": 81664,
              "writes_in": 1713729
            },
            "name": "rubydebug"
          },
          {
            "id": "plain_6d5fc048-4cd2-424b-ac18-3007c1493842",
            "decode": {
              "duration_in_millis": 0,
              "out": 0,
              "writes_in": 0
            },
            "encode": {
              "duration_in_millis": 0,
              "writes_in": 0
            },
            "name": "plain"
          },
          {
            "id": "line_7454bf55-ad61-45a0-8f60-20cdfc269df8",
            "decode": {
              "duration_in_millis": 40860473,
              "out": 1805897,
              "writes_in": 7877
            },
            "encode": {
              "duration_in_millis": 0,
              "writes_in": 0
            },
            "name": "line"
          }
        ],
        "filters": [
          {
            "id": "3e38df1ffaeb932dc8c3395fd4cc3e5c87e5a22bd47bf8b83a0399061d524f48",
            "name": "mutate",
            "events": {
              "duration_in_millis": 9880,
              "out": 1759809,
              "in": 1759809
            }
          }
        ],
        "outputs": [
          {
            "id": "07ac3e45a22a023f63380888267fe915a591a2643b9baa4c77023ac113f2e6c2",
            "name": "stdout",
            "events": {
              "duration_in_millis": 89051,
              "out": 1713729,
              "in": 1713729
            }
          },
          {
            "id": "c685636696572b6d74ea68ff726974823e22f806987cff4044310f1b953496a5",
            "name": "http",
            "events": {
              "duration_in_millis": 151778979,
              "out": 1713729,
              "in": 1759809
            }
          }
        ]
      },
      "reloads": {
        "successes": 0,
        "last_success_timestamp": null,
        "failures": 0,
        "last_error": null,
        "last_failure_timestamp": null
      },
      "queue": {
        "type": "memory",
        "events_count": 0,
        "queue_size_in_bytes": 0,
        "max_queue_size_in_bytes": 0
      },
      "dead_letter_queue": {
        "max_queue_size_in_bytes": 1073741824,
        "dropped_events": 0,
        "expired_events": 0,
        "queue_size_in_bytes": 1,
        "last_error": "no errors",
        "storage_policy": "drop_newer"
      },
      "hash": "8376793ee8e97740a180b26e9f0bf072671d62960802f36c569b36d3bc819ecb",
      "ephemeral_id": "72128f94-be9e-454c-a599-11daf3e87430"
    },
    "cloudwatch_metrics_east": {
      "events": {
        "queue_push_duration_in_millis": 0,
        "duration_in_millis": 5989305,
        "filtered": 69516,
        "out": 69516,
        "in": 69516
      },
      "flow": {
        "output_throughput": {
          "current": 0,
          "last_1_minute": 4.978,
          "last_5_minutes": 12.73,
          "last_15_minutes": 13.27,
          "last_1_hour": 12.27,
          "lifetime": 13.39
        },
        "queue_backpressure": {
          "current": 0,
          "last_1_minute": 0,
          "last_5_minutes": 0,
          "last_15_minutes": 0,
          "last_1_hour": 0,
          "lifetime": 0
        },
        "filter_throughput": {
          "current": 0,
          "last_1_minute": 4.977,
          "last_5_minutes": 12.73,
          "last_15_minutes": 13.27,
          "last_1_hour": 12.27,
          "lifetime": 13.39
        },
        "worker_concurrency": {
          "current": 0,
          "last_1_minute": 0.4985,
          "last_5_minutes": 1.321,
          "last_15_minutes": 1.375,
          "last_1_hour": 1.111,
          "lifetime": 1.153
        },
        "input_throughput": {
          "current": 0,
          "last_1_minute": 0,
          "last_5_minutes": 12.73,
          "last_15_minutes": 13.27,
          "last_1_hour": 12.27,
          "lifetime": 13.39
        }
      },
      "plugins": {
        "inputs": [
          {
            "id": "03e66db4bb0663a3e306a461d634a9a7eb915892c297d14d03da5d86243377c8",
            "name": "s3",
            "events": {
              "queue_push_duration_in_millis": 0,
              "out": 69516
            }
          }
        ],
        "codecs": [
          {
            "id": "plain_5af2204b-409e-4f30-a3dd-d8b43ffcb030",
            "decode": {
              "duration_in_millis": 0,
              "out": 0,
              "writes_in": 0
            },
            "encode": {
              "duration_in_millis": 0,
              "writes_in": 0
            },
            "name": "plain"
          },
          {
            "id": "plain_abf35f16-af17-44a1-a85c-09fdadda462e",
            "decode": {
              "duration_in_millis": 34,
              "out": 69516,
              "writes_in": 69516
            },
            "encode": {
              "duration_in_millis": 0,
              "writes_in": 0
            },
            "name": "plain"
          }
        ],
        "filters": [],
        "outputs": [
          {
            "id": "a40ed02461c43dd78d88afa3cc6cce5efe4d4b8fa60cbb0887f3a423dfa53556",
            "name": "http",
            "events": {
              "duration_in_millis": 5989283,
              "out": 69516,
              "in": 69516
            }
          }
        ]
      },
      "reloads": {
        "successes": 0,
        "last_success_timestamp": null,
        "failures": 0,
        "last_error": null,
        "last_failure_timestamp": null
      },
      "queue": {
        "type": "memory",
        "events_count": 0,
        "queue_size_in_bytes": 0,
        "max_queue_size_in_bytes": 0
      },
      "dead_letter_queue": {
        "max_queue_size_in_bytes": 1073741824,
        "dropped_events": 0,
        "expired_events": 0,
        "queue_size_in_bytes": 1,
        "last_error": "no errors",
        "storage_policy": "drop_newer"
      },
      "hash": "0c4603d8c87626ba6f5b96023fd9266185a940eb39e83bde9abc9af83fd67513",
      "ephemeral_id": "34ec5474-c2b6-40b2-bb88-0dab3fb8afce"
    },
    "env_udp": {
      "events": {
        "queue_push_duration_in_millis": 0,
        "duration_in_millis": 16598686,
        "filtered": 30226,
        "out": 30226,
        "in": 30226
      },
      "flow": {
        "output_throughput": {
          "current": 8.096,
          "last_1_minute": 3.76,
          "last_5_minutes": 3.93,
          "last_15_minutes": 4.255,
          "last_1_hour": 5.2,
          "lifetime": 5.821
        },
        "queue_backpressure": {
          "current": 0,
          "last_1_minute": 0,
          "last_5_minutes": 0,
          "last_15_minutes": 0,
          "last_1_hour": 0,
          "lifetime": 0
        },
        "filter_throughput": {
          "current": 8.095,
          "last_1_minute": 3.76,
          "last_5_minutes": 3.93,
          "last_15_minutes": 4.255,
          "last_1_hour": 5.2,
          "lifetime": 5.821
        },
        "worker_concurrency": {
          "current": 0.6465,
          "last_1_minute": 2.182,
          "last_5_minutes": 1.927,
          "last_15_minutes": 2.852,
          "last_1_hour": 3.102,
          "lifetime": 3.197
        },
        "input_throughput": {
          "current": 8.095,
          "last_1_minute": 3.729,
          "last_5_minutes": 3.93,
          "last_15_minutes": 4.247,
          "last_1_hour": 5.199,
          "lifetime": 5.821
        }
      },
      "plugins": {
        "inputs": [
          {
            "id": "beb53ec7fe5a8c7bf89c2b5ceb337de70a10e20369975aa18b1b810d9f327c20",
            "workers": 2,
            "queue_size": 2000,
            "name": "udp",
            "events": {
              "queue_push_duration_in_millis": 0,
              "out": 30226
            }
          },
          {
            "id": "c13e0a2f3bf23a321bbdd780312f65318d414d37ae761d9c0451c08565e7bfcf",
            "workers": 2,
            "queue_size": 2000,
            "name": "udp",
            "events": {
              "queue_push_duration_in_millis": 0,
              "out": 0
            }
          }
        ],
        "codecs": [
          {
            "id": "plain_e8105133-74ea-464c-b495-d48534324aed",
            "decode": {
              "duration_in_millis": 0,
              "out": 0,
              "writes_in": 0
            },
            "encode": {
              "duration_in_millis": 0,
              "writes_in": 0
            },
            "name": "plain"
          },
          {
            "id": "plain_0d4eb30a-7424-4a00-8381-f7d6afb5e062",
            "decode": {
              "duration_in_millis": 0,
              "out": 0,
              "writes_in": 0
            },
            "encode": {
              "duration_in_millis": 0,
              "writes_in": 0
            },
            "name": "plain"
          },
          {
            "id": "plain_e62980cb-764f-4c31-8e32-b38ee2d34ecc",
            "decode": {
              "duration_in_millis": 290,
              "out": 30226,
              "writes_in": 30226
            },
            "encode": {
              "duration_in_millis": 0,
              "writes_in": 0
            },
            "name": "plain"
          }
        ],
        "filters": [
          {
            "id": "abcba2023ab028329b8f63a8fa3267aac3889d818265b8c7866988198ba7dd65",
            "name": "mutate",
            "events": {
              "duration_in_millis": 524,
              "out": 30226,
              "in": 30226
            }
          }
        ],
        "outputs": [
          {
            "id": "dd69baa2f096042cc8227d2296452a225bfe7f066dd96434d86c33118e39333c",
            "name": "http",
            "events": {
              "duration_in_millis": 16596889,
              "out": 30226,
              "in": 30226
            }
          }
        ]
      },
      "reloads": {
        "successes": 0,
        "last_success_timestamp": null,
        "failures": 0,
        "last_error": null,
        "last_failure_timestamp": null
      },
      "queue": {
        "type": "memory",
        "events_count": 0,
        "queue_size_in_bytes": 0,
        "max_queue_size_in_bytes": 0
      },
      "dead_letter_queue": {
        "max_queue_size_in_bytes": 1073741824,
        "dropped_events": 0,
        "expired_events": 0,
        "queue_size_in_bytes": 1,
        "last_error": "no errors",
        "storage_policy": "drop_newer"
      },
      "hash": "f749759a50d50a995635158e0582f218c2d4bed977ebf5eff6f2fdcf6fe3034f",
      "ephemeral_id": "cacb7368-868d-4261-a6e5-642b2d4080ca"
    }
  },
  "reloads": {
    "successes": 0,
    "failures": 0
  },
  "os": {
    "cgroup": {
      "cpuacct": {
        "control_group": "/",
        "usage_nanos": 2010261019664
      },
      "cpu": {
        "cfs_quota_micros": -1,
        "cfs_period_micros": 100000,
        "control_group": "/",
        "stat": {
          "number_of_elapsed_periods": 0,
          "number_of_times_throttled": 0,
          "time_throttled_nanos": 0
        }
      }
    }
  },
  "queue": {
    "events_count": 0
  }
}

I'm not sure what to look at next.

What is the issue? It is not clear what your issue is. Can you provide more context?

You said you have four pipelines; which one is giving you issues?

Also, do you have anything in the Logstash logs that would indicate the issue is in Logstash and not on the receiving side? Most of the time performance issues are on the output side, not in Logstash.

Did you check on the Splunk side whether there is any log that would indicate an issue?

Well, the issue is that events are making it into Splunk, but the number of events making it in is much less than expected. The stats show this in the TCP pipeline.

    "env_tcp": {
      "events": {
        "queue_push_duration_in_millis": 40873948,
        "duration_in_millis": 151879287,
        "filtered": 1713729,
        "out": 1713729,
        "in": 1805897
      },

It seems like once there is a 100k difference between in and out, it's not accepting anything new. I think it's back pressure. I'm not sure how to troubleshoot it next.

You may add a file output to check whether the number of events received is the same as the number of events written to the file.
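
A minimal sketch of that, added alongside the existing http output in the env_tcp pipeline (the path is just an example):

output {
  # temporary: write every event to a local file so the line count
  # can be compared with what actually arrives in Splunk
  file {
    path => "/var/log/logstash/env_tcp_debug-%{+yyyy-MM-dd}.log"
    codec => json_lines
  }
}

Then compare a simple wc -l on that file with the event count in Splunk for the same time window.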

The configuration you shared is pretty simple; there is nothing on the Logstash side that would indicate an issue with Logstash. It seems that your issue is on the Splunk side.

If it cannot deal with the event rate sent by Logstash and applies some kind of back pressure, you will lose some events; neither the TCP input nor the UDP input supports any kind of back pressure, so events may be dropped.

I would also reduce pipeline.batch.size to see if it helps; maybe the Logstash request is getting bigger than the maximum size that your output supports.
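
For example, as a starting point to test with in logstash.yml (the exact value is a guess, not a recommendation):

# smaller batches mean smaller HTTP request bodies sent to the HEC endpoint
pipeline.batch.size: 250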

You can also try adding persistent queues on the Logstash side to see if it helps.
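
A minimal sketch, assuming the default queue location under path.data is acceptable, would be to add this to logstash.yml (or per pipeline in pipelines.yml):

# buffer events on disk so a slow or unavailable output does not
# immediately back-pressure the TCP input
queue.type: persisted
queue.max_bytes: 4gb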

So the Logstash inputs don't have any kind of back pressure, and they handle not being able to keep up by discarding packets? If so, that might explain some other issues.

I might try using a different input, like syslog.

It depends on the input; the persistent queue documentation has a short explanation:

Input plugins that do not use a request-response protocol cannot be protected from data loss. Tcp, udp, zeromq push+pull, and many other inputs do not have a mechanism to acknowledge receipt to the sender.

It will make no difference; the syslog input is basically a TCP input and a UDP input listening on the same port, using a preconfigured grok filter.

But again, your issue does not seem to be in Logstash; you need to check on the Splunk side whether you can tune something.

That is definitely true for UDP (and that is sometimes useful if dropping events is preferable to a logstash pipeline stalling) but it is not true for TCP. If the pipeline queues fill then the tcp input cannot flush, so the host TCP stack will fill its buffers and then close the transmission window, preventing any more data being sent over the network from the source. It is not impossible that the program sending data over TCP would then drop data, but logstash will not.
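
One way to see that in practice, assuming a Linux host, is to watch the socket queues on the Logstash side; a receive queue that stays non-zero on the input port means Logstash is not draining the socket, and the sender's TCP window will eventually close:

# on the Logstash host: inspect sockets for the TCP input port
ss -tn 'sport = :1514'
# a persistently large Recv-Q here (and a growing Send-Q on the sender)
# indicates back pressure propagating over TCP rather than drops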

@mread380 I am impressed that you are using Shenandoah! (Speaking as someone who used to do Java GC tuning back in the days when that was a thing you could do for a living -- yes, ConcurrentMarkSweep, I am looking at you!)

The problem is in the http output of the "env_tcp" pipeline. If you go through your stats and look at the "events" entries, the only one that has an imbalance is

  "events": {
    "in": 1983728,
    "filtered": 1891560,
    "out": 1891560,
....
  "pipelines": {
....
"env_tcp": {
  "events": {
    "queue_push_duration_in_millis": 40873948,
    "duration_in_millis": 151879287,
    "filtered": 1713729,
    "out": 1713729,
    "in": 1805897
  },

There are 30,000+ events going in and not coming out. Check Guy's post here. This is not a slow input.

    "outputs": [ ....
      {
        "id": "c685636696572b6d74ea68ff726974823e22f806987cff4044310f1b953496a5",
        "name": "http",
        "events": {
          "duration_in_millis": 151778979,
          "out": 1713729,
          "in": 1759809
        }
      }
    ]
  },

Again, there are tens of thousands of events reaching the http output and not leaving. It is taking nearly a tenth of a second per event (151,778,979 ms for 1,713,729 events, or about 89 ms each). It could be that encoding the events is really expensive, but I would concur with Leandro that the first place to look is the http destination.

I'll check on the output http location tomorrow; I don't manage that.

If I wanted to look at a possible encoding issue, how would I go about that?

Is there something I can look for specifically?

I followed up with the group that handles the endpoint. They are not seeing any issues but are still digging into it.

I do see some bad gateway retrying errors; they seem to go away after a bit.

@Badger you mentioned expensive encoding. How can I go about checking whether that is the case? I don't see much in the logs about encoding. Do I need to turn on a different log setting other than info?
