I’m currently using centralized Logstash management from Kibana, with a single pipeline in place.
Logstash Version: 8.11
Elasticsearch Version: 8.11
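For context, centralized pipeline management is enabled on the Logstash node through logstash.yml settings roughly like the following (hosts and credentials redacted; the pipeline id shown here is a placeholder, not the real one):

# logstash.yml (management connection redacted)
xpack.management.enabled: true
xpack.management.pipeline.id: ["apm-span-enrichment"]
xpack.management.elasticsearch.hosts: [""]
xpack.management.elasticsearch.username: ""
xpack.management.elasticsearch.password: ""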
Here’s my pipeline configuration:
input {
  # Pull span documents from the APM traces data stream once per minute.
  elasticsearch {
    hosts => [""]
    user => ""
    password => ""
    ssl_enabled => true
    ssl_verification_mode => "none"
    index => ".ds-traces-apm-*"
    # Only spans from the last minute that belong to a transaction,
    # excluding database query spans.
    query => '{
      "_source": ["span.id", "span.name", "transaction.id"],
      "query": {
        "bool": {
          "must": [
            { "exists": { "field": "transaction.id" } },
            { "term": { "processor.event": "span" } },
            { "range": { "@timestamp": { "gte": "now-1m", "lte": "now" } } }
          ],
          "must_not": [
            { "term": { "span.action": "query" } }
          ]
        }
      }
    }'
    schedule => "* * * * *"
    docinfo => true
  }
}

filter {
  if [transaction][id] {
    # Per-event lookup of the parent transaction to copy its name onto the span.
    elasticsearch {
      hosts => [""]
      user => ""
      password => ""
      ssl_enabled => true
      ssl_verification_mode => "none"
      index => ".ds-traces-apm-*"
      query => 'processor.event:transaction AND transaction.id:"%{[transaction][id]}"'
      fields => { "[transaction][name]" => "[transaction][name]" }
      tag_on_failure => ["_transaction_id_lookup_failure"]
    }
  }

  # Deterministic document id from span.id + transaction.id, so the upsert
  # in the output deduplicates re-read documents instead of creating duplicates.
  fingerprint {
    source => ["[span][id]", "[transaction][id]"]
    target => "uuid"
    method => "MD5"
    concatenate_sources => true
  }
}

output {
  elasticsearch {
    document_id => "%{uuid}"
    hosts => [""]
    user => ""
    password => ""
    doc_as_upsert => true
    action => "update"
    index => "transaction_span_enrichment"
    ssl_enabled => true
    ssl_verification_mode => "none"
  }
}
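For clarity, a document written to transaction_span_enrichment ends up looking roughly like the sketch below (all values are placeholders; @timestamp is set by Logstash at processing time because the input's _source list does not include the original timestamp):

{
  "@timestamp": "2024-01-01T00:00:00.000Z",
  "span": { "id": "example-span-id", "name": "example-span-name" },
  "transaction": { "id": "example-transaction-id", "name": "example-transaction-name" },
  "uuid": "<MD5 of span.id and transaction.id concatenated>"
}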
Current Situation:
- Span ingest rate into the Elasticsearch traces-apm data stream: ~300,000 per minute
- Logstash Node Monitoring shows: 125 workers active
- Logstash pipeline settings (see the sketch after this list for the same values written with explicit units):
  - pipeline.workers: 16
  - pipeline.batch.size: 300000
  - pipeline.batch.delay: 50
  - queue.type: persisted
  - queue.max_bytes: 4 (unit not clear to me, MB or GB?)
  - queue.page_capacity: 2048 (again, unit unclear)
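Written out as they would appear in logstash.yml or pipelines.yml, I believe the intent was something like the following; whether the queue values were meant as gigabytes and megabytes is exactly the part I am unsure about (both settings take byte values such as 64mb or 4gb):

pipeline.workers: 16
pipeline.batch.size: 300000
pipeline.batch.delay: 50
queue.type: persisted
queue.max_bytes: 4gb         # assumption: "4" was meant as 4gb (the default is 1024mb)
queue.page_capacity: 2048mb  # assumption: "2048" was meant as 2048mb (the default is 64mb)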
I want Logstash to keep pace with that rate, i.e., indexing into Elasticsearch from this Kibana-managed pipeline should be at least 300,000 documents per minute (roughly 5,000 per second).
Currently there is only one pipeline, but I will add more for other logs as well.
Despite these settings, I'm still seeing high load and more workers than expected on the node (125).
Also, the documents Logstash writes into the index arrive in bursts: there is usually a gap of 5 minutes or more, then it ingests at most around 75,000 to 125,000 documents within one minute, stops, and the next gap follows.
[Pipeline monitoring screenshot]
What I Need:
- Guidance on how to tune pipeline parameters (workers, batch size, delay, queue settings).
- Resource recommendations (CPU cores, memory) to handle this ingestion rate efficiently.
- Any best practices or patterns for handling high-throughput Elasticsearch inputs + lookups in the filter phase.