I recently changed my configuration from a single logstash.config file to multiple pipelines to make it easier to manage.
My pipelines.yml (I replaced the actual names with letters; main re-routes data to the correct pipeline based on a tag):
- pipeline.id: main
  path.config: "/usr/share/logstash/pipeline/main.config"
- pipeline.id: a
  path.config: "/usr/share/logstash/pipeline/a.config"
- pipeline.id: b
  path.config: "/usr/share/logstash/pipeline/b.config"
- pipeline.id: c
  path.config: "/usr/share/logstash/pipeline/c.config"
- pipeline.id: d
  path.config: "/usr/share/logstash/pipeline/d.config"
- pipeline.id: e
  path.config: "/usr/share/logstash/pipeline/e.config"
- pipeline.id: f
  path.config: "/usr/share/logstash/pipeline/f.config"
In Elasticsearch I now see a large amount of data arriving daily (it was usually 5 MB at most; now it is about 1 GB).
I also see this error in the Logstash logs:
[ERROR][logstash.outputs.elasticsearch][main][6bdcb4726a198461b0a3bc504bd116ed5ae4dc3a4e92f278a77b790bc12a0ceb] Attempted to send a bulk request but there are no living connections in the pool (perhaps Elasticsearch is unreachable or down?) {:message=>"No Available connections", :exception=>LogStash::Outputs::ElasticSearch::HttpClient::Pool::NoConnectionAvailableError, :will_retry_in_seconds=>16}
[WARN ][logstash.outputs.elasticsearch][main][39c5e157a8fc0ce37f379032b7514bc216a85707441f8b16bfdf1757bb7fd6a6] Marking url as dead. Last error: [LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError] Elasticsearch Unreachable: [http://elasticsearch:9200/][Manticore::ClientProtocolException] elasticsearch:9200 failed to respond {:url=>http://elasticsearch:9200/, :error_message=>"Elasticsearch Unreachable: [http://elasticsearch:9200/][Manticore::ClientProtocolException] elasticsearch:9200 failed to respond", :error_class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError"}
I also cannot query this data on Kibana and get this error:
Error: Batch request failed with status 503
I'm a little confused about what is going on here and why so much data is being sent.
Looking at the data in Elasticsearch (using elasticsearch-head), I see 60 duplicates of the data in one field for each log. Here's an example of my configuration, as I assume the issue is there:
main.config
input {
  beats {
    port => 5044
    host => "0.0.0.0"
    ssl => false
  }
}
output {
  if [fields][log_type] == "a" {
    pipeline { send_to => a }
  }
  else if [fields][log_type] == "b" {
    pipeline { send_to => b }
  }
  else if [fields][log_type] == "c" {
    pipeline { send_to => c }
  }
  else if [fields][log_type] == "d" {
    pipeline { send_to => d }
  }
  else if [fields][log_type] == "e" {
    pipeline { send_to => e }
  }
  else if [fields][log_type] == "f" {
    pipeline { send_to => f }
  }
}
a.config
input {
  pipeline {
    address => "a"
  }
}
filter {
  if [fields][log_type] == "a" {
    grok {
    }
    date {
      match => ["logdate", "YYYY-MM-dd HH:mm:ss,SSS"]
      target => "logdate"
    }
  }
}
output {
  if [fields][log_type] == "a" {
    elasticsearch {
      hosts => ["elasticsearch:9200"]
      index => "a-logdata=%{+YYYY.MM.dd}"
    }
  }
}
Setting path.config (in logstash.yml or on the command line) prevents pipelines.yml from being used. If path.config points to a directory, then all the files in it are concatenated into a single pipeline: events are read from all of the inputs, processed by all of the filters, and then every event is sent to all of the outputs. New users very often misunderstand this and think each configuration file stands alone.
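To make that concrete, here is a sketch (not literal output) of the single pipeline Logstash effectively builds if path.config points at /usr/share/logstash/pipeline and all seven of your files get concatenated:

```
# Effective concatenated pipeline (sketch):
input { beats { port => 5044 ... } }       # from main.config
input { pipeline { address => "a" } }      # from a.config
# ... pipeline inputs from b.config through f.config ...

# Every filter from every file runs against every event:
filter { if [fields][log_type] == "a" { grok { } date { ... } } }
# ... filters from b.config through f.config ...

# Every output from every file sees every event, so a beats event
# with log_type "a" is both forwarded to pipeline a AND indexed directly:
output { if [fields][log_type] == "a" { pipeline { send_to => a } } ... }
output { if [fields][log_type] == "a" { elasticsearch { ... } } }
```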
For events with [fields][log_type] == "a", each event will reach the Elasticsearch output shown twice: once directly from the beats input, and once after going through the a pipeline output/input pair.
If you have any outputs which are not wrapped in a test of [fields][log_type], they will receive each event from beats, plus duplicates from pipelines a, b, c, d, e, and f.
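One way to fix this (a sketch, assuming you start Logstash from the standard settings directory): make sure path.config is set nowhere except in pipelines.yml, so your per-pipeline definitions are actually honoured:

```
# logstash.yml (sketch): leave path.config unset here, so Logstash
# falls back to pipelines.yml and runs each pipeline.id against only
# its own config file. Also avoid passing -f / --path.config on the
# command line, which overrides pipelines.yml in the same way.
```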