Logstash cannot receive data from Kafka


(Shih Yu Lee) #1

The pipeline I use is filebeat/metricbeat to kafka to logstash to es to kibana.
At first, logstash did receive data from kafka. However, after I tried to set up a kafka cluster, logstash no longer received anything. I gave up on the cluster and went back to the previous configuration, and I also tried removing all of the kafka and zookeeper data and logs, but logstash still does not receive anything.

I used kafka-console-consumer.sh to check that kafka actually receives data from beats, but logstash cannot consume the topic.
The versions I use are:
ELK+Beats 6.3.0
kafka 0.11.0.0 and 0.11.0.3 (I tried both)

Logstash part

input {
  kafka {
    bootstrap_servers => "myaddress:9092"
    topics => ["elk"]
    auto_offset_reset => "latest"
    consumer_threads => 1
    decorate_events => true
  }
}

beat part
I followed the official guide https://www.elastic.co/guide/en/beats/metricbeat/current/kafka-output.html

I don't know why it worked the first time but stopped working after I tried to use the cluster.
It still doesn't work even though I removed kafka, logstash, and all of the logs and reinstalled them.


(Ry Biesemeyer) #2

Are you continuing to put data into the Kafka topic, or are you attempting to re-process the same data?

If you're attempting to re-process the same data, your problem may be simple.

Kafka keeps track of consumer groups and how far they have consumed a topic, and won't give out a message to a consumer within a group that any member of the group has acknowledged.

By default, the Logstash Kafka Input Plugin identifies itself as belonging to the consumer group "logstash". If you would like to consume messages that have already been consumed by the "logstash" group, you will need to either:

  • reset the group's offset in Kafka (google); OR
  • identify Logstash by a different group name using the group_id directive.
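
As a minimal sketch of the second option, the input below reuses the broker address and topic from the first post and only changes the consumer group. The group name "logstash-reprocess" is made up for illustration. The auto_offset_reset value is also an assumption about what you want: it only applies when the group has no committed offset, and "earliest" then starts from the oldest message Kafka still retains, whereas "latest" would only pick up messages that arrive after Logstash connects.

```
input {
  kafka {
    bootstrap_servers => "myaddress:9092"
    topics => ["elk"]
    # Hypothetical group name; Kafka has no committed offsets for a new group.
    group_id => "logstash-reprocess"
    # Only used when the group has no committed offset:
    # start from the oldest retained message instead of the newest.
    auto_offset_reset => "earliest"
  }
}
```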

(Shih Yu Lee) #3

I deleted all of the groups and topics and reinstalled kafka. In kibana, I found the data in Timelion, but nothing appears in Discover.
When I checked kafka-console-consumer.sh and the logstash output, I could see the data, but the timestamp is 8 hours behind. I thought the time difference was the reason I cannot see the logs in kibana - Discover.
However, filebeat sends the correct timestamp to kafka (I checked in the filebeat log).
Besides, when filebeat sends the logs to logstash directly (no kafka), the timestamp is correct.

How could I solve the issue?
Thank you.


(Ry Biesemeyer) #4

So the data is there, but each event's @timestamp is exactly 8 hours different from what you are expecting? This sounds like a timezone issue.

If it's not off by exactly 8 hours, it's more likely that Logstash is creating the events with the current timestamp as it processes them (as it does by default) and the pipeline configuration isn't correctly telling it how to extract the timestamp from the message.

Can you paste a few lines of input (surrounded by markdown code fences ~~~ on their own lines to ensure formatting is kept)? Could you also paste the relevant bits from your Logstash pipeline configuration? How is Logstash being used to extract data? Are you using the Date Filter Plugin to set the event's timestamp?
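
One way to capture what Logstash actually sees is a temporary stdout output alongside your existing outputs. This is only a debugging sketch, not something taken from your setup:

```
output {
  # Temporary debugging output: prints each event, including @timestamp,
  # to the Logstash console in a readable form.
  stdout {
    codec => rubydebug
  }
}
```

Comparing the printed @timestamp with the time inside the original log line should show whether the difference is exactly 8 hours.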


(Shih Yu Lee) #5

I created 02-beat-input.conf, 03-kafka-input.conf, 10-system-filter.conf, and 30-elasticsearch-output.conf in conf.d:

#02-beat-input.conf
input {
  beats {
    port => 5044
    host => "myaddress"
  }
}

#03-kafka-input.conf
input {
  kafka {
    bootstrap_servers => "myaddress:9092"
    topics => ["elk"]
    auto_offset_reset => "latest"
    consumer_threads => 1
    decorate_events => true
  }
}

#30-elasticsearch-output.conf
output {
  elasticsearch {
    hosts => localhost
    manage_template => false
    index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}"
  }
}

10-system-filter.conf is the same as the one on the website https://www.elastic.co/guide/en/logstash/current/logstash-config-for-filebeat-modules.html

I use metricbeat for testing in order to get more logs, and I don't use any filter for the metricbeat system module in logstash.


(Ry Biesemeyer) #6

The linked filter does use the date filter plugin to set the event's timestamp, so we're probably dealing with a timezone issue.

Do the raw events that you are receiving include a timezone or offset alongside the timestamp? If not, what timezone is your Logstash host in?
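
If the raw timestamps turn out to carry no zone, the date filter falls back to the platform default timezone of the Logstash host, and you can override that with its timezone option. The sketch below is an illustration under assumptions: the field name and patterns are modeled on syslog-style timestamps and may not match your events, and "Asia/Taipei" is just an example of a UTC+8 zone.

```
filter {
  date {
    # Assumed source field and formats for a syslog-style timestamp
    # that has no zone or offset in it.
    match => ["[system][syslog][timestamp]", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss"]
    # Assumed zone (UTC+8); use the zone the timestamps were written in.
    timezone => "Asia/Taipei"
  }
}
```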


(Shih Yu Lee) #7

I remember that the input has a timestamp, but I still need to check.
Besides, could I ask whether a message queue layer such as kafka is necessary?
If I need to collect from over 2000 devices, is logstash able to support that?
I know that beats can load-balance the data across logstash hosts and that logstash has Persistent Queues, but kafka can be set up as a cluster. I am not sure whether an external queue layer is important when we use the Elastic stack to receive a very large number of messages.


(Ry Biesemeyer) #8

My best advice: Add external queueing only if and when you are unable to keep up with bursts of traffic; don't make your setup more complex until the need is clear.

Depending on the volume each device is sending and the complexity of your pipelines, 2000 devices isn't a particularly high number, especially for a well-equipped Logstash host. Beats can also be configured to send the events it reads to a "pool" of Logstash hosts without a load balancer. If your Logstash host(s) are able to keep up, keep it simple.

Since you're working with a protocol that supports back-pressure, a small PQ will likely serve you better than a large one. When the Logstash host(s) get a little overloaded by a spike in traffic while using a small PQ, the back-pressure lets the messages spool on your edge nodes. With a large PQ, Logstash would continue to receive events, bogging it down and using up vital resources that could otherwise be used to process the spike in traffic.

