In the last week we've made a few changes and now we're seeing a bit of a
problem. We've seen 3 different occurrences of a single Flume agent server
node backing up its FC1 channel indefinitely until we log in and restart
Flume entirely. The data just stops flowing -- we can't find any errors in
the logs on either the ES or Flume side. A simple restart of Flume fixes
it.
This ONLY happens at midnight, and only on one Flume server. I'm wondering
whether it has to do with the time it takes our ES nodes to create a new
index ... could the first Flume agent that triggers index creation be
getting blocked or stuck?
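If the index-creation theory is right, one cheap way to test it would be to pre-create the next day's index a few minutes before midnight, so the first write of the new day never has to wait on index creation. A minimal sketch, assuming the ElasticSearchSink's default daily pattern (`flume-yyyy-MM-dd`) and a hypothetical host `es-node` -- adjust both to your actual setup:

```shell
# Compute tomorrow's index name (assumes the default "flume-" prefix;
# change it if your sink sets a different indexName).
NEXT_INDEX="flume-$(date -u -d '+1 day' +%Y-%m-%d)"

# Pre-create the index; if it already exists ES just returns an error
# body, which we ignore here. Run this from cron shortly before 00:00.
curl -s -XPUT "http://es-node:9200/${NEXT_INDEX}" || true
echo "pre-created ${NEXT_INDEX}"
```

If the midnight stalls stop once the index already exists when the first event arrives, that would point squarely at index creation as the trigger.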
One additional thing: we actually have two ES sinks pointing to the same
cluster. The config looks more like this:
(inbound avro -> FC1 -> Elasticsearch)
(inbound avro -> FC2 -> S3/HDFS)
(inbound avro_2 -> FC3 -> Elasticsearch)
(inbound avro_2 -> FC4 -> S3/HDFS)
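In Flume properties terms, that fan-out would look roughly like the sketch below. The agent name ("collector"), ports, hosts, and paths are guesses for illustration, not the actual config; the key point is two Avro sources, each replicating into a channel pair, with two ES sinks hitting the same cluster:

```properties
collector.sources = inbound_avro inbound_avro_2
collector.channels = FC1 FC2 FC3 FC4
collector.sinks = es1 s3hdfs1 es2 s3hdfs2

# Each source replicates every event into both of its channels
# (the default replicating channel selector).
collector.sources.inbound_avro.type = avro
collector.sources.inbound_avro.bind = 0.0.0.0
collector.sources.inbound_avro.port = 4545
collector.sources.inbound_avro.channels = FC1 FC2
collector.sources.inbound_avro_2.type = avro
collector.sources.inbound_avro_2.bind = 0.0.0.0
collector.sources.inbound_avro_2.port = 4546
collector.sources.inbound_avro_2.channels = FC3 FC4

# Two ES sinks pointing at the same cluster, plus two HDFS sinks.
collector.sinks.es1.type = org.apache.flume.sink.elasticsearch.ElasticSearchSink
collector.sinks.es1.hostNames = es-node:9300
collector.sinks.es1.channel = FC1
collector.sinks.s3hdfs1.type = hdfs
collector.sinks.s3hdfs1.hdfs.path = s3n://bucket/logs/%Y-%m-%d
collector.sinks.s3hdfs1.channel = FC2
collector.sinks.es2.type = org.apache.flume.sink.elasticsearch.ElasticSearchSink
collector.sinks.es2.hostNames = es-node:9300
collector.sinks.es2.channel = FC3
collector.sinks.s3hdfs2.type = hdfs
collector.sinks.s3hdfs2.hdfs.path = s3n://bucket/logs/%Y-%m-%d
collector.sinks.s3hdfs2.channel = FC4
```

With this shape, both ES sinks race to create the daily index at midnight, which may matter for the stall described above.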
On Thursday, April 10, 2014 9:03:25 AM UTC-7, Matt wrote:
We use Flume 1.4 to pass logs into HDFS as well as Elasticsearch for
storage. The pipeline looks roughly like this:
Client to Server Flow...
(local_app -> local_host_flume_agent) ---- AVRO/SSL ---->
(remote_flume_agent)...
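For reference, the Avro/SSL hop above is just an Avro sink on the local agent paired with an Avro source on the remote one; a minimal Flume 1.4 sketch, with hostnames, ports, and keystore paths made up:

```properties
# Local host agent: avro sink with SSL enabled.
local.sinks.to_collector.type = avro
local.sinks.to_collector.channel = mem
local.sinks.to_collector.hostname = remote-flume.example.com
local.sinks.to_collector.port = 4545
local.sinks.to_collector.ssl = true
local.sinks.to_collector.truststore = /etc/flume/truststore.jks
local.sinks.to_collector.truststore-password = changeit

# Remote collector: matching avro source with its keystore.
collector.sources.inbound_avro.type = avro
collector.sources.inbound_avro.bind = 0.0.0.0
collector.sources.inbound_avro.port = 4545
collector.sources.inbound_avro.ssl = true
collector.sources.inbound_avro.keystore = /etc/flume/keystore.jks
collector.sources.inbound_avro.keystore-password = changeit
```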
I don't know if this helps, but we are seeing similar issues with Flume
using log4j2 (not log4j v1, as used by ES). For Tomcat-hosted servlets,
Flume failover works fine. But for non-Tomcat applications (such as looping
batch-mode applications and Netty-based servers with static main entry
points), we have found that when one of their Flume loggers fails, there is
no failover.
We don't have a solution, but a workaround is to have the non-Tomcat
applications configure their log4j to write to only one Flume agent. If
that agent fails, events are queued until it comes back up. No failover,
but no data loss either.
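For concreteness, the single-agent workaround would look something like this in log4j2 terms, assuming the log4j2 FlumeAppender in `persistent` mode (which spools events to a local data directory while the agent is down and drains it when the agent returns). Host, port, and paths here are hypothetical:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Configuration>
  <Appenders>
    <!-- One agent only, so no failover list. "persistent" mode writes
         events to a local store until the agent is reachable again. -->
    <Flume name="flume" type="persistent" dataDir="/var/spool/flume-queue">
      <Agent host="flume-agent-1" port="4141"/>
    </Flume>
  </Appenders>
  <Loggers>
    <Root level="info">
      <AppenderRef ref="flume"/>
    </Root>
  </Loggers>
</Configuration>
```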