Logstash, flume vs elasticsearch

Hi,

I need a centralized logging solution.
My first approach is this:

log4j->flume-Agent-> |
log4j->flume-Agent-> |->flume-collector->elasticsearch-data-node
log4j->flume-Agent-> |->flume-collector->elasticsearch-data-node
log4j->flume-Agent-> |

Client: log4j->flume-Agent
Server: flume-collector->elasticsearch-data-node

Now I have to increase the number of servers. I think I can't reconfigure the
Flume agents without restarting them (I posted this problem on the Flume mailing
list; Logstash has the same problem). So I have a new idea:

log4j->flume-Agent->elasticsearch-non-data-node |
log4j->flume-Agent->elasticsearch-non-data-node |-> elasticsearch-data-node
log4j->flume-Agent->elasticsearch-non-data-node |-> elasticsearch-data-node
log4j->flume-Agent->elasticsearch-non-data-node |

Client: log4j->flume-Agent->elasticsearch-non-data-node
Server: elasticsearch-data-node

This way I can increase the number of servers, and the ES non-data nodes will
detect them and include them in their load balancing.
I need high persistence. I'm not allowed to lose messages, but it should be
fast :confused:
Are there any disadvantages to using the second approach?

Regards,
Simon

--

Hello Simon,

I think the disadvantages of the second solution would be:

  • if you have lots of clients, you'll have quite a big cluster, even though
    few nodes will hold data. This implies a lot of discovery (ping) between
    all your servers and clients, so you might experience timeouts
  • you'd have quite a lot of processes running on the clients
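
To get a feel for the first point, here is a rough back-of-the-envelope
sketch. It assumes every node in the cluster keeps a link to every other node
(a full mesh), which is a simplification, and the node counts are made up for
illustration:

```python
def mesh_links(nodes: int) -> int:
    """Number of node-to-node pairs in a full mesh of `nodes` nodes."""
    return nodes * (nodes - 1) // 2


# Hypothetical sizes: 3 data nodes, plus 100 clients that each run a non-data node.
data_nodes = 3
client_nodes = 100

print(mesh_links(data_nodes))                 # 3 pairs with data nodes only
print(mesh_links(data_nodes + client_nodes))  # 5253 pairs once every client joins
```

The number of pairs grows roughly with the square of the cluster size, which
is why a cluster made up mostly of client-side nodes can get chatty.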

But, back to your first solution, what's wrong with simply adding a new
server? Won't it become as shown below?

log4j->flume-Agent-> |
log4j->flume-Agent-> |->flume-collector->elasticsearch-data-node
log4j->flume-Agent-> |->flume-collector->elasticsearch-data-node
log4j->flume-Agent-> |->flume-collector->elasticsearch-data-node

Or is there the problem of restarting flume to apply a new config? If so,
this still looks fine:

log4j->flume-Agent-> |
log4j->flume-Agent-> |->flume-collector->elasticsearch-data-node
log4j->flume-Agent-> |->flume-collector->elasticsearch-data-node
log4j->flume-Agent-> | elasticsearch-data-node

And if you have to add more servers and those running the flume-collector
end up overloaded, you can set node.data: false on their ES node
and let them act as routers. So it would look like:

log4j->flume-Agent-> | elasticsearch-data-node
log4j->flume-Agent-> |->flume-collector->elasticsearch-non-data-node
log4j->flume-Agent-> |->flume-collector->elasticsearch-non-data-node
log4j->flume-Agent-> | elasticsearch-data-node
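
As a rough illustration of what "acting as a router" means here: a non-data
node holds no shards itself, it only works out which data node should receive
a document and forwards it. The sketch below is a simplified model, not
Elasticsearch's actual code; the CRC32 hash, the node names and the shard
layout are all assumptions made for the example:

```python
import zlib


def pick_shard(doc_id: str, num_shards: int) -> int:
    """Map a document ID to a shard number (CRC32 is only a stand-in hash)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


# Hypothetical layout: which data node holds which shard.
shard_to_node = {
    0: "data-node-1",
    1: "data-node-2",
    2: "data-node-1",
    3: "data-node-2",
}


def route(doc_id: str) -> str:
    """Return the data node that a router (non-data node) would forward to."""
    return shard_to_node[pick_shard(doc_id, len(shard_to_node))]


for doc_id in ("log-0001", "log-0002", "log-0003"):
    print(doc_id, "->", route(doc_id))
```

The per-document work on the router is small (one hash and one lookup), while
the indexing itself happens on the data nodes.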

Best regards,
Radu

http://sematext.com/ -- Elasticsearch -- Solr -- Lucene

--

Hi Radu,

Thanks for your detailed answer.

> Or is there the problem of restarting flume to apply a new config?

Yes, the problem is restarting the Flume agents. During the restart I
would lose messages (log4j can't connect and discards the messages).

> log4j->flume-Agent-> | elasticsearch-data-node
> log4j->flume-Agent-> |->flume-collector->elasticsearch-non-data-node
> log4j->flume-Agent-> |->flume-collector->elasticsearch-non-data-node
> log4j->flume-Agent-> | elasticsearch-data-node

OK, this could work. The only problem is choosing the right number of
non-data nodes to avoid a bottleneck. But I think the only solution is trial
and error to find the best settings for my system.

Thank you for calming my fears :wink:

Have a nice day. Regards,
Simon

--

Hello Simon,

On Thu, Nov 29, 2012 at 12:52 PM, Simon Monecke simonmonecke@gmail.com wrote:

> OK, this could work. The only problem is choosing the right number of
> non-data nodes to avoid a bottleneck. But I think the only solution is trial
> and error to find the best settings for my system.

Right. Trying is always the best thing to make sure :slight_smile:

But since the non-data nodes would basically only serve as "routers", I
don't think they are likely to be the bottleneck, unless the
machines are smaller than the data nodes and you have lots of shards and
lots of data nodes.

Above I've suggested two collectors and non-data nodes for high
availability, so when one of them goes down there's still a path for logs
to go through. But if you're OK with buffering on the client side when
there's an outage, you can also transform your design into something simpler,
like:

log4j->flume-Agent-> | elasticsearch-data-node
log4j->flume-Agent-> |->elasticsearch-non-data-node
log4j->flume-Agent-> | elasticsearch-data-node
log4j->flume-Agent-> | elasticsearch-data-node
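
To make "buffering on the client side" concrete, here is a minimal sketch of
the idea. It is not how Flume implements its channels; the class name, the
fake destination and the unbounded in-memory queue are assumptions for the
example:

```python
from collections import deque
from typing import Callable, Deque


class BufferingSender:
    """Holds events while the destination is down and replays them in order."""

    def __init__(self, send: Callable[[str], None]) -> None:
        self._send = send                    # expected to raise ConnectionError on failure
        self._pending: Deque[str] = deque()  # unbounded and in memory only (assumption)

    def submit(self, event: str) -> None:
        """Queue the event, then try to deliver everything that is waiting."""
        self._pending.append(event)
        self.flush()

    def flush(self) -> None:
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                return                       # keep everything queued, retry later
            self._pending.popleft()          # drop only after a successful send

    @property
    def pending(self) -> int:
        return len(self._pending)


# Demo with a fake destination that can be switched on and off.
state = {"up": False}
delivered = []


def fake_send(event: str) -> None:
    if not state["up"]:
        raise ConnectionError("destination unavailable")
    delivered.append(event)


sender = BufferingSender(fake_send)
sender.submit("event 1")   # outage: stays buffered
sender.submit("event 2")   # outage: stays buffered
state["up"] = True
sender.submit("event 3")   # destination is back: all three go out in order
print(delivered)           # ['event 1', 'event 2', 'event 3']
```

An in-memory buffer like this one is lost if the client process dies, so a
strict no-loss requirement would need a buffer that survives restarts (for
example, one kept on disk).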

Another possible solution, which also implies restarting the agent, might be to
use multiple destinations, possibly all data nodes, and use a Load
Balancing Sink processor:
http://flume.apache.org/FlumeUserGuide.html#load-balancing-sink-processor

So the design becomes:

log4j->flume-Agent-> |->elasticsearch-data-node
log4j->flume-Agent-> |->elasticsearch-data-node
log4j->flume-Agent-> |->elasticsearch-data-node
log4j->flume-Agent-> |->elasticsearch-data-node
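
The behaviour this design relies on can be sketched as round-robin selection
with failover: each event goes to the next destination in turn, and if that
one is down the next one is tried. This is only a simplified model of the
idea, not Flume's implementation, and the host names are made up:

```python
import itertools
from typing import Callable, List


def make_balancer(sinks: List[str], send: Callable[[str, str], bool]) -> Callable[[str], str]:
    """Return a deliver(event) function that tries sinks in round-robin order."""
    start = itertools.cycle(range(len(sinks)))

    def deliver(event: str) -> str:
        first = next(start)                          # where this round starts
        for offset in range(len(sinks)):
            sink = sinks[(first + offset) % len(sinks)]
            if send(sink, event):                    # True means the sink accepted it
                return sink
        raise RuntimeError("no sink accepted the event")

    return deliver


# Demo: three hypothetical data nodes, one of them unavailable.
down = {"es-data-2"}


def fake_send(sink: str, event: str) -> bool:
    return sink not in down


deliver = make_balancer(["es-data-1", "es-data-2", "es-data-3"], fake_send)
print([deliver(f"event {i}") for i in range(4)])
# ['es-data-1', 'es-data-3', 'es-data-3', 'es-data-1']
```

Adding a destination still means changing the list each agent knows about,
which is why this option also implies a restart.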

Best regards,
Radu

http://sematext.com/ -- Elasticsearch -- Solr -- Lucene

--

I'm working on a free(mium) product that does exactly this (you can
pick the configuration you want) and more, so I'll give you my opinion on
this.

All the solutions mentioned above have pros and cons. You should consider
whether you want a few heavyweight data processors (collectors) or lots of
lightweight ones (on all the clients). Most of the time this choice
is really simple: if you have hundreds of servers, it might be much
easier to do a little work on lots of machines than to do all that
work in a separate cluster.

The amount of work (e.g. processing, alerting, pattern detection, indexing,
searching) also shifts the point at which you would choose a dedicated cluster
over the "all-for-one, one-for-all" strategy.

Best regards,

Robin Verlangen
Software engineer
W http://www.robinverlangen.nl
E robin@us2.nl

http://goo.gl/Lt7BC

--