Log management with Elasticsearch?


(Mike Pilkington) #1

Hi,

I'm wondering if anyone has used Elasticsearch to manage (provide
search for) large log file collections?

I'm looking for solutions to help with a semi-centralized log
management project. The logs would be sent in syslog-style format
from hundreds of servers/routers/firewalls and maintained on a few
dedicated log servers. I looked into Splunk, a popular log management
solution that can scale horizontally (add more servers for more
storage and performance); I assume it uses some sort of NoSQL
technology. Unfortunately, their solution is too expensive, and after
searching for an open-source equivalent without finding one, I'm
looking into possibly building a home-grown solution.
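To make "syslog-style" concrete, here is a rough sketch of parsing one such line into a JSON document that a search engine could index. The line layout and field names here are illustrative assumptions, not a fixed standard:

```python
import re
import json

# Assumed BSD-syslog-like layout: "MMM DD HH:MM:SS host process[pid]: message"
SYSLOG_RE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d+\s[\d:]+)\s"
    r"(?P<host>\S+)\s"
    r"(?P<process>[^:\[]+)(?:\[(?P<pid>\d+)\])?:\s"
    r"(?P<message>.*)$"
)

def syslog_to_doc(line):
    """Parse one syslog-style line into a dict (one search document)."""
    match = SYSLOG_RE.match(line)
    if match is None:
        # Keep unparseable lines searchable rather than dropping them.
        return {"message": line}
    return {k: v for k, v in match.groupdict().items() if v is not None}

line = "Aug 15 09:30:01 fw01 sshd[2453]: Accepted password for admin"
doc = syslog_to_doc(line)
print(json.dumps(doc))
```

Once each line is a structured document like this, any field (host, process, timestamp) becomes individually searchable.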

I came across the following blog post series on log management with
Hadoop: http://blog.mgm-tp.com/series/scalable-log-data-management-with-hadoop/.
In the comments of the 3rd blog post, Elasticsearch was mentioned as
a possibility and so I'm wondering if anyone out there has applied
Elasticsearch to log management?

Thanks,
Mike


(Berkay Mollamustafaoglu-2) #2

Hi Mike,

I think the solution for the use case you're describing is a combination
of Flume and Elasticsearch. Flume provides great infrastructure for
aggregating logs (it has a syslog receiver), and all the data can be
indexed in Elasticsearch for querying later. All that is needed is an
Elasticsearch sink (in Flume terminology, the component that would
interface with ES), which should be quite straightforward.
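To illustrate what such a sink amounts to: it receives log events and turns each one into an index request against Elasticsearch's REST API. The sketch below captures that shape in Python with a pluggable transport so it runs without a live cluster; the index/type names and the transport signature are assumptions for illustration, and a real Flume sink would be written in Java against Flume's sink interface.

```python
import json

class ElasticsearchSink:
    """Minimal sketch of a Flume-style sink that indexes events into ES."""

    def __init__(self, transport, index="logs", doc_type="syslog"):
        self.transport = transport  # callable(method, path, body)
        self.index = index
        self.doc_type = doc_type

    def append(self, event):
        """Index one log event (a dict) as a document in Elasticsearch."""
        path = "/%s/%s" % (self.index, self.doc_type)
        self.transport("POST", path, json.dumps(event))

# Stand-in transport that just records requests instead of sending them.
sent = []
sink = ElasticsearchSink(lambda method, path, body: sent.append((method, path, body)))
sink.append({"host": "fw01", "message": "link down"})
```

In a real deployment the transport would be an HTTP client posting to an ES node; everything else about the sink stays this simple.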

Flume is an open-source project recently released by the Cloudera folks;
the user guide can be found here:
archive.cloudera.com/cdh/3/flume/UserGuide.html

Regards,
Berkay Mollamustafaoglu
mberkay on yahoo, google and skype



(Paul Smith) #3

This post was a few weeks ago, but I've started to put something
together for this on GitHub, if anyone would like to collaborate. A
very basic structure is there now and works. I still need to consider
how it will cope with a firehose stream of logs, and whether batch
indexing could improve performance. By the looks of it, though, it is
very simple to do.

http://github.com/tallpsmith/elasticflume
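The batch-indexing idea mentioned above can be sketched as follows: rather than one request per log event, buffer events and flush them in a single call to Elasticsearch's `_bulk` API, whose body is newline-delimited JSON alternating action lines and documents. The index/type names and event fields here are illustrative assumptions:

```python
import json

def make_bulk_body(events, index="logs", doc_type="syslog"):
    """Build the newline-delimited body for one _bulk request."""
    lines = []
    for event in events:
        # Each document is preceded by an action line telling ES where to index it.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(event))
    return "\n".join(lines) + "\n"  # _bulk bodies must end with a newline

events = [{"host": "fw01", "message": "link down"},
          {"host": "fw02", "message": "link up"}]
body = make_bulk_body(events)
```

Flushing, say, every few hundred events (or every second) amortizes the per-request overhead, which matters a lot under a firehose of syslog traffic.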

cheers,

Paul



(Shay Banon) #4

Looks great! I will add it to the projects page. I think this:
http://github.com/tallpsmith/elasticflume/blob/master/src/main/java/org/elasticsearch/flume/ElasticSearchSink.java
speaks volumes about the simplicity of both elasticsearch and flume!



(Ted Karmel) #5

+1 for Shay's comment...

Paul, I was considering this very option and just saw your email and
Git repo on the subject.


