How to delete older logs in ELK to give each application a certain disk quota?


(Gabriel Corrêa de Oliveira) #1

Dear All,

I am trying to use the ELK stack in the following scenario:

I have about ten applications that send their logs, through Logstash, to a
single Elasticsearch cluster.

Some of these applications naturally generate more logs than others, and
sometimes one of them can go 'crazy' (because of a bug, for instance) and
generate even more log entries than it normally does. As a result, the disk
space available in the cluster can be unfairly 'taken' by the logs of a
single application, leaving too little room for the others.

I am currently managing the available disk space with Elasticsearch
Curator. It runs periodically from the crontab and deletes older indices
based on a disk usage quota: when the disk space used by all indices
exceeds a certain limit, the oldest indices are deleted, one by one, until
the total disk space they use is within the specified limit again.

The first problem with this approach is that Elasticsearch Curator can only
delete entire indices. Hence, I had to configure Logstash to create a new
index every hour, increasing the granularity so that Curator deletes smaller
chunks of logs at a time. In addition, it is very difficult to decide how
often Curator should run: if applications generate logs at a high enough
rate, not even one-hour indices may be enough. The second problem is that
there is no way to specify a separate disk usage quota for each application.

Ideally, Elasticsearch should be able to delete older log entries by itself
whenever the indices reach a certain disk usage limit. This would eliminate
the problem of deciding how often Curator should run. However, I could not
find any such feature in the Elasticsearch manual.

Would anybody recommend a different approach to address these issues?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/42adcf86-0899-473c-b6c8-b7ca24a60544%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(David Pilato) #2

The feature you describe does not exist in Elasticsearch yet. IIRC, there is an open issue for it.
For now, I'd use Curator.

My 2 cents.

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

In reply to Gabriel Corrêa de Oliveira (gabriel.co@gmail.com), 15 Feb 2015, 19:34.


(Aaron Mildenstein) #3

Delete by space is extremely hard to do well in a fully distributed
system like Elasticsearch. You could have two or more shards (primary or
replica) of one of the "busy" indices you mention residing on one node,
and none on another. How do you determine whether disk space is filling up
as a result of "busy" or "normal" log activity? How does a system know
which indices are potential "problem" indices with too much data, and which
are "normal"? Curator specifically recommends against deleting by space
because of these shortcomings.

A secondary system becomes necessary to manage delete-by-space in a
distributed system. You end up having to do something like what Curator
does, summing the space consumed by all shards, but you would have to do it
per index name or name pattern, and alert on the results. It would also
have to show per-pattern usage per node, since data is distributed. Such a
system would require constant monitoring, alerting, and/or acting.
Elasticsearch is not (yet, at least) designed to do this.

Again, delete by disk usage is a very difficult problem to solve with a
sharded, distributed system.

You could write your own monitoring system, based on your own usage or the
suggestions I made above, and make use of the Curator API
(http://curator.readthedocs.org) to do the behind-the-scenes work.
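The per-pattern accounting suggested above could look roughly like this. It is a hedged Python sketch under stated assumptions: index names follow a hypothetical `<app>-<YYYY.MM.dd.HH>` scheme, each application gets its own byte quota, and sizes are the total per index across all shards, primaries and replicas (for example the `store.size` column of `_cat/indices`). It only produces a deletion plan; it does not call Curator.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def per_app_deletions(
    indices: List[Tuple[str, int]], quotas: Dict[str, int]
) -> Dict[str, List[str]]:
    """Plan deletions so that each application's indices fit within its own quota.

    Assumes hypothetical index names of the form '<app>-<YYYY.MM.dd.HH>', so the
    application is everything before the last '-' and names sort oldest first.
    Sizes are total bytes per index across all shards (primaries and replicas).
    """
    by_app: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for name, size in indices:
        by_app[name.rsplit("-", 1)[0]].append((name, size))

    plan: Dict[str, List[str]] = {}
    for app, items in by_app.items():
        limit = quotas.get(app)
        if limit is None:
            continue  # no quota configured for this application: leave it alone
        items.sort()  # oldest first
        total = sum(size for _, size in items)
        doomed: List[str] = []
        for name, size in items:
            if total <= limit:
                break
            doomed.append(name)
            total -= size
        if doomed:
            plan[app] = doomed
    return plan


# Example: "search" has gone wild; "billing" is well within its quota.
indices = [
    ("billing-2015.02.15.18", 10),
    ("billing-2015.02.15.19", 10),
    ("search-2015.02.15.18", 500),
    ("search-2015.02.15.19", 700),
]
print(per_app_deletions(indices, {"billing": 100, "search": 800}))
# -> {'search': ['search-2015.02.15.18']}
```

The resulting names would still have to be handed to a deletion step (Curator or the delete index API). As noted above, this cluster-wide view ignores shard placement, so a single node can still fill up even when every application is within its quota.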

Good luck,

--Aaron

In reply to Gabriel Corrêa de Oliveira, Sunday, February 15, 2015 at 11:34:37 AM UTC-7.


(Gabriel Corrêa de Oliveira) #4

Well, in my case, it doesn't really matter whether an index is growing due
to normal or abnormal log activity. What matters is making sure that if one
application starts to generate excessive logs, it will not fill all the
available disk space and leave no room for the others. I don't care if I
lose the logs of that application. It's more important to preserve log
retention for all the other applications that are behaving normally.

If one application has a bug and generates a zillion log entries in 10
minutes, I don't want to find out an hour later that I no longer have the
logs of the other nine applications from that period, just because a single
application filled all the available disk space in the ES cluster before
Curator had a chance to run and free some space.

I think an ideal log management solution should behave somewhat like
a collection of FIFO queues, each limited by its own disk space quota. It
surprises me that these space management issues are not a pressing concern
in the ELK user community. As more and more companies migrate to
service-based architectures, in which multiple smaller applications
(services) run separately in production, the scenario I described should be
very common these days.
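The "collection of FIFO queues, each limited by its disk space quota" can be pinned down with a small in-memory Python model. It illustrates the desired semantics only and is not an Elasticsearch feature; it assumes an entry's cost is its encoded length in bytes, and the class and method names are hypothetical.

```python
from collections import deque
from typing import Deque, Dict


class QuotaLogStore:
    """In-memory model: one FIFO queue per application, each capped at a byte quota."""

    def __init__(self, quotas: Dict[str, int]) -> None:
        self.quotas = dict(quotas)
        self.queues: Dict[str, Deque[bytes]] = {app: deque() for app in quotas}
        self.used: Dict[str, int] = {app: 0 for app in quotas}

    def append(self, app: str, entry: bytes) -> None:
        """Store entry for app (which must have a quota), evicting only that app's oldest entries."""
        if len(entry) > self.quotas[app]:
            return  # an entry larger than the whole quota can never fit; drop it
        queue = self.queues[app]
        queue.append(entry)
        self.used[app] += len(entry)
        while self.used[app] > self.quotas[app]:
            self.used[app] -= len(queue.popleft())


store = QuotaLogStore({"billing": 10, "search": 10})
store.append("billing", b"ok-1")
for _ in range(1000):  # "search" goes crazy
    store.append("search", b"spam")
print(list(store.queues["billing"]), len(store.queues["search"]))
# -> [b'ok-1'] 2
```

The key property is that eviction is scoped to the misbehaving application's own queue, so a burst from one service can never push out another service's entries.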

Popular logging frameworks, like Log4J, addressed the disk management
problem a long time ago with rotating file appenders.
Every production system will, at some point, consume all the disk space
available for logs and, from then on, must regularly delete older logs to
make room for new entries.

I understand the complexity of implementing disk space management in a
sharded system, but the ELK stack is 'advertised' as a very robust log
management solution; yet it doesn't address such a common requirement of
log management systems.

I wonder if there's anything better on the market for these purposes.

In reply to Aaron Mildenstein, Monday, 16 February 2015, 17:19:47 UTC-2.


(Aaron Mildenstein) #5

I hear what you’re saying. How are you loading your logs into Elasticsearch? Logstash has a throttle filter which helps to prevent just this sort of disk-space overconsumption from happening.
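The throttle filter itself is configured in Logstash's own config language. The Python below is only a simplified model of the underlying idea (a fixed-window event counter keyed by application, with events over the cap dropped); it is not the plugin's implementation, and its class and parameter names are hypothetical.

```python
import time
from typing import Callable, Dict


class PerAppThrottle:
    """Fixed-window rate cap: at most max_events per application per period_s seconds."""

    def __init__(self, max_events: int, period_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_events = max_events
        self.period_s = period_s
        self.clock = clock
        self.window_start: Dict[str, float] = {}
        self.counts: Dict[str, int] = {}

    def allow(self, app: str) -> bool:
        """Return True if this event should be kept, False if it should be dropped."""
        now = self.clock()
        start = self.window_start.get(app)
        if start is None or now - start >= self.period_s:
            self.window_start[app] = now  # open a new window for this application
            self.counts[app] = 0
        self.counts[app] += 1
        return self.counts[app] <= self.max_events


fake_now = [0.0]
throttle = PerAppThrottle(max_events=3, period_s=60, clock=lambda: fake_now[0])
print([throttle.allow("search") for _ in range(5)])  # -> [True, True, True, False, False]
print(throttle.allow("billing"))                      # -> True (separate budget)
fake_now[0] = 61.0
print(throttle.allow("search"))                       # -> True (new window)
```

A rate cap bounds how fast one application can consume disk, not how much it eventually consumes, so it complements rather than replaces a retention or quota policy.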

—Aaron

In reply to Gabriel Corrêa de Oliveira (gabriel.co@gmail.com), Mon, Feb 16, 2015 at 3:09 PM.


(Gabriel Corrêa de Oliveira) #6

Yes, indeed, I am using Logstash.

I'll check how the throttle filter works.

Thank you.

In reply to Aaron Mildenstein, Monday, 16 February 2015, 22:54:57 UTC-2.


(Jozef Vilcek) #7

I am trying to figure out how to do exactly the same thing and have not found, so far, any information on how to do it effectively.

A throttle filter could make the problem of applications going wild and producing tons of data a bit less painful, but those applications will still fill the storage if given "enough time". It is not a solution.

I was wondering whether this could be done by indexing new events that exceed the quota over old events. Something like:

  • from logging statistics, estimate the average event size for an application
  • given a fixed quota, set how many events should be allowed into the index
  • create a fixed number of sequential IDs, used in a cycle and assigned to the documents sent to ES
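The idea above can be sketched in Python as two small helpers: one turns a quota and an estimated average event size into a number of document slots, the other hands out IDs that wrap around. The ID format `<app>-<n>` and the function names are hypothetical, and the sketch assumes a single sender whose counter state survives restarts. Indexing a document under an ID that already exists replaces the old document, but the replaced version's space is typically only reclaimed when segments are merged, so real disk usage can overshoot the estimate for a while; varying event sizes make the quota approximate as well.

```python
import itertools
from typing import Iterator


def slot_count(quota_bytes: int, avg_event_bytes: int) -> int:
    """Number of documents one application may keep, from its quota and estimated event size."""
    if avg_event_bytes <= 0:
        raise ValueError("avg_event_bytes must be positive")
    return max(1, quota_bytes // avg_event_bytes)


def cyclic_ids(app: str, slots: int) -> Iterator[str]:
    """Yield IDs '<app>-0' ... '<app>-(slots-1)', then wrap around and start overwriting."""
    return (f"{app}-{i}" for i in itertools.cycle(range(slots)))


print(slot_count(quota_bytes=1_000_000_000, avg_event_bytes=500))  # -> 2000000
ids = cyclic_ids("billing", 3)
print([next(ids) for _ in range(5)])
# -> ['billing-0', 'billing-1', 'billing-2', 'billing-0', 'billing-1']
```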

Having very little experience with Elasticsearch, I wonder: does something like this make sense, and can it be done?

