I'm a newbie and have spent a day or so reading up on elasticsearch and
performance. We are using logstash to parse log lines and send them to
graylog2, which writes to and reads from ES. "Time-based indexing" is
mentioned a lot on various websites and posts, but I have not found any
concrete instructions. This post has come closest, but its configuration
requires a different index alias for reading and writing, and I don't think
graylog2 will do that:
Here's my current guess at doing time-based indexing, for weekly indices:
create a template that matches all index names, setting up index config,
mappings, etc.
once a week, run a script that does the following:
** create a new index whose name contains the date of next Sunday (to
rotate at the start of Sunday)
** add the new index to the single alias (for example "graylog") with the
following filter:
{
  "range" : {
    "created_at" : {
      "from" : <next Sunday UTC seconds>,
      "to" : <next Sunday +1 week UTC seconds>,
      "include_upper" : false
    }
  }
}
** remove the old index from the alias
** delete the old index (archive it first?)
** optimize the most recently completed index
However, I don't think this will work, because the alias filter is applied
only on searching, not on indexing. (As a general solution there are
potential problems which do not occur in the graylog2 case: what if a
document does not contain "created_at"? What if the filters do not cover
the whole time range, or they overlap? A way around these would be a
default index, which would be the most recent one.)
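In case it clarifies the plan, here is a minimal sketch of the weekly "add to alias" step. The index naming scheme, the `graylog` alias, and the `created_at` seconds field are my assumptions from above; this only builds the body for a POST to `/_aliases`, it does not send it:

```python
from datetime import datetime, timedelta, timezone

def next_sunday_utc(now=None):
    """Midnight UTC of the upcoming Sunday (rolls a full week if today is Sunday)."""
    now = now or datetime.now(timezone.utc)
    days_ahead = (6 - now.weekday()) % 7   # Monday=0 ... Sunday=6
    if days_ahead == 0:
        days_ahead = 7
    return (now + timedelta(days=days_ahead)).replace(
        hour=0, minute=0, second=0, microsecond=0)

def weekly_alias_action(alias="graylog", prefix="graylog", now=None):
    """Build one 'add' action for POST /_aliases: new weekly index plus range filter."""
    start = next_sunday_utc(now)
    end = start + timedelta(weeks=1)
    return {
        "add": {
            "index": "%s-%s" % (prefix, start.strftime("%Y.%m.%d")),
            "alias": alias,
            "filter": {
                "range": {
                    "created_at": {
                        "from": int(start.timestamp()),
                        "to": int(end.timestamp()),
                        "include_upper": False,
                    }
                }
            },
        }
    }
```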
Does this work? If not, what exactly is the way to configure time-based
indexing using only a single alias?
It's too bad that there isn't more explicit support for tuning ES for
logging applications (time series data), as it appears to be a common use
case and ES has excellent facilities.
You can simply create another alias that always points to the latest
index, and index all records through this alias. Every Sunday, when you
create the new index and add it to the list of search aliases, you can also
flip this index alias to point to the new index. It can all be done in the
same request.
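To make the "same request" part concrete, here is a sketch of the body for one atomic POST to `/_aliases` that does the whole weekly flip. The alias and index names are hypothetical; the write alias here is separate from the search alias:

```python
import json

def weekly_flip_actions(old_index, new_index,
                        search_alias="graylog", write_alias="graylog-write"):
    """Body for POST /_aliases: add the new index to the search alias
    and move the write alias from the old index to the new one,
    all in one atomic request."""
    return {
        "actions": [
            {"add":    {"index": new_index, "alias": search_alias}},
            {"remove": {"index": old_index, "alias": write_alias}},
            {"add":    {"index": new_index, "alias": write_alias}},
        ]
    }

body = weekly_flip_actions("graylog-2012.12.16", "graylog-2012.12.23")
print(json.dumps(body, indent=2))
```

Because all actions in one `_aliases` request are applied together, there is no window where the write alias points at no index.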
On Monday, December 17, 2012 2:42:44 PM UTC-5, squirrel wrote:
Yes, using two aliases is the way to go, but I'm wondering if there is a
single-alias method.
Guess I'll look into how to add a "write" elasticsearch_index_name config
to graylog2.
On Tuesday, December 18, 2012 11:08:56 AM UTC-8, Igor Motov wrote:
FYI - Looks like the graylog2 folks are already working on a complete
solution for time-based indexing support in a future release.
On Tuesday, December 18, 2012 11:44:51 AM UTC-8, squirrel wrote:
I think you might find bits from this tutorial useful:
Although I think you might be better off with the new store-level
compression than with compressing _source:
Also, if you want to index into time-based indices, you can do that from
logstash directly (no need to go through Graylog). And then you can use
Graylog just for searching, I guess.
And if you need a single alias for all indices, you can use "_all".
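For the logstash route, a minimal sketch of the elasticsearch output with a daily time-based index name (host and pattern here are illustrative; the `%{+YYYY.MM.dd}` date syntax is logstash's):

```
output {
  elasticsearch {
    host  => "localhost"
    index => "logstash-%{+YYYY.MM.dd}"
  }
}
```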