Time-based index writing and reading

I'm a newbie and have spent a day or so reading up on elasticsearch and
performance. We are using logstash to parse log lines and send them to
graylog2, which writes to and reads from ES. "Time-based indexing" is
mentioned a lot in various websites/posts, but I have not found any
concrete instructions.

This post has come closest, but its configuration requires using a
different index alias for reading than for writing, and I don't think
graylog2 will do that:
https://groups.google.com/forum/#!searchin/elasticsearch/time$20based$20index$20alias/elasticsearch/lhoG0gE9Kvo/ez3XXIlsAx0J

Here's my current guess at doing time-based indexing, for weekly indices:

  • create a template that matches all index names, setting up the index
    config, mappings, etc.
  • once a week, run a script that does the following:
    ** create a new index whose name contains the date of the coming Sunday
    (so that rotation happens at the start of Sunday)
    ** add the new index to the single alias (for example "graylog") with the
    following filter:
    {
      "range" : {
        "created_at" : {
          "from" : ,
          "to" : <next Sunday + 1 week, UTC seconds>,
          "include_upper" : false
        }
      }
    }
    ** remove the old index from the alias
    ** delete the old index (archive it first?)
    ** optimize the most recently completed index
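The date arithmetic for that weekly script might look like this (a rough
Python sketch; the "graylog" prefix and Sunday-to-Sunday window are just the
example above, not anything graylog2 itself provides):

```python
from datetime import datetime, timedelta, timezone

def next_sunday_utc(now):
    """Midnight UTC of the first Sunday strictly after `now` (Monday=0..Sunday=6)."""
    days_ahead = (6 - now.weekday()) % 7 or 7
    target = (now + timedelta(days=days_ahead)).date()
    return datetime(target.year, target.month, target.day, tzinfo=timezone.utc)

def weekly_index_name(prefix, now):
    """Name the new index after the Sunday on which it starts taking writes."""
    return "%s-%s" % (prefix, next_sunday_utc(now).strftime("%Y-%m-%d"))

def rotation_window_seconds(now):
    """(from, to) for the range filter, as UTC epoch seconds, covering one week."""
    start = next_sunday_utc(now)
    end = start + timedelta(weeks=1)
    return int(start.timestamp()), int(end.timestamp())
```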

However, I don't think this will work, because the filter is applied only
on searching, not on indexing. (As a general solution there are potential
problems which do not occur in the graylog2 case: what if a document does
not contain "created_at"? Or the filters do not cover the whole time
range, or they overlap? A way around these would be to have a default
index, which would be the most recent one.)
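For reference, the alias-with-filter step would boil down to an "add" action
like the following (a sketch built in Python just to show the shape; the
index/alias names are from the example above):

```python
def filtered_alias_action(index, alias, start_s, end_s):
    """One "add" action for POST /_aliases: attach `index` to `alias` with a
    created_at range filter. The filter only takes effect when searching
    through the alias, never when indexing through it."""
    return {"add": {
        "index": index,
        "alias": alias,
        "filter": {
            "range": {"created_at": {
                "from": start_s,
                "to": end_s,
                "include_upper": False,
            }}
        },
    }}
```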

Does this work? If not, what exactly is the way to configure time-based
indexing using only a single alias?

It's too bad that there isn't more explicit support for tuning ES for
logging applications (time-series data), as it appears to be a common use
case and ES has excellent facilities for it.

--

You can simply create another alias that always points to the latest
index, and index all records through this alias. Every Sunday, when you
create the new index and add it to the list of search aliases, you can
flip this write alias to point to the new index. It can all be done in the
same request.
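The body of that single _aliases request could be built like this (a sketch;
the "graylog"/"graylog-write" alias names are just examples):

```python
def weekly_flip_actions(old_index, new_index,
                        search_alias="graylog", write_alias="graylog-write"):
    """Actions for one POST /_aliases call: add the new index to the search
    alias and repoint the write alias at it in the same request."""
    return {"actions": [
        {"add":    {"index": new_index, "alias": search_alias}},
        {"remove": {"index": old_index, "alias": write_alias}},
        {"add":    {"index": new_index, "alias": write_alias}},
    ]}
```

Since all actions in one _aliases request are applied together, there is no
window in which the write alias points at no index.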

On Monday, December 17, 2012 2:42:44 PM UTC-5, squirrel wrote:


--

Yes, using two aliases is the way to go, but I'm wondering if there is a
single-alias method.

Guess I'll look into how to add a "write" elasticsearch_index_name config
to graylog2.

On Tuesday, December 18, 2012 11:08:56 AM UTC-8, Igor Motov wrote:


--

FYI - it looks like the graylog2 folks are already working on complete
support for time-based indexing in a future release.

On Tuesday, December 18, 2012 11:44:51 AM UTC-8, squirrel wrote:


--

Hello,

I think you might find bits from this tutorial useful:
http://www.elasticsearch.org/tutorials/2012/05/19/elasticsearch-for-logging.html

That said, I think you might be better off with the new store-level
compression than with compressing the _source:
http://www.elasticsearch.org/guide/reference/index-modules/store.html
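A sketch of what that could look like in the index template's settings (I'm
assuming the index.store.compress.* keys from that page; check them against
your ES version):

```python
def compressed_store_settings():
    """Index settings enabling store-level compression (per the linked page);
    put these in the template that matches the weekly index names."""
    return {"settings": {
        "index.store.compress.stored": True,  # compress stored fields
        "index.store.compress.tv": True,      # compress term vectors
    }}
```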

Also, if you want to index into time-based indices, you can do that from
logstash directly (no need to go through Graylog), and then use Graylog
just for searching, I guess.

And if you need a single alias for all indices, you can use "_all".

Best regards,
Radu

http://sematext.com/ -- ElasticSearch -- Solr -- Lucene

On Tue, Dec 18, 2012 at 10:44 PM, squirrel greg.denton@gmail.com wrote:

--

P.S. Forget my comment about the "_all" alias - it obviously doesn't do
the filtering that you need.

On Wed, Dec 19, 2012 at 10:30 AM, Radu Gheorghe
radu.gheorghe@sematext.com wrote:

--