#1
I have been reading around, and some people suggest splitting the index
based on time when doing "log" analytics.
Is this built into Elasticsearch, or do I have to do it manually?
I'm pulling my data from SQL Server and am going to use either an ETL
process or the JDBC gatherer. I suppose the ETL process needs to
consider the date and, when it does its index PUT, check and roll over
the date so that a new index gets created?
And my queries need to consider this too, so that each day they know to
search the new index?
#2 Is there such a thing as eviction policies?
Basically, is there a way to check whether we are running out of disk
space and either remove entries from the index or, in the above case,
delete/archive indices older than a few days?
> #1
> I have been reading around, and some people suggest splitting the
> index based on time when doing "log" analytics.
> Is this built into Elasticsearch, or do I have to do it manually?
I don't believe Elasticsearch itself understands date-based indices,
but Logstash does.
If done manually:

    PUT http://myhost:9200/myindex-(get-current-date-here)/SomeDoc/Id
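As a concrete version of that manual approach, here is a minimal
bash/curl sketch. The host, index prefix, and type are the placeholders
from the example above, and the YYYY.MM.DD suffix is just the Logstash
convention (an assumption; any date format works):

    # Build today's index name and PUT a document into it; Elasticsearch
    # creates the new index automatically on the first PUT of each day.
    TODAY=$(date +%Y.%m.%d)
    curl -XPUT "http://myhost:9200/myindex-$TODAY/SomeDoc/1" \
        -d '{"message": "example log line"}'

Since indices are created on demand, the "roll over" is simply the date
suffix changing at midnight.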
> I'm pulling my data from SQL Server and am going to use either an ETL
> process or the JDBC gatherer. I suppose the ETL process needs to
> consider the date and, when it does its index PUT, check and roll
> over the date so that a new index gets created?
Yes.
> And my queries need to consider this too, so that each day they know
> to search the new index?
Yes, unless you use an index alias or the special _all index name to
search across all indices, but that obviously has performance
implications and partly defeats the purpose of having multiple indices.
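To make that concrete: Elasticsearch also accepts wildcard patterns and
comma-separated index lists in the URL, so queries can be scoped to
only the relevant days instead of _all. A hedged sketch (the dates are
made up for illustration):

    # Search every daily index at once with a wildcard:
    curl -XGET "http://myhost:9200/myindex-*/_search" \
        -d '{"query": {"match_all": {}}}'

    # Or restrict the search to the two most recent days:
    curl -XGET "http://myhost:9200/myindex-2014.05.22,myindex-2014.05.23/_search" \
        -d '{"query": {"match_all": {}}}'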
> #2 Is there such a thing as eviction policies?
> Basically, is there a way to check whether we are running out of disk
> space and either remove entries from the index or, in the above case,
> delete/archive indices older than a few days?
If disk space is your limiting factor, you should find the curator
script useful. You could also set the _ttl value of messages to have
them automatically expire after a set time.
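For illustration, a hedged sketch of both options; the index and type
names follow the earlier examples, and the 7d retention is an arbitrary
choice:

    # Enable _ttl for a type so its documents expire automatically:
    curl -XPUT "http://myhost:9200/myindex-2014.05.23/SomeDoc/_mapping" \
        -d '{"SomeDoc": {"_ttl": {"enabled": true, "default": "7d"}}}'

    # Dropping a whole dated index (what curator automates) is much
    # cheaper than per-document expiry:
    curl -XDELETE "http://myhost:9200/myindex-2014.05.16"

Note that _ttl expiry deletes documents one by one, so for time-based
data, deleting whole indices is generally the preferred approach.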
--
Magnus Bäck | Software Engineer, Development Tools
magnus.back@sonymobile.com | Sony Mobile Communications
I will add a time-series mode to my JDBC plugin soon. Right now you can
create timestamps with bash (or your favorite shell) and append them as
a suffix to the index name in the river/feeder creation call, but this
can be automated. No ETA yet.
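A hedged sketch of that shell approach; the exact parameter layout
varies between versions of the JDBC river plugin, and the connection
details below are placeholders:

    # Create a JDBC river whose target index carries today's date.
    # URL, credentials, and SQL are placeholders for your SQL Server.
    TODAY=$(date +%Y.%m.%d)
    curl -XPUT "http://myhost:9200/_river/my_jdbc_river/_meta" -d '{
      "type": "jdbc",
      "jdbc": {
        "url": "jdbc:sqlserver://dbhost;databaseName=mydb",
        "user": "user",
        "password": "secret",
        "sql": "select * from logs",
        "index": "myindex-'"$TODAY"'",
        "type": "SomeDoc"
      }
    }'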
This is also a nifty feature. I will experiment with the JDBC plugin to
see if I can estimate the data volume to index (probably from the data
volume of previous runs) or make an educated guess about data growth in
the ES data folders, and refuse to continue if a limit is exceeded.
Index data volume can fluctuate due to segment creation and merging, so
this would have to include an optimization strategy, or I would have to
rely on the JDBC source.
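Until something like that exists in the plugin, a rough guard can be
scripted around each feeder run. A sketch, assuming GNU coreutils; the
data path and the 85% threshold are arbitrary assumptions:

    # Refuse to start an indexing run if the data disk is too full.
    USED=$(df --output=pcent /var/lib/elasticsearch | tail -1 | tr -d ' %')
    if [ "$USED" -gt 85 ]; then
        echo "disk usage at ${USED}%, skipping indexing run" >&2
        exit 1
    fi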
Eviction is a harder topic, since I hesitate to create a plugin that
can delete data without user interaction. Even eviction rules in a
plugin configuration may contain mistakes and are risky. But I also see
the usefulness of obsoleting indexed data by dropping it regularly. I
don't want to take responsibility for this in the JDBC plugin, so it
may end up as a separate plugin implementation.
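In the meantime, the usual pattern is a small cron job outside the
plugin. A hedged sketch, reusing the dated index naming from the
earlier examples (GNU date assumed; the retention period is arbitrary):

    # Daily cron job: drop the index that has aged out (here: 7 days).
    OLD=$(date -d "7 days ago" +%Y.%m.%d)
    curl -XDELETE "http://myhost:9200/myindex-$OLD"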
Jörg