Question about time based indexes/rolling indexes and eviction policies?


(None) #1

#1
I have been reading around, and some people suggest splitting the index
based on time when doing "log" analytics.
Is this built into Elasticsearch, or do I have to do it manually?

If manually:

PUT http://myhost:9200/myindex-(get-current-date-here)/SomeDoc/Id

I'm pulling my data from SQL Server and am going to use either ETL or the
JDBC gatherer. I suppose the ETL process needs to consider the date and,
when it does its index PUT, check the date and roll over so that a new
index gets created?
And my queries need to consider this too, so they know that each day
they need to search the new index?

#2 Is there such a thing as eviction policies?
Basically, is there a way to check whether we are running out of disk space
and either remove entries from the index or, in the above case,
delete/archive indexes older than a few days?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/25618b41-f567-4d22-a2df-ca9319017897%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Magnus Bäck) #2

On Friday, May 23, 2014 at 20:13 CEST,
John Smith java.dev.mtl@gmail.com wrote:

#1
I have been reading around, and some people suggest splitting the index
based on time when doing "log" analytics.
Is this built into Elasticsearch, or do I have to do it manually?

I don't believe Elasticsearch itself understands date-based indices,
but Logstash does.

If manually:
PUT http://myhost:9200/myindex-(get-current-date-here)/SomeDoc/Id
I'm pulling my data from SQL Server and am going to use either ETL or
the JDBC gatherer. I suppose the ETL process needs to consider the date
and, when it does its index PUT, check the date and roll over so that
a new index gets created?

Yes.

And my queries need to consider this too, so they know that each day
they need to search the new index?

Yes, unless you search all indices at once, for example via _all or an
index alias, but that obviously has performance implications and partly
voids the benefits of having multiple indices.
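
As a rough illustration of the manual approach discussed above, here is a minimal Python sketch that builds one index name per day, both for writing and for searching a limited date range. The function names, the "myindex" prefix, and the YYYY.MM.DD suffix (the pattern Logstash uses for its daily indices) are assumptions for illustration, not something the thread prescribes.

```python
from datetime import date, timedelta


def daily_index_name(prefix: str, day: date) -> str:
    """Return the index name for one day, e.g. 'myindex-2014.05.23'."""
    # The YYYY.MM.DD suffix is an assumed naming convention.
    return f"{prefix}-{day:%Y.%m.%d}"


def index_names_for_range(prefix: str, start: date, end: date) -> list:
    """Return one index name per day from start to end, inclusive."""
    days = (end - start).days
    return [daily_index_name(prefix, start + timedelta(days=n))
            for n in range(days + 1)]


# Index to write to on a given day (the ETL side).
write_index = daily_index_name("myindex", date(2014, 5, 23))

# Comma-separated list of indices to search (the query side).
search_indices = ",".join(
    index_names_for_range("myindex", date(2014, 5, 21), date(2014, 5, 23)))

print(write_index)
print(search_indices)
```

The comma-separated string can be placed in the search URL where a single index name would otherwise go, so a query touches only the days it needs rather than every index.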

#2 Is there such a thing as eviction policies?
Basically, is there a way to check whether we are running out of disk
space and either remove entries from the index or, in the above case,
delete/archive indexes older than a few days?

If disk space is your limiting factor you should find the curator
script useful. You could also set the _ttl value of messages to have
them automatically expire after a set time.


http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-ttl-field.html
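
For time-based indices, the age-based cleanup can be sketched as follows. This is not how curator is implemented; it is only a minimal Python illustration of selecting daily indices older than a retention window, assuming index names of the hypothetical form prefix-YYYY.MM.DD. The function name and parameters are made up for this example.

```python
from datetime import date, datetime, timedelta


def expired_indices(names, prefix, today, keep_days):
    """Return the daily indices whose date is older than keep_days."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in names:
        if not name.startswith(prefix + "-"):
            continue  # belongs to something else; leave it alone
        suffix = name[len(prefix) + 1:]
        try:
            day = datetime.strptime(suffix, "%Y.%m.%d").date()
        except ValueError:
            continue  # suffix is not a date; skip rather than guess
        if day < cutoff:
            expired.append(name)
    return expired


names = ["myindex-2014.05.20", "myindex-2014.05.25", "other-2014.05.01"]
print(expired_indices(names, "myindex", date(2014, 5, 26), keep_days=3))
```

Each returned name could then be dropped with an HTTP DELETE request on that index. Dropping a whole index at once is the main reason to split by time in the first place, since it avoids deleting documents one by one.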

--
Magnus Bäck | Software Engineer, Development Tools
magnus.back@sonymobile.com | Sony Mobile Communications



(Jörg Prante) #3
  1. I will add a timeseries mode to my JDBC plugin soon. Right now you can
    create a timestamp with bash (or your favorite shell) and append it as a
    suffix to the index name in the river/feeder creation call, but this can
    be automated. No ETA yet.

  2. This is also a nifty feature. I will experiment with the JDBC plugin to
    see whether I can estimate the data volume to index (probably from the
    data volume of previous runs) or make an educated guess about data growth
    in the ES data folders, and refuse to continue if a limit is exceeded.
    Index data volume can fluctuate due to segment creation and merging, so
    this would have to include an optimization strategy, or I rely on the
    JDBC source.
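
The "refuse to continue if a limit is exceeded" idea in point 2 could look roughly like the Python sketch below. It is not the JDBC plugin's implementation and does not estimate data volume; it substitutes a simpler check of how full the filesystem holding the data folder is. The function names, the path, and the 0.85 threshold are hypothetical.

```python
import shutil


def used_fraction(path: str) -> float:
    """Return the used share (0.0 to 1.0) of the filesystem holding path."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 1.0  # treat an unreadable size as full, to be safe
    return usage.used / usage.total


def may_continue(path: str, max_used_fraction: float) -> bool:
    """Return True if indexing may proceed under the given limit."""
    return used_fraction(path) <= max_used_fraction


# Hypothetical usage: check the current directory against an 85% limit.
if may_continue(".", 0.85):
    print("enough disk space, continuing")
else:
    print("disk limit exceeded, refusing to index")
```

Because segment merging can make disk usage fluctuate, as noted above, a single reading like this is only a coarse guard.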

Eviction is a harder topic, since I hesitate to create a plugin that can
delete data without user interaction. Even eviction rules in a plugin
configuration may contain mistakes and are risky. But I also see the
usefulness of retiring indexed data by dropping it regularly. I don't
want to take responsibility for this in the JDBC plugin, so this may end
up as a separate plugin.

Jörg



(None) #4

Thanks!


