Indexing by time and deleting indexes by time


(Alexis Okuwa) #1

So there are a few questions that I have about indexing data by time.
Think of this like Logstash: lots of incoming data that may get read once,
a lot, or never, but I am able to aggregate my data set.

When it comes to using aliases, is there any performance gain or caching for
them? For example, I could just use
http://localhost:9200/index1,index2,index3,index4/logs/_search. I know that
routing and filters can be applied to an alias, but I wanted to know whether
it is cached, or whether it offers anything beyond easier management for
queries not built by code?

If I plan on deleting raw data older than 30 days, is it better to have one
index per month, keep it around for an extra month, and drop the whole
index, or something closer to daily or weekly indexes? This assumes that I
already have the right number of shards figured out.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Adrien Grand) #2

Hi Alexis,

Yes, filters that are applied to aliases will be cached.
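
For reference, filtered aliases are defined through the `_aliases` endpoint. Here is a minimal Python sketch of building that request body; the index name, alias name, and `customer` field are made-up examples, not something from this thread:

```python
def add_alias_action(index, alias, filter_query=None):
    # One "add" action for the POST /_aliases API. If a filter is given,
    # it is stored with the alias and applied on every search through it;
    # the filter result is what Elasticsearch caches.
    action = {"add": {"index": index, "alias": alias}}
    if filter_query is not None:
        action["add"]["filter"] = filter_query
    return action

# A filtered alias over one daily index, restricted to a single customer.
body = {
    "actions": [
        add_alias_action("logs-2013.09.04", "acme-logs",
                         {"term": {"customer": "acme"}})
    ]
}
```

POSTing this body to http://localhost:9200/_aliases would create the alias; an unfiltered alias just omits the filter key.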

Regarding discarding old data, both options you suggest are good. The
decision may depend on whether you can afford the extra storage to keep two
monthly indices around. Just do whatever is most consistent with the rest of
your log management infrastructure. I'm not sure I understand the question
about the number of shards, but one nice thing about rolling indices is
that, since they have a short life expectancy, you can adapt the number of
shards for future indices if the amount of data to handle turns out to be
higher or lower than expected.
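
To make the rolling-index idea concrete, here is a small Python sketch that computes which daily indices fall outside a 30-day retention window; the `logs-` prefix and the one-index-per-day naming scheme are assumptions for illustration, not something prescribed in this thread:

```python
from datetime import date, timedelta

def daily_index_name(day, prefix="logs-"):
    # Hypothetical naming scheme: one index per day, e.g. logs-2013.09.04.
    return prefix + day.strftime("%Y.%m.%d")

def expired_indices(today, retention_days=30, lookback_days=90, prefix="logs-"):
    """Names of daily indices older than the retention window.

    Each of these can then be removed with a single DELETE on the whole
    index, which is much cheaper than deleting documents one by one.
    """
    return [daily_index_name(today - timedelta(days=age), prefix)
            for age in range(retention_days, lookback_days)]
```

A cron job could run this once a day and issue one index deletion per expired name.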


--
Adrien Grand



(Alexis Okuwa) #3

Adrien,

Thanks for the reply. To continue with the aliases: if I am not putting
filters on them but just adding indexes to them, is there any performance
increase over just listing the indexes in the URI?

Also, I wanted to know whether any throttling can be set when deleting an
index. If I had to delete 30 days of information at once, say 100 GB of
data, that could take some time to delete and would affect performance for a
few minutes if not more, while deleting a single day of info, maybe 3 GB,
would only take a few seconds?


--
Enjoy,
Alexis Okuwa
WojonsTech
424.835.1223



(Adrien Grand) #4

Hi,

On Wed, Sep 4, 2013 at 10:35 PM, Alexis Okuwa wojonstech@gmail.com wrote:

Adrien,

Thanks for the reply. To continue with the aliases: if I am not putting
filters on them but just adding indexes to them, is there any performance
increase over just listing the indexes in the URI?

No, this should be fully equivalent.

Also, I wanted to know whether any throttling can be set when deleting an
index. If I had to delete 30 days of information at once, say 100 GB of
data, that could take some time to delete and would affect performance for a
few minutes if not more, while deleting a single day of info, maybe 3 GB,
would only take a few seconds?

If this delete pattern is common for your use case, it would make sense to
have rolling indices and to delete whole indices instead of partitions of an
existing index; this is much more lightweight.

Otherwise, even if the documents you are deleting total 100 GB, in practice
Elasticsearch will only flip a few bits on disk, and the documents that are
marked as deleted will simply be ignored until a background merge kicks in
and merges the segments containing the deleted documents with other
segments.
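
To illustrate the difference, here is a Python sketch that just builds the two kinds of HTTP requests; the host, index names, and the `@timestamp` field are assumptions, and `_query` refers to the delete-by-query endpoint that Elasticsearch exposed at the time of this thread:

```python
def drop_index_request(index, host="http://localhost:9200"):
    # Dropping a whole rolling index removes its files outright: fast and cheap.
    return ("DELETE", "%s/%s" % (host, index))

def delete_by_query_request(index, cutoff, host="http://localhost:9200"):
    # Delete-by-query only marks matching documents as deleted; the space is
    # reclaimed later, when a background merge rewrites the affected segments.
    query = {"range": {"@timestamp": {"lt": cutoff}}}
    return ("DELETE", "%s/%s/_query" % (host, index), query)
```

With daily or monthly indices, expiring old data uses the first form, one small request per index, rather than a single 100 GB delete-by-query.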

--
Adrien Grand


