There are a few questions I have about indexing data based on time.
Think of it like Logstash: lots of incoming data that may get read once,
a lot, or never, but that I am able to aggregate.
When it comes to using aliases, is there any performance gain or caching
for the aliases? For example, I could just use
http://localhost:9200/index1,index2,index3,index4/logs/_search. I know there
is some routing and there are filters that can be applied to an alias, but I
just wanted to know if it's cached or gives something extra other than easier
management for queries not built by code.
If I plan on deleting raw data older than 30 days, is it better to have one
index per month, keep it around for an extra month, and drop the whole
index, or to use something closer to daily or weekly indices? This assumes
that I have the right number of shards already figured out.
Yes, filters that are applied to aliases will be cached.
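For reference, a filter is attached to an alias through the _aliases API. Here is a minimal sketch that builds such a request body; the index name, alias name, and the `level` field are made-up examples:

```python
import json

# Sketch of an _aliases request body that attaches a filter to an alias.
# "logs-2014-01", "error-logs", and the "level" field are hypothetical;
# the filter is stored with the alias and cached like any other filter.
alias_actions = {
    "actions": [
        {
            "add": {
                "index": "logs-2014-01",
                "alias": "error-logs",
                "filter": {"term": {"level": "error"}},
            }
        }
    ]
}

# POST this body to http://localhost:9200/_aliases to create the alias.
print(json.dumps(alias_actions))
```

Every search that goes through the alias then has the filter applied transparently.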
Regarding discarding old data, both options you are suggesting are good.
The decision may depend on whether you can afford the extra storage to
keep two monthly indices around. Just do whatever looks the most
consistent with the rest of your log management infrastructure. I'm not
sure I understand the question about the number of shards, but one nice
thing about rolling indices is that, since they have a short life
expectancy, you can adapt the number of shards of future indices if the
amount of data to handle turns out to be higher or lower than expected.
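With daily rolling indices, for example, the retention logic reduces to picking the index names that fall outside the window and dropping those indices whole. A small sketch, assuming the common `logs-YYYY.MM.DD` naming scheme (the prefix is an assumption):

```python
from datetime import date, timedelta

def expired_indices(index_names, today, retention_days=30):
    """Return the daily indices older than the retention window."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in index_names:
        # Parse "logs-YYYY.MM.DD" back into a date.
        day = date(*map(int, name.split("-", 1)[1].split(".")))
        if day < cutoff:
            expired.append(name)
    return expired

names = ["logs-2014.01.01", "logs-2014.02.10", "logs-2014.02.20"]
print(expired_indices(names, today=date(2014, 2, 20)))
# -> ['logs-2014.01.01']
```

Each expired index can then be removed in one call, e.g. `DELETE /logs-2014.01.01`.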
Thanks for the reply. To continue with the aliases: if I am not putting
filters on them but just adding indices to be part of them, is there any
performance increase over just listing the indices in the URI?
Also, I wanted to know if there is any throttling that can be set when
deleting an index. If I had to delete 30 days of information at once, say
100 GB of data, that could take some time and would affect performance for
a few minutes if not more, while deleting a single day of info, maybe
3 GB, would only take a few seconds?
--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
No, an alias without filters should be fully equivalent to listing the
indices in the URI.
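In other words, an unfiltered alias is just a server-side name for a set of indices. A sketch, with made-up index and alias names:

```python
# These two searches resolve to the same set of shards:
#
#   GET /index1,index2,index3,index4/logs/_search   # indices listed in the URI
#   GET /all-logs/logs/_search                      # same set behind an alias
#
# The alias is defined once via the _aliases API (POST /_aliases):
alias_body = {
    "actions": [
        {"add": {"index": name, "alias": "all-logs"}}
        for name in ["index1", "index2", "index3", "index4"]
    ]
}
print(len(alias_body["actions"]))
```

The alias saves you from repeating the index list in every client, but it does not change how the search executes.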
If this delete pattern is common for your use case, it would make sense to
have rolling indices and to delete whole indices instead of partitions of
an existing index; this is much more lightweight.
Otherwise, even if the documents you are deleting amount to 100 GB in
total, in practice Elasticsearch will only flip a few bits on disk: the
documents are marked as deleted and simply ignored by searches, and the
space is reclaimed later when a background merge kicks in and merges the
segments containing the deleted documents with other segments.
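To illustrate the difference between the two styles (the index name and timestamp field below are hypothetical):

```python
import json

# Dropping a whole rolling index removes its files outright, near-instantly:
#
#   curl -XDELETE 'http://localhost:9200/logs-2014.01.01'
#
# Deleting documents inside a live index via a query only marks them as
# deleted; the disk space is reclaimed later when a background segment
# merge rewrites the segments that contain them.  A query body such as:
delete_query = {"query": {"range": {"@timestamp": {"lt": "2014-01-20"}}}}
print(json.dumps(delete_query))
```

So for regularly expiring time-based data, whole-index deletion is the cheaper and more predictable of the two.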