#1
I have been reading around, and some people suggest splitting the index
based on time when doing "log" analytics.
Is this built into Elasticsearch, or do I have to do it manually?
I'm pulling my data from SQL Server and am going to use either an ETL
process or the JDBC gatherer. I suppose the ETL process needs to
consider the date and, when it does its index PUT, check and roll over
the date so that a new index gets created?
And my queries need to consider this too, so that each day they know to
search the new index?
#2 Is there such a thing as eviction policies?
Basically, is there a way to check whether we are running out of disk
space and either remove entries from the index or, in the above case,
delete/archive indices older than a few days?
> #1
> I have been reading around, and some people suggest splitting the
> index based on time when doing "log" analytics.
> Is this built into Elasticsearch, or do I have to do it manually?
I don't believe Elasticsearch itself understands date-based indices,
but Logstash does.
If done manually:

    PUT http://myhost:9200/myindex-(get-current-date-here)/SomeDoc/Id
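As a concrete version of that manual approach, here is a minimal
bash/curl sketch. The host, index prefix, and type are the placeholders
from the example above, and the YYYY.MM.DD suffix is just the Logstash
convention (an assumption; any date format works):

    # Build today's index name and PUT a document into it; Elasticsearch
    # creates the new index automatically on the first PUT of each day.
    TODAY=$(date +%Y.%m.%d)
    curl -XPUT "http://myhost:9200/myindex-$TODAY/SomeDoc/1" \
        -d '{"message": "example log line"}'

Since indices are created on demand, the "roll over" is simply the date
suffix changing at midnight.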
> I'm pulling my data from SQL Server and am going to use either an ETL
> process or the JDBC gatherer. I suppose the ETL process needs to
> consider the date and, when it does its index PUT, check and roll
> over the date so that a new index gets created?
Yes.
> And my queries need to consider this too, so that each day they know
> to search the new index?
Yes, unless you use an index alias or the special _all index name to
search across all indices, but that obviously has performance
implications and partly defeats the purpose of having multiple indices.
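To make that concrete: Elasticsearch also accepts wildcard patterns and
comma-separated index lists in the URL, so queries can be scoped to
only the relevant days instead of _all. A hedged sketch (the dates are
made up for illustration):

    # Search every daily index at once with a wildcard:
    curl -XGET "http://myhost:9200/myindex-*/_search" \
        -d '{"query": {"match_all": {}}}'

    # Or restrict the search to the two most recent days:
    curl -XGET "http://myhost:9200/myindex-2014.05.22,myindex-2014.05.23/_search" \
        -d '{"query": {"match_all": {}}}'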
> #2 Is there such a thing as eviction policies?
> Basically, is there a way to check whether we are running out of disk
> space and either remove entries from the index or, in the above case,
> delete/archive indices older than a few days?
If disk space is your limiting factor, you should find the curator
script useful. You could also set the _ttl value of messages to have
them automatically expire after a set time.
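For illustration, a hedged sketch of both options; the index and type
names follow the earlier examples, and the 7d retention is an arbitrary
choice:

    # Enable _ttl for a type so its documents expire automatically:
    curl -XPUT "http://myhost:9200/myindex-2014.05.23/SomeDoc/_mapping" \
        -d '{"SomeDoc": {"_ttl": {"enabled": true, "default": "7d"}}}'

    # Dropping a whole dated index (what curator automates) is much
    # cheaper than per-document expiry:
    curl -XDELETE "http://myhost:9200/myindex-2014.05.16"

Note that _ttl expiry deletes documents one by one, so for time-based
data, deleting whole indices is generally the preferred approach.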
--
Magnus Bäck | Software Engineer, Development Tools
magnus.back@sonymobile.com | Sony Mobile Communications
I will add a time-series mode to my JDBC plugin soon. Right now you can
create timestamps with bash (or your favorite shell) and append them as
a suffix to the index name in the river/feeder creation call, but this
can be automated. No ETA yet.
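A hedged sketch of that shell approach; the exact parameter layout
varies between versions of the JDBC river plugin, and the connection
details below are placeholders:

    # Create a JDBC river whose target index carries today's date.
    # URL, credentials, and SQL are placeholders for your SQL Server.
    TODAY=$(date +%Y.%m.%d)
    curl -XPUT "http://myhost:9200/_river/my_jdbc_river/_meta" -d '{
      "type": "jdbc",
      "jdbc": {
        "url": "jdbc:sqlserver://dbhost;databaseName=mydb",
        "user": "user",
        "password": "secret",
        "sql": "select * from logs",
        "index": "myindex-'"$TODAY"'",
        "type": "SomeDoc"
      }
    }'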
This is also a nifty feature. I will experiment with the JDBC plugin to
see if I can estimate the data volume to index (probably from the data
volume of previous runs) or make an educated guess about data growth in
the ES data folders, and refuse to continue if a limit is exceeded.
Index data volume can fluctuate due to segment creation and merging, so
this would have to include an optimization strategy, or I would have to
rely on the JDBC source.
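Until something like that exists in the plugin, a rough guard can be
scripted around each feeder run. A sketch, assuming GNU coreutils; the
data path and the 85% threshold are arbitrary assumptions:

    # Refuse to start an indexing run if the data disk is too full.
    USED=$(df --output=pcent /var/lib/elasticsearch | tail -1 | tr -d ' %')
    if [ "$USED" -gt 85 ]; then
        echo "disk usage at ${USED}%, skipping indexing run" >&2
        exit 1
    fi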
Eviction is a harder topic, since I hesitate to create a plugin that
can delete data without user interaction. Even eviction rules in a
plugin configuration may contain mistakes and are risky. But I also see
the usefulness of obsoleting indexed data by dropping it regularly. I
don't want to take responsibility for this in the JDBC plugin, so it
may end up as a separate plugin implementation.
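In the meantime, the usual pattern is a small cron job outside the
plugin. A hedged sketch, reusing the dated index naming from the
earlier examples (GNU date assumed; the retention period is arbitrary):

    # Daily cron job: drop the index that has aged out (here: 7 days).
    OLD=$(date -d "7 days ago" +%Y.%m.%d)
    curl -XDELETE "http://myhost:9200/myindex-$OLD"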
Jörg