Hello ElasticSearch Community,
My name is Colton McInroy and I work with DOSarrest Internet
Security LTD. Over the past few months I have been working with
ElasticSearch fairly closely and building an infrastructure for it. When
dealing with lots of indices, we found that managing them can be
difficult in most web interfaces. We wanted to be able to, for
instance, have indices older than a certain age expire out of the
cluster. We came across curator
(https://github.com/elasticsearch/curator), which came fairly close but
had some limitations, so I spent a couple of days building our
own tool from scratch. After some discussion, we have decided to release
it to the public as open source. We have called this tool Alfred, after
Bruce Wayne's butler Alfred Pennyworth, keeping in line with the comic-book
theme of ElasticSearch Marvel.
Alfred can be run from a cron job to automatically groom your
indices so that you only keep a certain amount of data, as well as to
optimize indexes, change settings (such as index routing), and more. By
default no changes are made unless you specify the -r or --run
parameter, so you can test the tool as much as you like and see
what would have been done without any changes actually
occurring. Use the -D option to get more detailed output
if you want to see what's going on (such as "-D debug"). Once you are
ready, add the -r parameter and watch Alfred do all the work for you.
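The dry-run behaviour described above boils down to a single gate in front of every change. Here is a minimal Java sketch of that idea; it is not Alfred's actual code, and the class and method names (DryRunGate, perform) are hypothetical, although the "would have been deleted" wording mirrors the dry-run messages in the README output.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of a dry-run gate: an action only executes when the
 * caller opted in (the equivalent of -r/--run); otherwise it is just logged.
 */
public class DryRunGate {
    private final boolean run;                  // true only when -r/--run was given
    private final List<String> log = new ArrayList<>();

    public DryRunGate(boolean run) {
        this.run = run;
    }

    /** Runs the action for real when enabled, otherwise records what would happen. */
    public void perform(String verb, String index, Runnable action) {
        if (run) {
            action.run();
            log.add("Index " + index + " was " + verb + ".");
        } else {
            log.add("Index " + index + " would have been " + verb + ".");
        }
    }

    public List<String> log() {
        return log;
    }

    public static void main(String[] args) {
        boolean run = args.length > 0 && (args[0].equals("-r") || args[0].equals("--run"));
        DryRunGate gate = new DryRunGate(run);
        gate.perform("deleted", "cron_2014_04_01_19",
                () -> System.out.println("(a real DELETE request would be sent here)"));
        gate.log().forEach(System.out::println);
    }
}
```

Keeping the gate in one place means every action (delete, close, optimize, settings) gets the same safe default without each one having to remember to check the flag.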
Alfred was developed in Java, but it does not use the ElasticSearch
Java API; instead, it uses the RESTful API through Apache
HttpClient (http://hc.apache.org/httpclient-3.x/). The following
libraries are included in Alfred via Maven:
joda-time 2.3
httpcore 4.3.2
gson 2.2.4
httpclient 4.3.3
commons-logging 1.1.3
commons-codec 1.6
commons-cli 1.2
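Since Alfred talks to ElasticSearch over plain HTTP rather than through the Java API, each action maps onto a REST call. The sketch below shows what those calls could look like; it is an illustration under assumptions, not Alfred's source. It swaps in the JDK's java.net.http client (Java 11+) for Apache HttpClient, the AlfredRequests class is hypothetical, and the endpoint paths are those of the ElasticSearch 1.x REST API (_optimize was later renamed _forcemerge).

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

/**
 * Hypothetical helper that builds (but does not send) REST calls like the
 * ones Alfred makes. Uses java.net.http instead of Apache HttpClient.
 */
public class AlfredRequests {
    private final String base;      // scheme://host:port, from --host/--port/--ssl
    private final Duration timeout; // from -t/--timeout (default 30 seconds)

    public AlfredRequests(String host, int port, boolean ssl, int timeoutSeconds) {
        this.base = (ssl ? "https" : "http") + "://" + host + ":" + port;
        this.timeout = Duration.ofSeconds(timeoutSeconds);
    }

    private HttpRequest.Builder to(String path) {
        return HttpRequest.newBuilder(URI.create(base + path)).timeout(timeout);
    }

    /** -d: DELETE /{index} */
    public HttpRequest delete(String index) {
        return to("/" + index).DELETE().build();
    }

    /** -c: POST /{index}/_close (-O would use _open instead) */
    public HttpRequest close(String index) {
        return to("/" + index + "/_close").POST(HttpRequest.BodyPublishers.noBody()).build();
    }

    /** -o: POST /{index}/_optimize?max_num_segments=N */
    public HttpRequest optimize(String index, int maxNumSegments) {
        return to("/" + index + "/_optimize?max_num_segments=" + maxNumSegments)
                .POST(HttpRequest.BodyPublishers.noBody()).build();
    }

    /** -S: PUT /{index}/_settings with a JSON body */
    public HttpRequest putSettings(String index, String json) {
        return to("/" + index + "/_settings")
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(json)).build();
    }

    public static void main(String[] args) {
        AlfredRequests es = new AlfredRequests("localhost", 9200, false, 30);
        HttpRequest req = es.putSettings("cron_2014_04_01_19",
                "{\"index.routing.allocation.require.tag\":\"historical\"}");
        System.out.println(req.method() + " " + req.uri());
        // Sending it (requires a reachable cluster) would look like:
        // HttpClient.newHttpClient().send(req, HttpResponse.BodyHandlers.ofString());
    }
}
```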
A prebuilt jar is located at
https://github.com/DOSarrest-Internet-Security/alfred/raw/master/builds/alfred-0.0.1.jar
Our GitHub page with source and README is located at
Here is an excerpt from that README explaining how to use Alfred:
|usage: alfred
 -b,--debloom            Disable Bloom on Indexes
 -B,--bloom              Enable Bloom on Indexes
 -c,--close              Close Indexes
 -D,--debug              Display debug (debug|info|warn|error|fatal)
 -d,--delete             Delete Indexes
 -E,--expiresize         Byte size limit (Default 10 GB)
 -e,--expiretime         Number of time units old (Default 24)
    --examples           Show some examples of how to use Alfred
 -f,--flush              Flush Indexes
 -h,--help               Help Page (Viewing Now)
    --host               ElasticSearch Host
 -i,--index              Index pattern to match (Default _all)
    --max_num_segments   Optimize max_num_segments (Default 2)
 -o,--optimize           Optimize Indexes
 -O,--open               Open Indexes
    --port               ElasticSearch Port
 -r,--run                Required to execute changes on ElasticSearch
 -s,--style              Clean up style (time|size) (Default time)
 -S,--settings           PUT settings
    --ssl                ElasticSearch SSL
 -T,--time-unit          Specify time units (hour|day|none) (Default hour)
 -t,--timeout            ElasticSearch Timeout (Default 30)
Alfred Version: 0.0.1|
Alfred was built as a tool to handle maintenance work on ElasticSearch.
Alfred will delete, flush cache, optimize, close/open, enable/disable
bloom filter, as well as put settings on indexes. Alfred can do any of
these actions based on either time or size parameters.
Examples:
|java -jar alfred.jar -e48 -i"cron_*" -d
|
Delete any indexes starting with "cron_" that are older than 48 hours
|java -jar alfred.jar -e24 -i"cron_*" -S'{"index.routing.allocation.require.tag":"historical"}'
|
Set routing to require the "historical" tag on any indexes starting with
"cron_" that are older than 24 hours
|java -jar alfred.jar -e24 -i"cron_*" -b -o
|
Disable the bloom filter on, and optimize, any indexes starting with "cron_"
that are older than 24 hours
|java -jar alfred.jar -ssize -E"1 GB" -d
|
Find all indexes, group them by prefix, and delete indexes beyond a limit of 1
GB. With the size style, the expire size is not checked against each
index on its own; instead, index sizes are added up from newest to
oldest, and the indexes past the point where the running total exceeds
the limit are acted on. For example:
|java -jar alfred.jar -i"cron_*" -d -ssize -E"500 GB"
GENERAL: cron_2014_04_02_08 is 469.9 GiB bytes before the cuttoff.
GENERAL: cron_2014_04_02_07 is 436.5 GiB bytes before the cuttoff.
GENERAL: cron_2014_04_02_06 is 404.0 GiB bytes before the cuttoff.
GENERAL: cron_2014_04_02_05 is 372.1 GiB bytes before the cuttoff.
GENERAL: cron_2014_04_02_04 is 341.2 GiB bytes before the cuttoff.
GENERAL: cron_2014_04_02_03 is 310.1 GiB bytes before the cuttoff.
GENERAL: cron_2014_04_02_02 is 276.8 GiB bytes before the cuttoff.
GENERAL: cron_2014_04_02_01 is 240.7 GiB bytes before the cuttoff.
GENERAL: cron_2014_04_02_00 is 202.2 GiB bytes before the cuttoff.
GENERAL: cron_2014_04_01_23 is 158.2 GiB bytes before the cuttoff.
GENERAL: cron_2014_04_01_22 is 110.6 GiB bytes before the cuttoff.
GENERAL: cron_2014_04_01_21 is 58.6 GiB bytes before the cuttoff.
GENERAL: cron_2014_04_01_20 is 3.1 GiB bytes before the cuttoff.
GENERAL: Index cron_2014_04_01_19 would have been deleted.
GENERAL: Index cron_2014_04_01_18 would have been deleted.
GENERAL: Index cron_2014_04_01_17 would have been deleted.
GENERAL: Index cron_2014_04_01_16 would have been deleted.
GENERAL: Index cron_2014_04_01_15 would have been deleted.
GENERAL: Index cron_2014_04_01_14 would have been deleted.
GENERAL: Index cron_2014_04_01_13 would have been deleted.
GENERAL: Index cron_2014_04_01_12 would have been deleted.
GENERAL: Index cron_2014_04_01_11 would have been deleted.
GENERAL: Index cron_2014_04_01_10 would have been deleted.
GENERAL: Index cron_2014_04_01_09 would have been deleted.
GENERAL: Index cron_2014_04_01_08 would have been deleted.
GENERAL: Index cron_2014_03_29_08 would have been deleted.
|
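To make the size style concrete, here is a minimal Java sketch of the cumulative cutoff. It assumes, as the output above suggests, that sizes are summed from the newest index to the oldest within one prefix group and that everything from the index that crosses the limit onward is expired. SizeCutoff and overLimit are hypothetical names, and sorting names in reverse lexical order assumes zero-padded timestamps like cron_2014_04_02_08.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SizeCutoff {
    /**
     * indexSizes maps index name to size in bytes for ONE prefix group.
     * Returns the indexes (newest first) that fall past the size limit.
     */
    public static List<String> overLimit(Map<String, Long> indexSizes, long limitBytes) {
        // Zero-padded timestamps sort lexically, so reverse order is newest first.
        List<String> names = new ArrayList<>(indexSizes.keySet());
        names.sort(Comparator.reverseOrder());

        List<String> expired = new ArrayList<>();
        long remaining = limitBytes;            // budget left "before the cutoff"
        for (String name : names) {
            remaining -= indexSizes.get(name);
            if (remaining < 0) {                // this index and all older ones exceed the limit
                expired.add(name);
            }
        }
        return expired;
    }

    public static void main(String[] args) {
        long gib = 1L << 30;
        Map<String, Long> sizes = new HashMap<>();
        sizes.put("cron_2014_04_02_08", 30 * gib);
        sizes.put("cron_2014_04_02_07", 30 * gib);
        sizes.put("cron_2014_04_02_06", 30 * gib);
        sizes.put("cron_2014_04_01_23", 30 * gib);
        for (String index : overLimit(sizes, 70 * gib)) {
            System.out.println("Index " + index + " would have been deleted.");
        }
    }
}
```

Grouping by prefix would simply mean calling overLimit once per group, so that, for example, cron_ and .marvel- indexes each get their own budget.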
If you are using daily indexes, such as the Marvel indexes, you could
use the following examples to manage them:
|java -jar alfred.jar -i".marvel-*" -d -ssize -E"500 GB"
|
Keep only the most recent 500 GB worth of Marvel indices
|java -jar alfred.jar -i".marvel-*" -d -T"day" -e7
|
Delete Marvel indices older than 7 days
|java -jar alfred.jar -i".marvel-*" -b -o -T"day" --max_num_segments=4 -e1
|
Disable the bloom filter on, and optimize with max_num_segments=4, Marvel
indices that are over 1 day old
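The time style can be sketched the same way. This Java example assumes that an index's timestamp is the start of the hour or day encoded in its name and treats -e/-T as "strictly older than N hours/days from now". TimeExpiry and its methods are hypothetical, and the "none" time unit is left out because its exact behaviour isn't described here.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

public class TimeExpiry {
    /** Maps a -T value to a ChronoUnit ("none" is deliberately not handled). */
    public static ChronoUnit unitFor(String timeUnit) {
        switch (timeUnit) {
            case "hour": return ChronoUnit.HOURS;
            case "day":  return ChronoUnit.DAYS;
            default: throw new IllegalArgumentException("unsupported time unit: " + timeUnit);
        }
    }

    /** True if indexTime lies strictly before (now - amount units). */
    public static boolean isExpired(LocalDateTime indexTime, LocalDateTime now,
                                    long amount, ChronoUnit unit) {
        return indexTime.isBefore(now.minus(amount, unit));
    }

    public static void main(String[] args) {
        LocalDateTime now = LocalDateTime.now();
        // A daily index from 8 days ago, timestamped at midnight, checked against -T day -e7.
        LocalDateTime marvelDay = now.minusDays(8).truncatedTo(ChronoUnit.DAYS);
        System.out.println(isExpired(marvelDay, now, 7, unitFor("day")));
    }
}
```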
The following regular expression is used to split index names into
their component parts:
|^((?[a-zA-Z0-9\.\-]+)(?(|-)+)(?[0-9]{4})(?(\.||-))(?[0-9]{2})(\.||-)(?[0-9]{2})(\.|_|-)?(?[0-9]{2})?)$
|
As long as your index names follow the pattern of this regular
expression, Alfred will be glad to manage your indices.
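As an illustration of the same idea, here is a separate, hypothetical Java pattern with named groups that splits names like cron_2014_04_02_08 and .marvel-2014.04.02 into a prefix and a timestamp. It is a sketch that assumes yyyy, MM, dd and an optional HH field separated by '.', '_' or '-'; it is not the expression Alfred ships with, and IndexNameParser is a made-up name.

```java
import java.time.LocalDateTime;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IndexNameParser {
    // Hypothetical pattern: prefix, separator, yyyy, MM, dd and optional HH.
    // The lazy prefix stops at the first separator that is followed by a date.
    private static final Pattern INDEX = Pattern.compile(
            "^(?<prefix>[a-zA-Z0-9._\\-]+?)[_\\-]"
          + "(?<year>\\d{4})[._\\-](?<month>\\d{2})[._\\-](?<day>\\d{2})"
          + "(?:[._\\-](?<hour>\\d{2}))?$");

    /** Returns the prefix of a time-based index name, or null if it doesn't match. */
    public static String prefix(String name) {
        Matcher m = INDEX.matcher(name);
        return m.matches() ? m.group("prefix") : null;
    }

    /** Returns the encoded timestamp (hour defaults to 0), or null if it doesn't match. */
    public static LocalDateTime timestamp(String name) {
        Matcher m = INDEX.matcher(name);
        if (!m.matches()) {
            return null;
        }
        String hour = m.group("hour");          // null for daily indexes
        // LocalDateTime.of throws DateTimeException for impossible dates (e.g. month 13).
        return LocalDateTime.of(
                Integer.parseInt(m.group("year")),
                Integer.parseInt(m.group("month")),
                Integer.parseInt(m.group("day")),
                hour == null ? 0 : Integer.parseInt(hour),
                0);
    }

    public static void main(String[] args) {
        for (String name : new String[] {"cron_2014_04_02_08", ".marvel-2014.04.02", "kibana-int"}) {
            System.out.println(name + " -> " + prefix(name) + " @ " + timestamp(name));
        }
    }
}
```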
The -i parameter is substituted into the URL
"http://host:port/INDEX/_stats/indices", where "INDEX" is replaced by
whatever the -i parameter contains. By default it is _all, but you
can specify all kinds of wildcard patterns, such as -i".marvel-*",
-i"logstash-*", -i"*2014_04_02*", etc. Alfred has given us a lot of power to
manage our indices, so we thought that the community could use him as well.
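As a rough sketch of how the -i value ends up in that URL, the Java snippet below builds the stats request with the JDK's java.net.http client (Java 11+, used here in place of Apache HttpClient). StatsRequest is a hypothetical name, falling back to _all for an empty pattern mirrors the default described above, and the snippet only builds the request; sending it needs a reachable cluster.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

public class StatsRequest {
    /** Substitutes the -i pattern into http(s)://host:port/INDEX/_stats/indices. */
    public static URI statsUri(String host, int port, boolean ssl, String indexPattern) {
        String pattern = (indexPattern == null || indexPattern.isEmpty()) ? "_all" : indexPattern;
        return URI.create((ssl ? "https" : "http") + "://" + host + ":" + port
                + "/" + pattern + "/_stats/indices");
    }

    /** Builds a GET request with the -t/--timeout value applied. */
    public static HttpRequest build(URI uri, int timeoutSeconds) {
        return HttpRequest.newBuilder(uri)
                .timeout(Duration.ofSeconds(timeoutSeconds))
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest req = build(statsUri("localhost", 9200, false, ".marvel-*"), 30);
        System.out.println(req.method() + " " + req.uri());
        // To send it: HttpClient.newHttpClient().send(req, HttpResponse.BodyHandlers.ofString());
    }
}
```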
--
Thanks,
Colton McInroy
- Director of Security Engineering
Phone (Toll Free)
US (888)-818-1344 Press 2
UK 0-800-635-0551 Press 2
My Extension 101
24/7 Support support@dosarrest.com
Email colton@dosarrest.com
Website http://www.dosarrest.com