Graylog creates a new index every day, and I'd like to send all indices older than 2 weeks to my S3 bucket (and of course I don't want to keep them on my servers anymore).
I installed the elasticsearch-curator package.
Reading the docs, I have two problems:
- I can't find the configuration files (for example: curator.yml)
- I don't understand how to configure it to match my criteria (send my indices older than 2 weeks to S3 without keeping them on my local server)
Does anyone have an idea to help me? Thank you very much.
No configuration files are created during install, even from RPM/DEB packages. You must create these yourself. Example action definitions are in the docs here. The curator.yml definition example is in the docs here.
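For reference, a minimal curator.yml sketch based on the documented example (the host, port, and logging settings below are assumptions; adjust them for your own cluster):

---
# Client connection settings (host/port assumed; point these at your node)
client:
  hosts:
    - 127.0.0.1
  port: 9200
  use_ssl: False
  timeout: 30
  master_only: False

# Logging settings
logging:
  loglevel: INFO
  logfile:
  logformat: default
  blacklist: ['elasticsearch', 'urllib3']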
You've already created a repository named my_s3_repository (if not, there's a registration sketch after the action file below). You would create an action file with steps something like the below. Note that I snapshot after 13 days and delete after 14. This gives you a chance to ensure that the snapshots actually occurred before the indices are deleted on day 14.
---
actions:
  1:
    action: snapshot
    description: >-
      Snapshot myindex- prefixed indices older than 13 days (based on index
      creation_date) with the default snapshot name pattern of
      'curator-%Y%m%d%H%M%S'. Wait for the snapshot to complete. Do not skip
      the repository filesystem access check. Use the other options to create
      the snapshot.
    options:
      repository: my_s3_repository
      # Leaving name blank will result in the default 'curator-%Y%m%d%H%M%S'
      name:
      ignore_unavailable: False
      include_global_state: True
      partial: False
      wait_for_completion: True
      skip_repo_fs_check: False
    filters:
    - filtertype: pattern
      kind: prefix
      value: myindex-
    - filtertype: age
      source: creation_date
      direction: older
      unit: days
      unit_count: 13
  2:
    action: delete_indices
    description: >-
      Delete indices older than 14 days (based on index creation_date), for
      myindex- prefixed indices. Ignore the error if the filter does not
      result in an actionable list of indices (ignore_empty_list) and exit
      cleanly.
    options:
      ignore_empty_list: True
    filters:
    - filtertype: pattern
      kind: prefix
      value: myindex-
    - filtertype: age
      source: creation_date
      direction: older
      unit: days
      unit_count: 14
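If the repository does not exist yet, it can be registered through the snapshot API before running Curator. A minimal sketch, assuming the repository-s3 plugin is installed on every node and a bucket named my-bucket (both hypothetical here):

# Register an S3 snapshot repository named my_s3_repository
# (bucket name below is an assumption; use your own)
curl -XPUT 'http://localhost:9200/_snapshot/my_s3_repository' \
  -H 'Content-Type: application/json' -d '
{
  "type": "s3",
  "settings": {
    "bucket": "my-bucket"
  }
}'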
Thank you for your help; what you explained helps me, but I still have a few questions:
- I see the prefix "myindex" in the action file, but when I look at my indices, they have totally random names that don't start with "myindex". Isn't that going to be a problem?
- Can I create the curator.yml and action.yml files (I don't know if there's another way to name them) wherever I want, or are there recommended folders?
- Once the two files are created, do I have to do something, or will the process be automatic?
No. This behavior was changed all the way back in Elasticsearch 2.x. The index names are deliberately obfuscated in the filesystem, but the cluster state knows what the index names are and maps them accordingly. This was done to force end users to use the API to interact with indices, and never directly with the file system.
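To see the real index names, ask the cluster rather than the filesystem; for example (host assumed to be local):

# List all indices with their actual names, via the cat API
curl 'http://localhost:9200/_cat/indices?v'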
They do not have to be named curator.yml and action.yml; however, if you name your configuration file curator.yml and place it in a .curator directory in your home directory, you will not need to manually specify --config and the path to the configuration file on the command line. See the documentation note for more details.
You must either run Curator manually or via some scheduler like cron at the intervals you require.
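For example (the paths are assumptions; use wherever you actually put the two files):

# Preview what Curator would do, without changing anything
curator --config ~/.curator/curator.yml --dry-run ~/.curator/action.yml

# Run it for real
curator --config ~/.curator/curator.yml ~/.curator/action.yml

# crontab entry to run it every day at 01:00 (absolute paths for cron)
0 1 * * * /usr/local/bin/curator --config /root/.curator/curator.yml /root/.curator/action.yml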
I also created and configured the file /etc/elasticsearch/action.yml; the configuration is a little different from yours because I just wanted to run a test:
actions:
  1:
    action: snapshot
    description: >-
      Snapshot myindex- prefixed indices older than 13 days (based on index
      creation_date) with the default snapshot name pattern of
      'curator-%Y%m%d%H%M%S'. Wait for the snapshot to complete. Do not skip
      the repository filesystem access check. Use the other options to create
      the snapshot.
    options:
      repository: my_s3_repository
      # Leaving name blank will result in the default 'curator-%Y%m%d%H%M%S'
      name:
      ignore_unavailable: False
      include_global_state: True
      partial: False
      wait_for_completion: True
      skip_repo_fs_check: False
    filters:
    - filtertype: pattern
      kind: prefix
      value: graylog*
    - filtertype: age
      source: creation_date
      direction: older
      unit: days
      unit_count: 90
My Graylog server has been receiving data for well over 2 months.
The small problem is that several folders are created in my bucket, which is very disorganized and hard to find one's way around; the file names remain incomprehensible, so we can't really see the creation date of the index (because Graylog creates one index per day). Can the config file be improved to address this concern?
For example, in my "indices" folder there are two folders with 15 files each, and it's impossible to tell which file corresponds to which day; I don't see any "curator-%Y%m%d%H%M%S". Thank you
You should never directly interact with the files in your data path, or the files in your S3 bucket. These file names are obfuscated on purpose. You should only manage indices via the API, and you should likewise only manage snapshots via the API.
The reason you can't make sense of the snapshot folders is that they hold shard data, which is stored as Lucene segments. These will never be named in a way you can understand (just like the index directory you shared previously). The Elasticsearch cluster knows how to interpret the metadata files and figure out where your data needs to be, so please leave it at that.
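If you want to know which snapshot corresponds to which day, list the snapshots through the API instead of browsing the bucket; a quick check, with the repository name from earlier in the thread (host assumed to be local):

# List all snapshots in the repository, with names and timestamps
curl 'http://localhost:9200/_snapshot/my_s3_repository/_all?pretty'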