Use curator to send elasticsearch indices to S3

Weathmious · May 4, 2021, 3:29pm

Hello,
I just managed to link my graylog/elasticsearch server to my S3 bucket

curl -X PUT "localhost:9200/_snapshot/my_s3_repository?pretty" -H 'Content-Type: application/json' -d'

 {
   "type": "s3",
   "settings": {
     "bucket": "mysiem"
   }
 }
 '
{
  "acknowledged" : true
}

Graylog creates a new index every day, and I’d like to send all indices older than 2 weeks to my S3 bucket (and of course I don’t want to keep them on my servers afterwards).

I installed the elasticsearch-curator package.

Reading the docs, I have two problems:
-I can’t find the configuration files (example: curator.yml)
-I don’t understand how to do the configuration according to my criteria (send my indices of more than 2 weeks to S3 without keeping them on my local server)

Anyone have an idea to help me? Thank you very much

theuntergeek · May 4, 2021, 5:08pm

No configuration files are created during install, even from RPM/DEB packages. You must create these yourself. Example action definitions are in the docs here. The curator.yml definition example is in the docs here.

You've created a repository already, named my_s3_repository. You would create an action file with a step something like the below. Note that I snapshot after 13, and delete after 14. This gives you a chance to ensure that snapshots did occur before they are deleted after day 14.

---
actions:
  1:
    action: snapshot
    description: >-
      Snapshot myindex- prefixed indices older than 13 days (based on index
      creation_date) with the default snapshot name pattern of
      'curator-%Y%m%d%H%M%S'.  Wait for the snapshot to complete.  Do not skip
      the repository filesystem access check.  Use the other options to create
      the snapshot.
    options:
      repository: my_s3_repository
      # Leaving name blank will result in the default 'curator-%Y%m%d%H%M%S'
      name:
      ignore_unavailable: False
      include_global_state: True
      partial: False
      wait_for_completion: True
      skip_repo_fs_check: False
    filters:
    - filtertype: pattern
      kind: prefix
      value: myindex-
    - filtertype: age
      source: creation_date
      direction: older
      unit: days
      unit_count: 13
  2:
    action: delete_indices
    description: >-
      Delete indices older than 14 days (based on index creation_date), for myindex-
      prefixed indices. Ignore the error if the filter does not result in an
      actionable list of indices (ignore_empty_list) and exit cleanly.
    options:
      ignore_empty_list: True
    filters:
    - filtertype: pattern
      kind: prefix
      value: myindex-
    - filtertype: age
      source: creation_date
      direction: older
      unit: days
      unit_count: 14
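
To make the 13-day/14-day offset concrete, here is a minimal Python sketch of the comparison the two age filters perform. It is not Curator's own code: it assumes that "older" means "created before now minus unit_count days", and the index names and creation dates are made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def older_than(creation_date, days, now):
    """Return True if the index was created more than `days` days before `now`."""
    return creation_date < now - timedelta(days=days)


def plan(indices, now, snapshot_after_days=13, delete_after_days=14):
    """Split {index_name: creation_date} into names to snapshot and names to delete."""
    to_snapshot = sorted(
        name for name, created in indices.items()
        if older_than(created, snapshot_after_days, now)
    )
    to_delete = sorted(
        name for name, created in indices.items()
        if older_than(created, delete_after_days, now)
    )
    return to_snapshot, to_delete


# Hypothetical indices, created 12, 13 and 14 days (plus one hour) before "now".
now = datetime(2021, 5, 20, tzinfo=timezone.utc)
indices = {
    "myindex-a": now - timedelta(days=12, hours=1),
    "myindex-b": now - timedelta(days=13, hours=1),
    "myindex-c": now - timedelta(days=14, hours=1),
}

to_snapshot, to_delete = plan(indices, now)
print(to_snapshot)  # ['myindex-b', 'myindex-c']
print(to_delete)    # ['myindex-c']
```

With a daily run, an index such as myindex-b is snapshotted at least one run before it becomes eligible for deletion, which is the safety margin described above.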

Weathmious · May 5, 2021, 7:22am

Thank you for your help, what you explained helps me, but I still have a few questions:

-I see in the action file the prefix "myindex" but when I go to see my indices, they have names that are totally random not starting with "myindex", isn’t that going to be a problem?

ls /mnt/graylog/elasticsearch/nodes/0/indices/
3lurA30KRT66K3_YeUBBDA  I_l4ljXmQwmzqeuDQLXrTA  ROkebCqiS2SKmtnoLd5K-g
6jsIyLtkSJuw7I6FM5aeLg  JZu86JSRSqe41AjVMOhJcA  RuKIH_8GT_282VjvsGxxCA
71Jx5NuITr-mh-XYrBghvg  kY2qgTamQ1-IsV1KfRfmLQ  WNniZ76vRZm_U0xiny7OUQ
8EsR-1jRQVWwEI9XLvKLag  lNzkHX6ATle462uQwH3fWg  XAzvAgllTnO0GAyMEDX47Q
AdixeDwmTTOs6fDIV8yYyg  lR0MpE-rS-6_aRXutCXuog  XhmTQ3n2Roaxe6JCpX6aBQ
ATP7FWk_SpSqchDm2nRIKA  M1YpCI7FQ-eYWh2qHWPJdw  xlq-jtI4Twu0kJJsc-L9PA
B2yRoFQDRje6sSZrwUZ1yg  mt3t8SLsQ_CNxdCCGgaFBw  X_qzLuCyTHSqdhqBeh-0qw
BcNUyhiiS6unxRCK3PShlA  myaa-7qVRbqjQq6tXblZzA  YchNdqu6S-2oUMygO8EEBQ
dtKqyzB0RgiOhA8IKY02FA  n47EEchMT4K1qOqysyR_yw  yJbEaRg7Qvu4pxjV3W0OiQ
dYCzPuBaR6qSAF-4PqwHuQ  OzlEyYTgRW-ESstP7uqksA  Yl5h2WeXRNW4Kpugtj1gOg
eKJgvnJcStGsqI0PGCGxcw  P6DUFVqEQo-YseozDhn15A  yp8nqvp0QnqIBbMsaj_n_Q
gMVH_r3zQKGz5cJT4M1Htw  p9jkLjGISjK22aXuj6YCvg  YvnIgCv5QPugXtrGillH8w
gQbHFxbFQe2bQ_9-CLRudg  PESgE_FwTU6kEV8rKeN5xA  YXxNwkPHRmmLz52Ths43zg
GrxLvdJ1StWHOhrYrmyZhw  PGES4n4WSyunJiUz5TcBtg  zBebioJGRZyg3aVV2nv-Kw
hD65pvCiTY27YEaRfAYO7g  Ph7nwRutQOe4e3lrm60K7Q  Ze2VXGWKReq6ReoWw8uzXA
hNmUnkrZR-WP1Ksk30sSYA  pLWbpvYrRWCLmTbuImQ6iQ  zMacyH4ARgOYXjbFRgaUsA
hXiT4VpxRni39hg2qcN9-g  QpkZaipcTNSbyyjXZk3edg

-Can I create the curator.yml and action.yml files (I don’t know if they can be named differently) wherever I want, or are there recommended folders?

-Once the two files are created, do I have to do something or will the process be automatic?

Thank you very much

theuntergeek · May 5, 2021, 1:26pm

No. This behavior was changed all the way back in Elasticsearch 2.x. The index names are deliberately obfuscated in the filesystem, but the cluster state knows what the index names are and maps them accordingly. This was done to force end users to use the API to interact with indices, and never directly with the file system.
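
If you want to see how the directory names relate to index names, the mapping can be read through the API instead of the file system. The following is a rough Python sketch that assumes Elasticsearch is reachable at http://localhost:9200 without authentication; it uses the _cat/indices endpoint with the index and uuid columns. The sample row pairs one UUID from the listing above with a made-up index name (graylog_0) purely for illustration.

```python
import json
from urllib.request import urlopen


def fetch_cat_indices(base_url="http://localhost:9200"):
    """Fetch index names and UUIDs from the _cat/indices API as a list of dicts."""
    url = f"{base_url}/_cat/indices?format=json&h=index,uuid"
    with urlopen(url) as resp:
        return json.load(resp)


def uuid_to_name(cat_rows):
    """Map each on-disk UUID directory name to its index name."""
    return {row["uuid"]: row["index"] for row in cat_rows}


# Hypothetical sample row; the pairing of this UUID with "graylog_0" is invented.
sample_rows = [{"index": "graylog_0", "uuid": "3lurA30KRT66K3_YeUBBDA"}]
print(uuid_to_name(sample_rows))  # {'3lurA30KRT66K3_YeUBBDA': 'graylog_0'}
```

Against a live cluster, uuid_to_name(fetch_cat_indices()) would return the real mapping. Curator filters work on the index names, never on the UUIDs.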

They do not have to be named curator.yml and action.yml, however if you name your configuration file curator.yml and place it in your home directory in a .curator directory, you will not need to manually specify --config and the path to the configuration in the command line. See the documentation note for more details.

You must either run Curator manually or via some scheduler like cron at the intervals you require.
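
As one possible way to script this, here is a small Python wrapper that a cron job could call. It is only a sketch: the action file path is hypothetical, and it assumes the curator executable is on the PATH. The --config and --dry-run options are the ones the Curator command line accepts; when --config is omitted, Curator looks for ~/.curator/curator.yml as described above.

```python
import os
import subprocess


def build_command(action_file, config=None, dry_run=False):
    """Build the argument list for one Curator run."""
    cmd = ["curator"]
    if config:
        cmd += ["--config", config]
    if dry_run:
        cmd.append("--dry-run")
    cmd.append(action_file)
    return cmd


def run_curator(action_file, config=None, dry_run=False):
    """Run Curator; raises CalledProcessError if it exits with a non-zero status."""
    return subprocess.run(build_command(action_file, config, dry_run), check=True)


# Hypothetical location of the action file.
ACTION_FILE = os.path.expanduser("~/.curator/action.yml")

# Show the command that a dry run would execute, without running it.
print(" ".join(build_command(ACTION_FILE, dry_run=True)))
```

Calling run_curator(ACTION_FILE, dry_run=True) first is a reasonable check, since a dry run only logs what Curator would do without changing anything. Once the output looks right, a daily cron entry can call the same script without the dry-run flag.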

Weathmious · May 5, 2021, 1:36pm

Thank you for your answer,

Yes it works thanks, but I still have a question about that

I created the file /etc/elasticsearch/curator.yml and configured it like this:

client:
  hosts:
    - 127.0.0.1
  port: 9200
  url_prefix:
  use_ssl: False
  certificate:
  client_cert:
  client_key:
  ssl_no_validate: False
  username:
  password:
  timeout: 30
  master_only: False

I also created and configured the file /etc/elasticsearch/action.yml. The configuration is a little different from yours because I just wanted to run a test:

actions:
  1:
    action: snapshot
    description: >-
      Snapshot graylog-prefixed indices older than 90 days (based on index
      creation_date) with the default snapshot name pattern of
      'curator-%Y%m%d%H%M%S'.  Wait for the snapshot to complete.  Do not skip
      the repository filesystem access check.  Use the other options to create
      the snapshot.
    options:
      repository: my_s3_repository
      # Leaving name blank will result in the default 'curator-%Y%m%d%H%M%S'
      name:
      ignore_unavailable: False
      include_global_state: True
      partial: False
      wait_for_completion: True
      skip_repo_fs_check: False
    filters:
    - filtertype: pattern
      kind: prefix
      # With kind: prefix the value is the prefix itself; no trailing * is needed
      value: graylog
    - filtertype: age
      source: creation_date
      direction: older
      unit: days
      unit_count: 90

My graylog server has been receiving data for well over 2 months

The small problem is that several folders are created in my bucket, which looks very disorganized and is hard to navigate. The file names are incomprehensible, so we cannot see the creation date of each index (Graylog creates one index per day). Can the configuration file be improved to address this concern?

For example, my "indices" folder contains two folders with 15 files each. I cannot tell which file corresponds to which day, and I don’t see any "curator-%Y%m%d%H%M%S" name. Thank you

theuntergeek · May 7, 2021, 3:46pm

You should never directly interact with the files in your data path, or the files in your S3 bucket. These file names are obfuscated on purpose. You should only manage indices via the API, and you should likewise only manage snapshots via the API.

The reason you can't make sense of the snapshot folders is that they hold shard data, which is stored as segments. These will never be named in a way you can understand (just like the index directory you shared previously). The Elasticsearch cluster knows how to interpret the metadata files and figure out where your data needs to be, so please leave it at that.
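
To see which snapshots exist and which indices each one contains, ask the snapshot API rather than the bucket. Below is a hedged Python sketch that assumes the repository name my_s3_repository from this thread and an unauthenticated cluster at http://localhost:9200. The sample response is invented for illustration (snapshot name, state and index names are hypothetical) and only includes the fields the code reads.

```python
import json
from urllib.request import urlopen


def fetch_snapshots(repository="my_s3_repository", base_url="http://localhost:9200"):
    """Fetch all snapshots in a repository via the get-snapshot API."""
    with urlopen(f"{base_url}/_snapshot/{repository}/_all") as resp:
        return json.load(resp)


def summarize(response):
    """Return (snapshot name, state, sorted index names) for each snapshot."""
    return [
        (snap["snapshot"], snap["state"], sorted(snap["indices"]))
        for snap in response.get("snapshots", [])
    ]


# Hypothetical response, trimmed to the fields used above.
sample = {
    "snapshots": [
        {
            "snapshot": "curator-20210505133000",
            "state": "SUCCESS",
            "indices": ["graylog_1", "graylog_0"],
        }
    ]
}

for name, state, indices in summarize(sample):
    print(name, state, ", ".join(indices))
# curator-20210505133000 SUCCESS graylog_0, graylog_1
```

Against a live cluster, summarize(fetch_snapshots()) would list the real snapshots. The default Curator snapshot name carries the date and time, so this is where the 'curator-%Y%m%d%H%M%S' names show up.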

system · June 4, 2021, 3:47pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
