Deleting data from Elastic


#1

Hi,

I just started using the Elastic Stack, and my 80 GB system filled up to 100% with logs.

I have Logstash 5.1 receiving syslog messages and forwarding them to Elasticsearch 5.1. The logs filled the disk, and now I'm trying to figure out how to remove old data from Elasticsearch to free up some space.

Thanks,


(Aaron Mildenstein) #2

Look into Elasticsearch Curator.


#3

Thanks!

Are there any working examples of an action file? I'm trying to wrap my head around Curator and find it a little advanced.

I didn't know that removing old data from Elasticsearch would be that complex))

Thanks,


(Aaron Mildenstein) #4

Well, there are examples of each action, including delete_indices.

Here's my own actions file, for my home cluster:

---
actions:
  1:
    action: delete_indices
    description: >-
      Delete indices older than 15 days (based on index name), for any Year.month.day
      indices. Ignore the error if the filter does not result in an
      actionable list of indices (ignore_empty_list) and exit cleanly.
    options:
      ignore_empty_list: True
      timeout_override:
      continue_if_exception: False
      disable_action: False
    filters:
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 15
      exclude:
  2:
    action: close
    description: >-
      Close indices older than 14 days (based on index name), for Year.month.day
      indices. Ignore the error if the filter does not result in an
      actionable list of indices (ignore_empty_list) and exit cleanly.
    options:
      ignore_empty_list: True
      timeout_override:
      continue_if_exception: False
      disable_action: False
    filters:
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 14
      exclude:
  3:
    action: delete_snapshots
    description: >-
      Delete 'daily-' prefixed snapshots from the selected repository older
      than 30 days (based on creation_date)
    options:
      ignore_empty_list: True
      repository: Untergeek
      timeout_override:
      continue_if_exception: False
      disable_action: False
    filters:
    - filtertype: pattern
      kind: prefix
      value: 'daily-'
      exclude: False
    - filtertype: age
      source: creation_date
      direction: older
      unit: days
      unit_count: 30
      exclude:
  4:
    action: delete_snapshots
    description: >-
      Delete 'monthlyinc-' prefixed snapshots from the selected repository older
      than 32 days (based on creation_date)
    options:
      ignore_empty_list: True
      repository: Untergeek
      timeout_override:
      continue_if_exception: False
      disable_action: False
    filters:
    - filtertype: pattern
      kind: prefix
      value: 'monthlyinc-'
      exclude:
    - filtertype: age
      source: creation_date
      direction: older
      unit: days
      unit_count: 32
      exclude:
  5:
    action: delete_snapshots
    description: >-
      Delete 'monthly-' prefixed snapshots from the selected repository older
      than 7 months (based on creation_date)
    options:
      ignore_empty_list: True
      repository: Untergeek
      timeout_override:
      continue_if_exception: False
      disable_action: False
    filters:
    - filtertype: pattern
      kind: prefix
      value: 'monthly-'
      exclude:
    - filtertype: age
      source: creation_date
      direction: older
      unit: months
      unit_count: 7
      exclude:
  6:
    action: forcemerge
    description: >-
      forceMerge Year.month.day indices older than 1 day (based on index
      creation_date) to 1 segment per shard.  Delay 120 seconds between each
      forceMerge operation to allow the cluster to quiesce.
      This action will ignore indices already forceMerged to the same or fewer
      number of segments per shard, so the 'forcemerged' filter is unneeded.
    options:
      ignore_empty_list: True
      max_num_segments: 1
      delay: 120
      timeout_override: 7200
      continue_if_exception: False
      disable_action: False
    filters:
    - filtertype: pattern
      kind: timestring
      value: '%Y.%m.%d'
      exclude: False
    - filtertype: age
      source: creation_date
      direction: older
      unit: days
      unit_count: 1
      exclude:
  7:
    action: forcemerge
    description: >-
      forceMerge Year.month indices (excluding Year.month.day indices) older
      than 32 days (based on index creation_date) to 1 segment per shard.
      Delay 120 seconds between each forceMerge operation to allow the cluster
      to quiesce.
      This action will ignore indices already forceMerged to the same or fewer
      number of segments per shard, so the 'forcemerged' filter is unneeded.
    options:
      ignore_empty_list: True
      max_num_segments: 1
      delay: 120
      timeout_override: 7200
      continue_if_exception: False
      disable_action: False
    filters:
    - filtertype: pattern
      kind: timestring
      value: '%Y.%m.%d'
      exclude: True
    - filtertype: pattern
      kind: timestring
      value: '%Y.%m'
      exclude: False
    - filtertype: age
      source: creation_date
      direction: older
      unit: days
      unit_count: 32
      exclude:
  8:
    action: snapshot
    description: >-
      Snapshot daily indices older than 1 day (based on index
      creation_date) with the snapshot name pattern of
      'daily-%Y%m%d%H%M%S'.  Wait for the snapshot to complete.  Do not skip
      the repository filesystem access check.  Use the other options to create
      the snapshot.
    options:
      ignore_empty_list: True
      repository: Untergeek
      # Leaving name blank will result in the default 'curator-%Y%m%d%H%M%S'
      name: 'daily-%Y%m%d%H%M%S'
      ignore_unavailable: False
      include_global_state: True
      partial: False
      wait_for_completion: True
      skip_repo_fs_check: False
      timeout_override: 7200
      continue_if_exception: False
      disable_action: False
    filters:
    - filtertype: pattern
      kind: timestring
      value: '%Y.%m.%d'
      exclude: False
    - filtertype: age
      source: creation_date
      direction: older
      unit: days
      unit_count: 1
      exclude:
    - filtertype: closed

(Aaron Mildenstein) #5

I actually couldn't fit the entire thing in due to size limits, but hopefully that gives you an idea.


#6

Thanks a lot!

So, taking the first example, deleting indices older than 15 days: I understand that the indices live in /var/lib/elasticsearch. Does this location need to be specified anywhere? I do not have an environment variable configured for it.

I'm trying to understand how Curator navigates through the files. Also, source is set to name in the filter; what does that refer to?

Thanks again for your help


#7

By the way, I have X-Pack installed on Elasticsearch. How do I authenticate Curator with Elasticsearch? I'm not using SSL yet, just a username and password.

Thanks


#8

That was a hasty question. I looked back in the docs and found this option for the curator.yml file:

http_auth: "user:pass"

And it worked!
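
For anyone landing here later, this is roughly where that option sits. It is a minimal sketch that assumes the Curator 4.x/5.x client configuration layout; the host, port, and credentials are placeholders, not values from my setup.

```yaml
# curator.yml (sketch; assumes the Curator 4.x/5.x config layout)
client:
  hosts:
    - 127.0.0.1            # placeholder: your Elasticsearch host
  port: 9200               # placeholder: default Elasticsearch HTTP port
  use_ssl: False           # no SSL configured yet
  http_auth: "user:pass"   # placeholder: X-Pack username and password
  timeout: 30

logging:
  loglevel: INFO
```

With use_ssl set to False, the username and password are sent over plain HTTP, so this is only reasonable on a trusted network.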


(Aaron Mildenstein) #9

Curator does not look at or touch Elasticsearch files. All interactions with indices and snapshots in Elasticsearch are API driven. Curator uses API calls to pull a list of all indices, and then winnows that list down using the filters you provide. The actions themselves (delete, close, open, forcemerge, etc.) are also API calls to Elasticsearch. Never directly edit or touch any of the files in /var/lib/elasticsearch; only use API calls to perform maintenance.

name refers to the index name. With source: name, Curator derives the age of an index from a datestamp in the index name. For example, with timestring: '%Y.%m.%d', the index logstash-2017.02.13 matches the Year.month.day pattern and is treated as dated 13 February 2017. Index names that contain no matching datestamp are left out of the list.
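
Applied to the original question, a minimal action file might look like the sketch below. It assumes the default Logstash index naming (logstash-YYYY.MM.dd); the logstash- prefix and the 15-day retention are illustrative values, so adjust both to your setup.

```yaml
# delete_old.yml (sketch; the file name, prefix, and retention are assumptions)
actions:
  1:
    action: delete_indices
    description: >-
      Delete logstash- prefixed indices older than 15 days, based on the
      datestamp in the index name.
    options:
      ignore_empty_list: True    # exit cleanly if nothing matches
    filters:
    - filtertype: pattern        # only consider indices named logstash-*
      kind: prefix
      value: logstash-
    - filtertype: age            # age comes from the name, not from files on disk
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 15
```

Running Curator with the --dry-run flag first logs which indices would be deleted without actually deleting anything.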

