Purge index

Hello, I would like to purge data from my indexes in Elasticsearch, and I'd like to know how to do this without having to delete the index itself. How can I achieve that?

How much do you want to purge? 1%? 70%?

I would like my data to be kept for 6 months, so I'd like to set up a lifecycle: data older than 6 months would be deleted automatically by Elasticsearch. How do I go about this, knowing that I already have the index set up and a data view created from it?

The best way is here: ILM: Manage the index lifecycle | Elasticsearch Guide [8.9] | Elastic

Can I have an example please?

There are a lot of examples in the documentation, for example on this page: Configure a lifecycle policy | Elasticsearch Guide [8.9] | Elastic

PUT _ilm/policy/my_policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_primary_shard_size": "25GB" 
          }
        }
      },
      "delete": {
        "min_age": "30d",
        "actions": {
          "delete": {} 
        }
      }
    }
  }
}

So if I want to apply that to your question, that'd be:

PUT _ilm/policy/delete_after_180_days
{
  "policy": {
    "phases": {
      "delete": {
        "min_age": "180d",
        "actions": {
          "delete": {} 
        }
      }
    }
  }
}
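
Note that the policy does nothing on its own; you also have to attach it to the index, for example:

# my-index is a placeholder for your index name
PUT my-index/_settings
{
  "index.lifecycle.name": "delete_after_180_days"
}

With no rollover in the policy, min_age is measured from the index creation date.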

But please note that this deletes whole indices older than 180 days, which is by far the best way to do it.

What is your use case? Time series data? What kind of data is it? Which tool is collecting/sending the data?

The problem is that I don't want to delete my index. I want to be able to delete documents from my index that are more than 6 months old.

That's a very bad design, but I can tell you how to do it, although I DO NOT RECOMMEND doing that.

The Delete By Query API allows that. You just need to find the right query that selects the right documents; this depends on your use case and the fields you have within your documents.
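
For illustration only, a minimal sketch of such a query, assuming a hypothetical index named my-index with an @timestamp date field (both names are placeholders, not from this thread):

# my-index and @timestamp are placeholders for your own index and date field
POST my-index/_delete_by_query
{
  "query": {
    "range": {
      "@timestamp": {
        "lt": "now-180d"
      }
    }
  }
}

This deletes every matching document, but the disk space is only reclaimed later, when the underlying segments are merged, which is one of the reasons this approach is expensive.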

But again, don't do this!

If you don't recommend it, I'd rather not do it. I just want Elasticsearch to remove certain documents from my index at some point, to avoid my index becoming overloaded later.

Is your data immutable or do you perform updates?

If it's a small number of documents, that's OK. If it's an entire day's worth because the documents have expired, then use Index Lifecycle Management with, for example, one index per day (depending on the size of your data).

Another nice solution is to use data streams if you have append-only data. See Data streams | Elasticsearch Guide [8.9] | Elastic

So if I understand correctly, I have to create a data stream corresponding to my index? Will that have an impact on my index? My index is fed by a Logstash pipeline, and I created a data view from it to visualize it in Kibana.
So I'd like to know whether using data streams would have any impact on my architecture.

Here is my Logstash conf:

input {
  file {
    path => "/scanNessus/*.csv"
    start_position => "beginning"
    #sincedb_path => "/dev/null"
  }
}

filter {
  csv {
    separator => ","
    skip_header => "true"
    skip_empty_rows => true
    columns => [
      "Plugin_ID",
      "CVE",
      "CVSS_v2.0_Base_Score",
      "Risk",
      "Host",
      "Protocol",
      "Port",
      "Name",
      "CVSS_v3.0_Base_Score",
      "CVSS_v2.0_Temporal_Score",
      "CVSS_v3.0_Temporal_Score",
      "Risk_Factor",
      "Metasploit",
      "Core_Impact",
      "CANVAS",
      "date",
      "Nom_Application",
      "Niveau_2_Contact",
      "Niveau_3_Contact"
    ]
  }
  mutate {
    add_field => { "Plugin_Host" => "%{Plugin_ID}-%{Host}" }
  }
}
output {
  elasticsearch {
    hosts => "localhost:9200"
    index => "nms"
    template_name => "template"
  }

  stdout {}
}

All my data is sent to my index named nms.

Are you trying to get this information from Tenable Security Center (Tenable Security Center 6.2.x)?

If so, I think you should just use the integration we have.

If you are using an alias, that will be easy. If not, then you probably need to do everything from scratch. See Set up a data stream | Elasticsearch Guide [8.11] | Elastic
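
To give an idea, here is a minimal from-scratch sketch, reusing the delete_after_180_days policy from above; the template name and the logs-nessus-* pattern are invented for the example:

# nessus-template and logs-nessus-* are example names, not from this thread
PUT _index_template/nessus-template
{
  "index_patterns": ["logs-nessus-*"],
  "data_stream": {},
  "template": {
    "settings": {
      "index.lifecycle.name": "delete_after_180_days"
    }
  }
}

Indexing the first document into a matching name (for example logs-nessus-default) creates the data stream automatically, and ILM then deletes whole backing indices as they age out.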

But as noted:

If you use Fleet or Elastic Agent, skip this tutorial. Fleet and Elastic Agent set up data streams for you. See Fleet’s data streams documentation.

Also, using ECS instead of your own schema is better IMO for this use case, as it's a de facto standard nowadays. This is what Elastic Agents actually generate.
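
For example, you could rename some of your CSV columns to ECS fields in your Logstash filter; the mapping below is only a suggestion to show the idea:

filter {
  mutate {
    # illustrative ECS mapping, adjust to your own data
    rename => {
      "Host" => "[host][name]"
      "CVE"  => "[vulnerability][id]"
      "Risk" => "[vulnerability][severity]"
    }
  }
}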

Here's some documentation about Logstash and data streams: Elasticsearch output plugin | Logstash Reference [8.11] | Elastic
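
To sketch what that could look like for your pipeline (the type/dataset/namespace values are illustrative; events would go to a logs-nessus-default data stream instead of the nms index):

output {
  elasticsearch {
    hosts => "localhost:9200"
    # route events to a data stream instead of a plain index
    data_stream => "true"
    data_stream_type => "logs"
    data_stream_dataset => "nessus"   # example dataset name
    data_stream_namespace => "default"
  }
}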
