Index cleanup

Hi,

I've got Serilog pumping logs into Elasticsearch from various applications, using the Serilog Elasticsearch sink. I use a daily indexing pattern per environment, and Serilog manages the initial creation of these indexes, e.g.

logs-development-2020.01.17-00001

We are using Elastic Cloud, so I've tried creating an index template and rollup policy with an alias as per the docs, and basically made sure they were the same as the APM ones that are there as standard. But I get policy errors, which I assume are because Serilog is creating the index and not letting ILM do its thing.

I tried flipping it so Serilog didn't create the index and published to the alias instead, but Serilog wouldn't write to the alias, possibly because it generates the index template when it creates the index itself.

Regardless of the above, the rollup policy still didn't apply, probably as a result of Serilog not being able to write to the index and there being nothing in it.

So I'm a little stumped; I can't figure out how best to approach this. I saw something about creating a cron job that deletes indexes matching a pattern after x days, but there was little information about how to do that.

I also keep seeing references to Curator, but I've not used it before and I don't know if it's available in Elastic Cloud.

Any input would be greatly appreciated

Andy

Hmmm. There may be a way to force an additional index template to associate these indices with an ILM policy. The template would only have to include the ILM policy information and match the same index pattern (e.g. serilog-*), but have a higher order than the template Serilog is uploading. You see, Elasticsearch merges all matching index templates and overwrites with values from higher-order templates.
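A minimal sketch of such a template (the template name, index pattern, and policy name here are placeholders; substitute your own):

```
PUT _template/serilog-ilm-override
{
  "order": 10,
  "index_patterns": ["logs-development-*"],
  "settings": {
    "index.lifecycle.name": "my-delete-policy"
  }
}
```

Because it carries a higher order than the template the sink uploads, the lifecycle setting should win when the templates are merged.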

Barring that, Curator does work with Cloud instances. The downside is that you'd have to run it somewhere else that has access to your Cloud instance (HTTPS access is all that would be needed).
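For reference, a Curator action file for the delete-after-N-days approach might look roughly like this (the prefix, timestring, and retention here are illustrative; adjust them to your own index naming):

```yaml
actions:
  1:
    action: delete_indices
    description: Delete daily Serilog indices older than 7 days
    options:
      ignore_empty_list: true
    filters:
      - filtertype: pattern
        kind: prefix
        value: logs-development-
      - filtertype: age
        source: name
        direction: older
        timestring: '%Y.%m.%d'
        unit: days
        unit_count: 7
```

You would then run Curator on a schedule from a host that can reach the cluster over HTTPS.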

Hi

Thanks for the quick reply. Sorry, I didn't describe things very well: the Serilog-generated indexes don't have a template like the one I was creating when I tried to copy the APM rollup policy implementation. What I meant to say is that the mappings and some settings etc. are set directly on the index. When I tried to create the index manually with a template and rollup policy, add an alias manually, and then use the alias as the index for Serilog to write to, those mappings and settings weren't there, and perhaps that's why Serilog wouldn't write to the alias.

Here are the index settings when Serilog generates the index:

{
  "settings": {
    "index": {
      "creation_date": "1579219202955",
      "number_of_shards": "1",
      "number_of_replicas": "1",
      "uuid": "XyUO1zb-RHaUCnytqOrn6w",
      "version": {
        "created": "7050199"
      },
      "provided_name": "logstash-development-2020.01.17-1"
    }
  },
  "defaults": {
    "index": {
      "flush_after_merge": "512mb",
      "final_pipeline": "_none",
      "max_inner_result_window": "100",
      "unassigned": {
        "node_left": {
          "delayed_timeout": "1m"
        }
      },
      "max_terms_count": "65536",
      "lifecycle": {
        "name": "",
        "parse_origination_date": "false",
        "indexing_complete": "false",
        "rollover_alias": "",
        "origination_date": "-1"
      },
      "routing_partition_size": "1",
      "force_memory_term_dictionary": "false",
      "max_docvalue_fields_search": "100",
      "merge": {
        "scheduler": {
          "max_thread_count": "1",
          "auto_throttle": "true",
          "max_merge_count": "6"
        },
        "policy": {
          "reclaim_deletes_weight": "2.0",
          "floor_segment": "2mb",
          "max_merge_at_once_explicit": "30",
          "max_merge_at_once": "10",
          "max_merged_segment": "5gb",
          "expunge_deletes_allowed": "10.0",
          "segments_per_tier": "10.0",
          "deletes_pct_allowed": "33.0"
        }
      },
      "max_refresh_listeners": "1000",
      "max_regex_length": "1000",
      "load_fixed_bitset_filters_eagerly": "true",
      "number_of_routing_shards": "1",
      "write": {
        "wait_for_active_shards": "1"
      },
      "verified_before_close": "false",
      "mapping": {
        "coerce": "false",
        "nested_fields": {
          "limit": "50"
        },
        "depth": {
          "limit": "20"
        },
        "field_name_length": {
          "limit": "9223372036854775807"
        },
        "total_fields": {
          "limit": "1000"
        },
        "nested_objects": {
          "limit": "10000"
        },
        "ignore_malformed": "false"
      },
      "source_only": "false",
      "soft_deletes": {
        "enabled": "false",
        "retention": {
          "operations": "0"
        },
        "retention_lease": {
          "period": "12h"
        }
      },
      "max_script_fields": "32",
      "query": {
        "default_field": [
          "*"
        ],
        "parse": {
          "allow_unmapped_fields": "true"
        }
      },
      "format": "0",
      "frozen": "false",
      "sort": {
        "missing": [],
        "mode": [],
        "field": [],
        "order": []
      },
      "priority": "1",
      "codec": "default",
      "max_rescore_window": "10000",
      "max_adjacency_matrix_filters": "100",
      "analyze": {
        "max_token_count": "10000"
      },
      "gc_deletes": "60s",
      "optimize_auto_generated_id": "true",
      "max_ngram_diff": "1",
      "translog": {
        "generation_threshold_size": "64mb",
        "flush_threshold_size": "512mb",
        "sync_interval": "5s",
        "retention": {
          "size": "512MB",
          "age": "12h"
        },
        "durability": "REQUEST"
      },
      "auto_expand_replicas": "false",
      "mapper": {
        "dynamic": "true"
      },
      "requests": {
        "cache": {
          "enable": "true"
        }
      },
      "data_path": "",
      "highlight": {
        "max_analyzed_offset": "1000000"
      },
      "routing": {
        "rebalance": {
          "enable": "all"
        },
        "allocation": {
          "enable": "all",
          "total_shards_per_node": "-1"
        }
      },
      "search": {
        "slowlog": {
          "level": "TRACE",
          "threshold": {
            "fetch": {
              "warn": "-1",
              "trace": "-1",
              "debug": "-1",
              "info": "-1"
            },
            "query": {
              "warn": "-1",
              "trace": "-1",
              "debug": "-1",
              "info": "-1"
            }
          }
        },
        "idle": {
          "after": "30s"
        },
        "throttled": "false"
      },
      "fielddata": {
        "cache": "node"
      },
      "default_pipeline": "_none",
      "max_slices_per_scroll": "1024",
      "shard": {
        "check_on_startup": "false"
      },
      "xpack": {
        "watcher": {
          "template": {
            "version": ""
          }
        },
        "version": "",
        "ccr": {
          "following_index": "false"
        }
      },
      "percolator": {
        "map_unmapped_fields_as_text": "false"
      },
      "allocation": {
        "max_retries": "5"
      },
      "refresh_interval": "1s",
      "indexing": {
        "slowlog": {
          "reformat": "true",
          "threshold": {
            "index": {
              "warn": "-1",
              "trace": "-1",
              "debug": "-1",
              "info": "-1"
            }
          },
          "source": "1000",
          "level": "TRACE"
        }
      },
      "compound_format": "0.1",
      "blocks": {
        "metadata": "false",
        "read": "false",
        "read_only_allow_delete": "false",
        "read_only": "false",
        "write": "false"
      },
      "max_result_window": "10000",
      "store": {
        "stats_refresh_interval": "10s",
        "type": "",
        "fs": {
          "fs_lock": "native"
        },
        "preload": []
      },
      "queries": {
        "cache": {
          "enabled": "true"
        }
      },
      "warmer": {
        "enabled": "true"
      },
      "max_shingle_diff": "3",
      "query_string": {
        "lenient": "false"
      }
    }
  }
}

Ah. Well, the same still applies. You can create an index template in Elasticsearch which does nothing other than match the index naming pattern coming from Serilog and apply the ILM policy information. Even if Serilog is creating indices with settings applied directly, the template should still apply.

Hi

So I've created a rollup policy like this that will delete the index after 7 days:

PUT _ilm/policy/logstash-rollover-7days
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_age": "7d",
            "max_size": "15gb"
          },
          "set_priority": {
            "priority": 100
          }
        }
      },
      "delete": {
        "min_age": "7d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}

Then I've created a template like this:

PUT _template/logstash-cleanup-development
{
  "order": 2,
  "index_patterns": [
    "logstash-development-*"
  ],
  "settings": {
    "index": {
      "lifecycle": {
        "name": "logstash-rollover-7days",
        "rollover_alias": "logstash-development"
      },
      "codec": "best_compression",
      "mapping": {
        "total_fields": {
          "limit": "2000"
        }
      },
      "refresh_interval": "5s",
      "number_of_shards": "1",
      "auto_expand_replicas": "0-1",
      "number_of_routing_shards": "30",
      "number_of_replicas": "1"
    }
  }
}

And finally I've manually added the alias to the latest index, i.e. today's:

POST /_aliases
{
  "actions": [
    {
      "add": {
        "index": "logstash-development-2020.01.20-1",
        "alias": "logstash-development",
        "is_write_index": true
      }
    }
  ]
}

Hi @theuntergeek

With the above applied, I get the following error the next day when the new index is created by Serilog:

illegal_argument_exception: index.lifecycle.rollover_alias [logstash-development] does not point to index [logstash-development-2020.01.21-1]

So it appears that the policy is not moving the alias?

Any ideas?

Cheers

Andy

Sorry, was traveling yesterday. Is logstash-development an alias? Is Serilog pointing to an alias, or to indices?

ILM does not require indices to be part of an alias. Rather than adding the alias settings to a template, all you would need in the template is this part:

"index": {
      "lifecycle": {
        "name": "logstash-rollover-7days"
      },

And you can also apply the same settings to the current index as well, with the _settings endpoint.
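For example, to attach the policy to the current index directly (index and policy names here just mirror the ones used earlier in this thread):

```
PUT logstash-development-2020.01.20-1/_settings
{
  "index.lifecycle.name": "logstash-rollover-7days"
}
```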

logstash-development is an alias; I only created it because the documentation suggested it. Serilog wouldn't write to the alias for some reason (perhaps field-mapping related, I'm not sure), so Serilog is writing to the index itself.

So are you saying, just remove the alias bits? I literally ran the commands I detailed above, and Serilog creates the index itself each day.

That is correct. If Serilog is not writing to an alias, then adding the alias bits to the template/ILM policy will only result in Elasticsearch errors.

Cool, thanks. I'll post back in the morning and let you know if it worked :slight_smile:

Hi @theuntergeek

I'm getting this error now:

illegal_argument_exception: setting [index.lifecycle.rollover_alias] for index [logstash-test-2020.01.21-1] is empty or not defined

:confused:

What created that index? Is there a template that you pushed that still is associated with an alias? Because that index name with the -1 appended seems to imply an alias is in use.

You can still apply an ILM policy to an index, without it needing to be associated with an alias.

It may also be that logstash-test-2020.01.21-1 needs to have its _settings updated to remove the reference to an alias.
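If so, setting an index setting to null resets it to its default, so something like this should clear the alias reference (a sketch; the index name mirrors the one in the error message above):

```
PUT logstash-test-2020.01.21-1/_settings
{
  "index.lifecycle.rollover_alias": null
}
```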

All indexes are created by Serilog; the pattern is essentially "logstash-{environment}-{0:yyyy.MM.dd}-1". The reason for the "-1" at the end is that Elasticsearch was complaining that the name didn't match a regex pattern, basically requiring an integer at the end, much like the APM indexes ending in 00001.

So that is there only to satisfy Elasticsearch, and it follows the index-name format I have given to Serilog.

So I removed the alias from the templates as you suggested, deleted the manually assigned aliases from the indexes, and finally deleted today's indexes to ensure the old template wasn't in play, then let Serilog recreate the indexes, which were matched by the new template. I then copied the settings from today's new index and updated the rest of the indexes (as I want to see the ILM policy delete the old indexes in a day or so).

So in a nutshell I've got the new settings applied to all indexes, and I got today's index to be recreated using the new template.

Here are the new settings from today's index after it was recreated with the new template (containing no alias).

{
  "index.blocks.read_only_allow_delete": "false",
  "index.query.default_field": [
    "*"
  ],
  "index.write.wait_for_active_shards": "1",
  "index.lifecycle.name": "logstash-rollover-7days",
  "index.mapping.total_fields.limit": "2000",
  "index.refresh_interval": "5s",
  "index.auto_expand_replicas": "0-1",
  "index.priority": "100",
  "index.number_of_replicas": "1"
}

@theuntergeek I've tried a few other things, but still no joy. It seems like the documentation suggests you need a rollup alias?

Like in this post, however I've already tried that suggestion and that doesn't work either:
I just want a simple ILM rollover policy

Sorry for the delayed response. Let's not confuse rollup and rollover, which are different from one another.

Can you manually configure Serilog's output index naming?

Hi @theuntergeek

Yes, you can set the index naming as you like. I use a format as follows to ensure it pushes out a new index for each day:

logger.WriteTo.Elasticsearch(new ElasticsearchSinkOptions(new Uri(config.Url))
{
    ModifyConnectionSettings = x => x.GlobalHeaders(new NameValueCollection { { "Authorization", $"ApiKey {apiKey}" } }),
    AutoRegisterTemplate = true,
    IndexFormat = $"logstash-{environment}-" + "{0:yyyy.MM.dd}-1"
});

So, why not hard-code an Elasticsearch alias as the index name? Then ILM rollover will handle it for you.

Of course, you'd need to create an index mapping template first, then create the initial index & alias. The mapping template would contain the ILM information as described.
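A rough sketch of that approach, based only on the options already shown in this thread (AutoRegisterTemplate = false is an assumption here, intended to stop the sink pushing its own template so the manually created mapping template with the ILM settings takes effect):

```
// Sketch: write through the alias and let ILM rollover manage the
// backing indices. The mapping template and the initial index + alias
// must already exist on the cluster.
logger.WriteTo.Elasticsearch(new ElasticsearchSinkOptions(new Uri(config.Url))
{
    ModifyConnectionSettings = x => x.GlobalHeaders(
        new NameValueCollection { { "Authorization", $"ApiKey {apiKey}" } }),
    // Don't let the sink register its own template over the manual one.
    AutoRegisterTemplate = false,
    // Hard-coded alias name, no date token: ILM picks the backing index.
    IndexFormat = "logstash-development"
});
```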

@theuntergeek I already tried that and it wouldn't write to the index at all when I used the alias.

Hmmm. That may be because it is trying to create the index. Is there a setting that allows you to tell Serilog to not attempt to create an index?