Space and data retention question

Good afternoon and Happy New Year to all my Elastic Stack friends and gurus.

I have been having an issue with my Elasticsearch and syslog retention. In the /var/lib/elasticsearch/log folder I have 4.3 terabytes of space available out of the 5 TB allotted.

Over the course of the year I have had to move syslogs out of the folder because collection keeps stopping. When I configured my pipelines I directed the output as follows:

output {
   elasticsearch {
       hosts => [ "http://ip_addy_es:9200"]
       index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}"
       user => "elastic"
      password => "assigned_PW"
     }
}

output {
    elasticsearch {
       hosts => ["ip_addy_es:9200"]
       user => "elastic"
      password => "assignedPW"
      index => "syslog-%{+YYYY.MM.dd}"
     }
}

What I do not understand is that I have oodles of space available, but I still need to remove syslogs to allow collection to continue. Once I move out some syslogs, the new ones show up again.

What am I missing? Thank you for all your help and assistance.

You should really be using ILM if you can, it'll manage all of this for you.

What is oodles exactly? I ask because Elasticsearch will stop accepting writes if the disk gets too full.

It also reports the details of the situation in its logs. Repeatedly. So ...

... the logs :slight_smile: If you need help interpreting the logs then share them here.

Also any actual error messages reported by Logstash. We're just guessing that it's because you're out of space. If it's not that then Logstash will be reporting more details.
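As a rough sketch of how to check what Elasticsearch itself thinks about disk space (as opposed to what df shows for a mount point), the two requests below can be run from Kibana Dev Tools. This assumes you have Dev Tools access to the cluster. By default Elasticsearch stops allocating new shards to a node above 85% disk usage and marks indices read-only at 95%, so the disk.percent column is the number to watch.

```console
# Disk used and available per node, as measured by Elasticsearch
GET _cat/allocation?v=true

# The data path(s) each node actually writes to, with free space
GET _nodes/stats/fs?human=true&filter_path=nodes.*.fs.data
```

If disk.percent is well below those thresholds, disk space is probably not the reason indexing stops, and the Logstash and Elasticsearch logs are the next place to look.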

Thank you for the replies.
By oodles of space I mean that on the Elasticsearch server, where the data is being stored, I have 4.3 terabytes of space available. It is in the following directory:

/var/lib/elasticsearch/log/

I have not looked fully through the logs, but from what I can see, the folder location is what is causing my issue. I think I need to have my Linux team look at this.

Thank you for the quick replies, I do appreciate them.

Mike Kirbty

Another quick question regarding the size and space issue. Is there a way to use the DevTools to move the syslog to a new location? I know that I can use the DELETE command and delete them, but is there a MOVE command?

No, there is not.

Hello again ELK gurus. We increased the space on the disk and we are still not getting syslogs into Kibana for viewing.

When we run df -h on the drive we see that /var/lib/logstash has 100G available but we can only write 764M. We increased the disk to 1 TB and restarted the Logstash server and its services, and we still do not get any more logs.

I reviewed my syslog pipeline, which is as follows:

input {
    beats {
         port => 5044
    }
}

input {
   udp {
       port => 5144
       type => syslog 
      }
}

output {
    elasticsearch {
         hosts => ["https://ip_elastic_search:9200"]
         index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}"
         user => "elastic"
         password => "elastic_password"
      }
}

output {
     elasticsearch {
           hosts => ["ip_elastic_search:9200"]
           user => "elastic"
           password => "elastic_password"
           index => "syslog-%{+YYYY.MM.dd}"
      }
#   stdout { codec => rubydebug }
}

Could it be that my output, shown above, is not precise enough for the syslogs to be placed in the proper directory, so that we run into a blockage that stops our collection?

As always thank you for your assistance.

Same answer as before :slight_smile:

If Elasticsearch is rejecting docs because it thinks it is out of disk space then there will be evidence of that in the ES server logs. If it's rejecting docs for some other reason then it might not record why in its own logs, but it will be telling Logstash why, so that should be in the Logstash logs.

David, thank you so much for your replies and continued support. It is much appreciated. Looking on the Logstash server in its log files I found the following multiple times. I did have to write it down on paper and transfer it over, as it is in a locked environment and not connected to the internet.

[logstash.outputs.elasticsearch] [main] [a542165b4090ae8af8360b8e56e42abc9f341d6f103e9f01bdf38ef3304b72a] could not index event to elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"syslog-YYYY-MM-dd", :routing=>nil}, {"host"=>"ipaddress", "type"=>"syslog", "@version"=>"1", "message"=>"<166>Date Dec 23 2022T16:49:33.621Z Firewall: %ASA-6-10600:access-first inside-production_access_in permitted udp inside-production/ipadd(port)->inside-management/ipadd(port) hit-cnt 1 first hit[0xceef7cf9, 0x00000000]\n", "@timestamp"=>2022012023T16:49:33.621Z}], :response=>{"index"=>{"_index"=>"syslog-YYYY-MM-dd", "type"=>"doc", "_id"=>nil, "status"=>400, "error"=>{"type"=>"validation_exception", "reason"=>"Validation Failed: 1: this action would add [2] shards, but this cluster currently has [1000]/[1000] maximum normal shards open;"}}}}

From the error I would imagine I need more shards. Would it be best practice to shrink the shards that we have? How can I gain more shards?

If I change my index in the pipeline used for collecting/gathering syslogs from

index => "syslog-%{+YYYY-MM-dd}" to just  index => "syslog"

would that allow us to gather more syslogs, as we would then not be using a new shard for each day of syslogs?

Thank you as always, the assistance is much appreciated.
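A side note on the "[2] shards" in the error above: that is most likely one primary plus one replica for the new daily index. On a single-node cluster a replica can never be assigned, because Elasticsearch does not place a replica on the same node as its primary, yet replicas still count toward the shard limit. The following is only a sketch, assuming a single node and that losing replica copies is acceptable in your environment.

```console
# Show primaries (p) and replicas (r) and whether they are assigned
GET _cat/shards/syslog-*?v=true&h=index,shard,prirep,state&s=index

# Drop the replicas on the existing syslog indices
PUT syslog-*/_settings
{
  "index": {
    "number_of_replicas": 0
  }
}
```

This only changes indices that already exist. New indices would still be created with the default replica count unless an index template says otherwise.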

It looks like your node has reached the maximum number of shards on the node, which means new documents cannot be indexed as a new index cannot be created. Given that you only have around 700GB of data in the cluster, your average shard size is very small (a lot smaller than recommended). If you want to have a reasonably long retention period I would recommend switching from daily to monthly indices, which is a minor change to the pattern specified in Logstash. Once you remove some of the oldest indices through the APIs, new data should then flow into the cluster.
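A minimal sketch of how checking the shard count and removing the oldest indices could look in Kibana Dev Tools. The index name in the DELETE is only an example; substitute the oldest indices that the listing actually returns.

```console
# How many shards are open, and how many are unassigned
GET _cluster/health?filter_path=status,active_shards,unassigned_shards

# List the syslog indices, oldest first, with shard counts and size
GET _cat/indices/syslog-*?v=true&s=index&h=index,status,pri,rep,docs.count,store.size

# Delete one expired daily index (hypothetical name)
DELETE syslog-2022-01-01
```

Several index names can be given to DELETE as a comma-separated list. Wildcards may be rejected, depending on version and on the action.destructive_requires_name setting, which is why explicit names are used here.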

I would also recommend you introduce an ILM policy to remove indices older than the retention period you are looking for. This will keep the number of shards in check. You can also look into using rollover or data streams, which add flexibility.
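As an illustration only, a delete-only ILM policy might look like the sketch below. The policy name syslog-retention and the 90-day retention are assumptions made for the example, not values from this thread. Without rollover, the age is counted from the creation date of each index.

```console
# Policy that deletes an index 90 days after it was created
PUT _ilm/policy/syslog-retention
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {}
      },
      "delete": {
        "min_age": "90d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}

# Attach the policy to the syslog indices that already exist
PUT syslog-*/_settings
{
  "index.lifecycle.name": "syslog-retention"
}
```

New indices would also need the policy, normally through an index template matching syslog-*. The template API differs between versions, so it is left out of this sketch.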

Christian;

Thank you for the reply, it does seem that I am looking in the correct area now, the confirmation you gave is a big help. If I change the index to

index => "syslog-%{+YYYY-MM}" 

that should change it to a monthly index, thus allowing a larger amount of information to be gathered, correct?
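Changing the pattern only affects newly created indices. If the existing daily indices should also be consolidated, one possible approach is the reindex API, sketched below for a single month. The index names follow the syslog-YYYY-MM-dd and syslog-YYYY-MM patterns from this thread, but the month is just an example. Note that the destination index needs free shard headroom to be created, so some old indices would have to be deleted or closed first.

```console
# Copy all daily indices for one month into a single monthly index
POST _reindex
{
  "source": { "index": "syslog-2022-08-*" },
  "dest":   { "index": "syslog-2022-08" }
}

# Compare document counts before removing the daily indices
GET _cat/count/syslog-2022-08-*?v=true
GET _cat/count/syslog-2022-08?v=true
```

Only once the counts match would it be safe to delete the daily indices for that month.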

I tried ILM, but as each day was/is an index it didn't work; nothing was moved from hot to warm and then to cold. I am not at the site at the moment, but will be heading back there on Thursday, and I can tell you then what I did with the ILM.

I also saw that I can run the following command, which would allow an unspecified number of indices on the solo Elasticsearch server.

PUT _cluster/settings
{
    "persistent" : {
          "cluster.routing.allocation.total_shards_per_node" : null
    }
}

I will look into adding rollover, which is what I really wanted to do when I stood this up.

Thank you again.

Hello Christian and Elastic gurus,

Me again. I have just tried to PUT _cluster/settings from the above note and wanted to make it a null value as I indicated. However it returned the message that I still have [1000]/[1000] maximum normal shards and it would not make the change.
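For what it is worth, the limit quoted in that message is controlled by cluster.max_shards_per_node (default 1000 per data node), not by cluster.routing.allocation.total_shards_per_node, which only influences how shards are allocated across nodes. A hedged sketch of raising it follows. The value 1500 is an arbitrary example, and raising the limit is a stopgap rather than a fix for having too many small shards.

```console
# Temporarily raise the cluster-wide shard limit (example value)
PUT _cluster/settings
{
  "persistent": {
    "cluster.max_shards_per_node": 1500
  }
}

# Later, return to the default by setting it back to null
PUT _cluster/settings
{
  "persistent": {
    "cluster.max_shards_per_node": null
  }
}
```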

I was able to find a method to "close shards" using the following:

POST syslog-2022-08-12/_close?wait_for_active_shards=1

And that returns a success.
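Closing works here because shards of closed indices do not count toward the shard limit. A closed index cannot be searched or written to, though, and it still occupies disk space. A small sketch for keeping track of what is closed, using the index name from the request above:

```console
# Show which syslog indices are open or closed
GET _cat/indices/syslog-*?v=true&h=index,status&s=index&expand_wildcards=all

# Reopen an index when its data is needed again
POST syslog-2022-08-12/_open
```

Reopening needs shard headroom again, so it may fail while the cluster is at its limit.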

So I then returned to my pipeline that I am using for the syslog collection and my output is very simple.

index => "syslog-%{+YYYY-MM-dd}"

However the metadata output reads as follows:

type or index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}"

Would the above work if I used syslog in place of @metadata? It looks like the metadata for the beats rolls over into new indices.

Thank you