Logstash pipeline index question

As many of you know and have been following, my syslog collectors keep stopping because they run out of shards. I have made some improvements and they now run for about 3 weeks before I have to "close" the index. Better than deleting them, for sure. However, I would rather have them roll over on their own, and then I should be able to create data retention policies for the syslogs.

To that end I have been reviewing some of the patterns in the current Logstash pipelines, and as I do not want to break my live environment I thought I would ask the question first. If I change my syslog index to the following:

"%{;syslog-YYYY]}-%{[@metadata][version]}-%{+YYYY.MM.dd}

which closely resembles a Beats index. Would that be more effective than what I am currently using, and allow the index to continue without stopping? The current index name in the Logstash pipeline for syslogs is as follows:

"syslog-%{+YYYY.MM}"

Thank you for all your help and assistance. It is always much appreciated.

M.Kirby

This begs for ILM.

How big does your index get? If you won't use ILM, then adjust your index pattern to aim for the same sizes ILM uses by default: max 50 GB or 30 days.

The problem with using date fields in the index name is that data with old or invalid dates will be written to an index with that date, so the data ingested now may go to several indices. The same goes for multiple versions: if you have 10 different metadata versions running, that is 10 indices.

Agree with @rugenl.
Can you expound upon what you mean by:

I think understanding that better might help you get the best answer. Also, what version of Elastic are you running?

The index is pretty small, not much over a megabyte of data. I keep having to close the indices after a few weeks to continue the flow of information.

Do you have an example of how to adjust the pattern to get larger sizes? Also, when I tried ILM it didn't allow me to roll over. Maybe it's the way I am using it. Any assistance with that is much appreciated as well. I will keep researching the proper use of ILM.

Thank you, Len, for the reply; it is appreciated.

M.Kirby

I have a maximum of 1000 shards, and when it reaches that maximum it stops collecting until I either close old indices or delete them. I have tried the following in Dev Tools to correct the issue.

PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.total_shards_per_node": null
  }
}

And it has not given me an unlimited number of shards as I was hoping. Unless I need to close all the current indices and then run the command?

Thank you for the reply, Andrew; it is appreciated.

M.Kirby

Gotcha, that helps. Reading between the lines, you have a 1-node cluster, right? Generally, assigning a setting a value of null will cause Elastic to revert to the default value. You can run this to see all values currently in play:

GET _cluster/settings?include_defaults=true

That is more as an aside for your information. I wouldn't recommend going above the limit of 1000, because it is there for a reason.
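
As an aside, if I remember right the 1000 figure itself comes from the cluster.max_shards_per_node setting, so raising it would look something like the sketch below (the 2000 is just an illustrative value). I still wouldn't do it, for the reasons above.

PUT _cluster/settings
{
  "persistent": {
    "cluster.max_shards_per_node": 2000
  }
}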

Right-sizing the shards is the way to go. Having said that, do you know whether all 1000 indices need to be open? There might be some cleanup that would give you some headroom there as well. I'm still a little confused about your current index that you have to close, because it seems like it should be rolling over each month. That would be just 12 indices a year at a few megabytes apiece, which shouldn't be a problem. Hence my question about the other indices.
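
To get a feel for where the shards are going, something like this should list every index with its status (open or closed), doc count and size, sorted by name; adjust the columns to taste:

GET _cat/indices?v&h=health,status,index,pri,rep,docs.count,store.size&s=index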

You are correct, Andrew, it is a 1-node cluster. I am unable to get my indices to roll over correctly; they are all yellow or red, not green, when I look at them in Kibana. When I originally set this up I had the indices set to a daily pattern.

"syslog-%{+YYYY.MM.dd}"

It was this year, at the start of January, when I removed the ".dd" portion so that it would be a monthly index. I have closed about 5 days of indices, and that got us collecting and viewing the syslogs again. I did try to create an ILM policy, but as each index was daily it didn't work as intended, if at all. It wouldn't close anything.

I am not going to be back at the site where the ELK SIEM is running until next week. I can get more information then and post back up unless there is anything you can think of that can help at this time.

Regards, and thank you for your continued support.

M.Kirby

I think you should change the way you think about this. Daily indexes were useful for a retention policy like "I want 90 days of web server logs". The indexes would roll over daily and you could use a program like curator (or nowadays ILM) to delete any older than 90 days. In Kibana you would use an index pattern to merge all 90 daily indexes into a single searchable view.

Using daily indexes across a large number of index patterns will run you out of shards very quickly. If you have three versions of four different beats you will be using more than 10 indexes a day. If for some reason you have multiple shards per index (not the default since 7.x) then you could be using 25 or more shards a day, meaning you run out after little more than a month.

Using the cat shards API should help you figure out what is happening. I understand you are in an environment where you cannot copy the output to show us.
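
For example (just a minimal sketch, adjust the columns as needed), something like this lists every shard with its index, state and size:

GET _cat/shards?v&h=index,shard,prirep,state,store&s=index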

With ILM you can implement a retention policy like "I want to keep the last 50 GB of syslog data". If you get 100 GB of syslog per day that will be 12 hours, if you get 10 MB per day that will be more than 10 years worth. You would configure logstash to write all the syslog data to an index called "syslog", then configure ILM to roll that over every 20 GB, and keep the last two rolled over indexes.
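
As a rough sketch of such a policy (the policy name and the 60d below are placeholders, and since ILM expresses retention as an age rather than "keep the last two indexes" this is only an approximation):

PUT _ilm/policy/syslog-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_size": "20gb" }
        }
      },
      "delete": {
        "min_age": "60d",
        "actions": { "delete": {} }
      }
    }
  }
}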

Similarly for the beats data. Instead of daily indexes for each beat you could try a single index called "beats". If that results in too many fields then you could use "%{[@metadata][beat]}". If that results in mapping conflicts because different versions of a beat use different field structures then try "beats-%{[@metadata][version]}". You might even need "%{[@metadata][beat]}-%{[@metadata][version]}", but take the date out of the index name and use ILM to determine the amount of disk space each index can use.
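
In the logstash elasticsearch output that ends up looking roughly like this (the hosts value and the alias/policy names are just placeholders for whatever you use):

output {
  elasticsearch {
    hosts => ["https://localhost:9200"]
    ilm_enabled => true
    ilm_rollover_alias => "syslog"
    ilm_pattern => "{now/d}-000001"
    ilm_policy => "syslog-policy"
  }
}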

Gotcha. OK, obviously red indices are a problem, but I'm going to put that out of scope right now since we are just talking about the naming of indices. Please note the following does not use ILM. That seems like a bit of a stretch to do now with more critical things going on. Once things quiet down you can go back and get all the indices we make below to work under ILM. With that out of the way, let's get going.

First thing I'd do is change your current config to:

"syslog-%{+YYYY}"

Once that is done we are going to move on to give your cluster some breathing room.

I'm going to guess again that there are probably 365 syslog indices from the year 2022, for example. To me the easiest thing to do is to just reindex all of those into a year-level index. Something like this would probably do the trick:

POST /_reindex?wait_for_completion=false
{
  "source": {
    "index": "syslog-2022.01.01"
  },
  "dest": {
    "index": "syslog-2022"
  }
}

Those jobs should run very quickly at a few megabytes a piece. You will probably never see them when you run:

GET _tasks?actions=*reindex&detailed

You'll need to repeat that command for each day, but that could be written up ahead of time, and it will only need to be done one time. You could probably run a year's worth of commands through Dev Tools in Kibana in under a half hour (ctrl-enter is your friend here). If you forget whether you've reindexed an index or not, no worries, you can just send it again. By default reindex will ignore duplicates. See the reindex API documentation for more info.
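
If I remember right, the reindex source also accepts a list of indices, so you could batch several days (or a whole month) per request instead of one at a time; the daily version above is just easier to keep track of:

POST /_reindex?wait_for_completion=false
{
  "source": {
    "index": ["syslog-2022.01.01", "syslog-2022.01.02", "syslog-2022.01.03"]
  },
  "dest": {
    "index": "syslog-2022"
  }
}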

Once syslog-2022 has the same number of documents as the daily source indices you can delete the source indices. You wouldn't even have to do the entire year; you could do a month at a time. At this point you should be far away from the 1000 shard limit. If there are multiple years' worth of data you can reindex them in a similar fashion. Just change the dest index accordingly (ex: syslog-2021).

Finally you will have some 2023 indices that will need to be reindexed into syslog-2023, and can be done in a similar fashion.

Let me repeat, please use ILM after this clean up as it best fits your need. That is the long term solution. This way will give you some breathing room to make sure ILM is setup correctly.

Andrew and Badger, thank you both very much for the information you have provided above. It makes perfect sense to me now, and I can follow the steps. I am going to read the documentation you have both linked in your replies.

It is great information, explained in a manner that I can digest.

M.Kirby

Hello Andrew;

I have gone through and re-indexed all of 2022, but how do I know if it worked? I have read through the reindex API documentation that you mentioned, and I did not see how to verify that. I am always nervous about deleting information until I am sure it has been backed up.

When I ran the reindex command from Kibana --> Dev Tools I would see the following response on the right side of the window.

{
   "task" : "DYyy7s.............626"
}

Each time I would run the command the number at the end would increase.

Thank you for the information, it is MOST helpful. I admit to being gun-shy when it comes to data retention, as it is crucial.

Regards and Thank you.

M.Kirby

Sure thing, that makes sense and you're right to be careful with that. What you'll want to do is run:

GET _cat/indices/syslog-2022*?v&s=i&h=i,docs.count

This should return just your 2022 syslog indices, with headers turned on in the output (v), sorted by index name (s=i), and only showing (h=i,docs.count) the index and docs.count columns.
Look at the "docs.count" value for syslog-2022. This number should equal the sum of the counts of all the indices that you reindexed into it. I'd recommend copying these values to a spreadsheet and having it do all the addition for you. If the values match, you'll know you have all the documents in the new index. If syslog-2022 is coming up short you can rerun the reindex commands and they will only add the missing documents.

When it comes to deleting indices through the dev console, I always run a GET before the DELETE. For example, if you wanted to delete the first 9 days of 2022, I'd run:

GET _cat/indices/syslog-2022.01.0*

If you only see 9 indices being returned then I'd copy (not type) the "syslog-2022.01.0*" into the command:

DELETE syslog-2022.01.0*

That way you know you are only deleting those 9 indices. You can increase the wildcard scope to hit more indices at once. Just make sure you only see the indices you are expecting in the GET _cat/indices before running the DELETE and you should be fine.

Just want to add a comment about reindexing. When you reindex it will give you a task id that you can then use later if needed. Generally that only applies to long-running tasks. You might want to slow down the requests_per_second for performance reasons, for example, or you might want to cancel a job. All of that can be done by referencing the task id. You can see all reindex tasks that are currently running with:

GET _tasks?actions=*reindex&detailed

In your case I imagine that would always return empty, because indexes with only a few megabytes of data generally reindex in the blink of an eye.
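
For completeness, checking on, rethrottling, or cancelling a specific reindex by its task id would look something like this (replace <task_id> with the value returned by the reindex call):

GET _tasks/<task_id>

POST _reindex/<task_id>/_rethrottle?requests_per_second=500

POST _tasks/<task_id>/_cancel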

Good luck, and hope the cleanup goes well.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.