Question around setting proper ds / index / ilm

alexsamad · April 25, 2023, 8:17am

Hi

New to ES. I have a 12-node cluster whose purpose is to capture all of the logs from the apps in our 14 environments - let's call them dev1-14. Each env has 6 app servers, 2 rp, 2 geodes and a jmp box - so 11 servers. On the app servers there might be 6-8 apps running; on the others maybe 2-4 apps.

It's the logs from those apps that I want to keep.

I have set up Filebeat and Logstash. But having run into index limits, I have had to rethink how I do things.

What I have is a data view per env, so I think I need at least a datastream per env.

I have also created a datastream per app. Why? Some apps are rather chatty, or developers turn on debug mode and produce 100G of debug text. I would like to delete just that app's documents, and I figure having a datastream per app gives me that.

Originally I also split that down to the app server, but I found I ended up with too many indices and I don't think it helps me.

Next I created an ILM policy:
hot 6 days - roll at 50G - roll at 5 days
warm: mark as read only. I am hoping to save resources by doing this - do I?

then after 15 days it gets deleted.
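
For readers following along, the policy described above could be sketched as a request body for `PUT _ilm/policy/<policy-name>`. This is only an illustration, not the poster's actual policy: the helper name `build_ilm_policy` is made up, reading "hot 6 days" as a warm `min_age` of `6d` is an assumption, and so is the choice of `max_primary_shard_size` for "roll at 50G" (`max_size`, which counts the whole index, may be what was meant).

```python
import json


def build_ilm_policy(rollover_size="50gb", rollover_age="5d",
                     warm_after="6d", delete_after="15d"):
    """Build a body for PUT _ilm/policy/<policy-name> (hypothetical helper)."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        # Roll over when either condition is met first.
                        "rollover": {
                            "max_primary_shard_size": rollover_size,
                            "max_age": rollover_age,
                        }
                    }
                },
                "warm": {
                    # Assumed mapping of "hot 6 days" to a warm min_age.
                    "min_age": warm_after,
                    "actions": {"readonly": {}},
                },
                "delete": {
                    "min_age": delete_after,
                    "actions": {"delete": {}},
                },
            }
        }
    }


policy = build_ilm_policy()
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into Kibana Dev Tools or sent with any HTTP client. The values are parameters so the 15 days can be changed to 21 later without touching the structure.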

My questions:

1. Do I really need all of those datastreams? I'm thinking all I really need is a datastream per env.
2. How do I easily delete a very large number of documents if I had to? If I have a field that holds the app name, can I say "delete all documents from this date to this date where the field == 'app name'"? This would give me a process to handle apps that spew out lots of junk.
3. How do I manage ILM? In theory the data stream should only be written to and nothing should be edited. I do have to handle the case where a server is down, so old data might need to be inserted because it was never read to begin with. So I think 5 days in warm is enough; maybe I could push it to 1 week.
4. Should I use warm / read only? Does it help?
5. What, if any, advanced ILM stuff should I use on these - shrink/merge? Not sure it would help.
6. After say 15 or 20 or ... days I want to delete the indices - I am not using ES for long-term storage. If I set it to 15 days, is that counted from the last document written to the index or from the day the index was created?

Basically I want to keep a sliding window of say 14 / 21 days of documents and purge anything older.
I don't think I can get ILM to do that, so what then?
Should I roll at 1 day? But will I end up with too many indices?

Thanks

warkolm · April 25, 2023, 10:54pm

And for your questions:

1. Depends, being able to easily manage retention for one large producer makes a tonne of sense.
2. The "easy" way is a delete by query, which is expensive for Elasticsearch to execute.
3. Not sure what you're asking here, sorry.
4. Do you need it with those retention times? Does your infrastructure allow you to implement it effectively?
5. Probably not, given you are retaining things for ~2 weeks.
6. It's from when the index was created.

TLDR yes use ILM.
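
On the sliding-window question, a rough way to reason about it is to work out how old documents are when ILM deletes their index, and how many backing indices that leaves per data stream. The sketch below assumes rollover is triggered by age only (size-triggered rollovers would add more indices) and that, once an index has rolled over, the delete phase `min_age` is counted from the rollover time, which is how the ILM documentation describes it. The function names and the figure of 14 data streams (one per env) are illustrative assumptions.

```python
import math


def retention_window(rollover_days, delete_min_age_days):
    """Age in days of the (youngest, oldest) document when an index is deleted.

    Assumes age-only rollover and min_age counted from rollover time.
    """
    youngest = delete_min_age_days
    oldest = delete_min_age_days + rollover_days
    return youngest, oldest


def backing_indices_per_stream(rollover_days, delete_min_age_days):
    """Approximate upper bound on backing indices alive at any one time."""
    # Rolled-over indices still waiting for deletion, plus the write index.
    return math.ceil(delete_min_age_days / rollover_days) + 1


for rollover, delete_after in [(1, 14), (5, 15)]:
    window = retention_window(rollover, delete_after)
    per_stream = backing_indices_per_stream(rollover, delete_after)
    # 14 data streams is an assumption: one per environment.
    print(rollover, delete_after, window, per_stream, per_stream * 14)
```

Under these assumptions, rolling daily with a 14-day delete keeps documents for 14 to 15 days and leaves about 15 backing indices per data stream, so about 210 across 14 streams. Rolling every 5 days with a 15-day delete keeps documents for 15 to 20 days with about 4 backing indices per stream.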

alexsamad · April 26, 2023, 12:02am

Thanks,

#1 Okay - I think I am okay with this as long as I can deal with the situation of a runaway process logging lots of stuff, which leads to #2
#2 So this is possible but expensive - can you point me to a link on how this might be done? Would it be a query to find all of the document IDs and then a query to delete those?

#3 This has to do with read-only indices. I had presumed indices use fewer resources if they are in read-only mode. I previously ran into segment limits (sorry, I can't remember the actual name). I had an index per day, per server and per app; each open index took up a resource and the system wouldn't go.

So it sounds like I have a datastream per env,
as part of the ILM I roll the index per day and also at 50G,
and then move from hot to warm at 5 days and then move to read only.

#4 I do this because I think (maybe wrongly) it reduces resource usage

#5 Right now it's 2 weeks, but I will expand once I see how much data there is
#6 Okay - should I / can I rotate daily? Is there a limit to how many open indices I can have? Is it different if they are datastream indices?

Also, a new question: can I use ILM to delete an index when, say, 100G of data has been allocated?

warkolm · April 26, 2023, 12:56am

For 2 - Delete by query API | Elasticsearch Guide [8.7] | Elastic
For 4 - it does, but how much of an impact does it make?
For 6 - you can if you want. It's not different for datastreams, they are an abstraction of things you are already doing with time series data

Yes.
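
To make point 2 concrete, a delete-by-query request is a normal query body sent to `POST /<target>/_delete_by_query`, so there is no need to collect document IDs first. The sketch below builds such a body for one app over a date range. The field name `app.name`, the app name `noisy-app`, the dates and the helper name are all placeholders, and the `term` filter assumes the app name field is mapped as a keyword.

```python
import json


def build_delete_by_query(app_field, app_name, start, end,
                          time_field="@timestamp"):
    """Build a body for POST /<target>/_delete_by_query (hypothetical helper)."""
    return {
        "query": {
            "bool": {
                # Filter context: no scoring needed for a bulk delete.
                "filter": [
                    {"term": {app_field: app_name}},
                    # Half-open range: start inclusive, end exclusive.
                    {"range": {time_field: {"gte": start, "lt": end}}},
                ]
            }
        }
    }


body = build_delete_by_query(
    "app.name",               # placeholder field name
    "noisy-app",              # placeholder app name
    "2023-04-20T00:00:00Z",
    "2023-04-22T00:00:00Z",
)
print(json.dumps(body, indent=2))
```

One thing to check before relying on this: backing indices that the warm phase has already made read only will most likely reject the deletes until the write block is lifted, so this fits best while the data is still in hot.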

alexsamad · April 26, 2023, 2:10am

#2 Nice, all my Google searches found was delete by doc ID.

Expanding on deleting on size:
is it possible in an ILM policy to say "move to read only if there is only 20% space left on the cluster"?

warkolm · April 26, 2023, 11:03am

Nope, sorry.

alexsamad · April 28, 2023, 7:04am

Thanks for your help

system · May 26, 2023, 7:04am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
