Hi,
I have a problem where I receive logs along with an ID. I have two Elasticsearch indexes (archive_index and latest_logs). Each time a log arrives, I have to check whether the current ID is the same as the previous ID. If it is, I send the log to latest_logs; if not, all the logs currently in latest_logs are moved to archive_index.
I assumed that archiving the logs cannot be done with Logstash, because archiving means first checking the ID, then fetching all the records from latest_logs, pushing them to archive_index, and finally deleting all the records from latest_logs so that it holds only the logs for the new ID.
So I am currently doing it like this:
logstash -> file -> java -> es
But the problem is that when I write all the logs to a file, the file grows very rapidly (several GB in a few days).
So I want an alternative solution for that.
- Is it possible to feed my logs from Logstash to Kafka, and then from Kafka to Java? If yes, how will Kafka ensure that older data is flushed once it reaches a certain size, provided Java has already consumed it?
Best Regards,
Navneet
If your goal is to have one index per ID, how about including the ID in the index name? If you need to be able to refer to the current ID via a symbolic "latest" name, set up an Elasticsearch alias for that. I don't think you can get Logstash to do that though. Similarly, if you want to be able to refer to non-current indexes via a symbolic "archive" name you can use an alias for that too.
The scenario is that I will have only two indexes. Only logs with the latest ID will be in latest_index. For that I have to check whether the current ID is the same as the previous one. If yes, the ID has not changed and is still the latest, so I push my document into latest_index. If the ID has changed, all the documents in latest_index are moved to the archive. The archive index can hold many IDs, but latest_index holds only the logs for the currently running ID.
Example: if I receive 10 IDs, 9 IDs along with their logs will be in archive_index, and the one latest ID along with its logs will be in latest_index.
Yes, the design is clear but I think it's flawed and will lead to unnecessary complexity that can be avoided with index aliases. What isn't clear is why it's so important to have a "latest" index. Whatever problem that design is trying to address could have other solutions.
How can aliasing the index solve the problem?
I have two Kibana dashboards: one points to latest_id and the other points to archive_id.
Is there another solution possible?
An index alias "latest_id" that points to the currently latest index allows you to keep your existing Kibana dashboard without changes and have ES transparently translate all requests for "latest_id" to the current index. Same thing with "archive_id" except that it can point to multiple indexes (all indexes except the latest one).
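To make the alias switch concrete, here is a minimal sketch of the request body you could send to the Elasticsearch `_aliases` endpoint when a new ID shows up. It only builds and prints the JSON; it does not talk to a cluster. The index names `logs_id_41` and `logs_id_42` and the helper `switch_latest_actions` are made up for illustration, and the sketch assumes one physical index per ID.

```python
import json


def switch_latest_actions(old_index, new_index,
                          latest_alias="latest_id",
                          archive_alias="archive_id"):
    """Build a body for POST /_aliases that moves the 'latest' alias.

    old_index and new_index are hypothetical per-ID index names.
    Pass old_index=None for the very first ID, when nothing is archived yet.
    """
    actions = []
    if old_index is not None:
        # The previous "latest" index stops being latest and joins the archive.
        actions.append({"remove": {"index": old_index, "alias": latest_alias}})
        actions.append({"add": {"index": old_index, "alias": archive_alias}})
    # The index for the new ID becomes the one behind the "latest" alias.
    actions.append({"add": {"index": new_index, "alias": latest_alias}})
    return {"actions": actions}


body = switch_latest_actions("logs_id_41", "logs_id_42")
print(json.dumps(body, indent=2))
```

All actions in one `_aliases` request are applied together, so the dashboards should never see a moment where latest_id points at nothing. The new index has to exist before the alias can be added to it.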
If you at some point want to consolidate the archive indexes into one big index (or a few larger ones) you can do that and update the alias accordingly. This can even be an atomic operation.
- Copy documents from physical indexes archive_id_1, archive_id_2, and archive_id_3 to archive_id_123.
- Reconfigure the archive_id alias to point to archive_id_123.
- Delete archive_id_1, archive_id_2, and archive_id_3.
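The three steps above could be sketched as two request bodies, assuming an Elasticsearch version that provides the `_reindex` endpoint and the `remove_index` action of the `_aliases` endpoint (older versions would need an external copy step and separate index deletes). The helper name `consolidation_requests` is hypothetical, and the code only builds the JSON without contacting a cluster.

```python
import json


def consolidation_requests(sources, target, alias="archive_id"):
    """Return (reindex_body, aliases_body) for merging archive indexes.

    reindex_body is meant for POST /_reindex (copy step).
    aliases_body is meant for POST /_aliases (alias switch plus deletes).
    """
    sources = list(sources)
    # Step 1: copy every document from the source indexes into the target.
    reindex_body = {
        "source": {"index": sources},
        "dest": {"index": target},
    }
    # Steps 2 and 3: point the alias at the target and drop the old indexes.
    # Deleting an index also removes any aliases that pointed at it.
    actions = [{"add": {"index": target, "alias": alias}}]
    actions += [{"remove_index": {"index": name}} for name in sources]
    return reindex_body, {"actions": actions}


reindex_body, aliases_body = consolidation_requests(
    ["archive_id_1", "archive_id_2", "archive_id_3"], "archive_id_123"
)
print(json.dumps(reindex_body, indent=2))
print(json.dumps(aliases_body, indent=2))
```

The second request should only be sent after the copy has finished and the document counts have been checked, since the deletes cannot be undone.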