Hello,
I'm facing an issue. To elaborate: I have 40 Elasticsearch indices, all managed by an ILM policy with rollover defined. The ILM policy rolls over to a new index each day and deletes each index 5 days after rollover.
I have a requirement to use a single policy for all indices, and this causes duplication of data.
Logstash mostly runs a Perl-based script to fetch data and insert it into the index. To avoid duplicate data I'm using doc_as_upsert in Logstash (the same document gets updated, using a unique document ID that I maintain). This way duplication is avoided within a single index.
Now the challenge: for the same index pattern I have 5 rollover indices created by the policy, and when I build metric or statistics visualizations the data is inaccurate because of duplication.
Reason noticed: within the same index (the current rollover index) Elasticsearch knows to update the existing document. What I want is for Logstash to somehow check whether a document with that ID already exists in one of the rolled-over indices; if it does, update it there, and if it is a new document, send it to the current index. That way duplication would be avoided.
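The behaviour described above can be illustrated with a small in-memory sketch. It uses plain Python dictionaries, not real Elasticsearch calls; the index names `metrics-000001` and `metrics-000002`, the document ID `host-1`, and the helper names are all made up for illustration. It only shows that a document ID is unique per index, so an upsert after rollover creates a second copy instead of updating the first.

```python
from collections import defaultdict

# Hypothetical stand-in for a cluster: index name -> {document id: document}.
indices = defaultdict(dict)


def upsert(write_index, doc_id, doc):
    """Mimic doc_as_upsert: create or merge the document with this id in ONE index."""
    indices[write_index].setdefault(doc_id, {}).update(doc)


def count_across(prefix, doc_id):
    """Count how many indices matching the prefix hold a copy of doc_id."""
    return sum(
        1
        for name, docs in indices.items()
        if name.startswith(prefix) and doc_id in docs
    )


# Day 1: both writes land in the same index, so the second one is an update.
upsert("metrics-000001", "host-1", {"cpu": 10})
upsert("metrics-000001", "host-1", {"cpu": 20})

# Day 2, after rollover: the write index has changed, so the same id
# is created again in the new index instead of being updated in the old one.
upsert("metrics-000002", "host-1", {"cpu": 30})

print(count_across("metrics-", "host-1"))  # prints 2
```

A visualization that queries the whole `metrics-*` pattern would therefore count `host-1` twice in this sketch.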
Possible solution: maintain only one giant index, so that a document is always unique within that index.
Reason I can't use only one index: for the dev, prod and quality environments, data has to be kept for, e.g., 5, 10 and 15 days. If I follow the one-index approach, i.e. write to the same index for 30 days and delete it afterwards, then the customer's whole data is lost.
Historical data is needed. With only one index, all data is gone when the policy deletes the index; with rollover indices (roll over every day, delete 5 days later), I get duplication.
I'm confused about how to handle this issue.
I want some way to tell Logstash to check the documents in the rolled-over indices: if a document is already present there, update that document, and only index new documents into the current rollover index.
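To make the desired behaviour concrete, here is a rough sketch of the lookup-then-route logic, written in Python rather than as a Logstash pipeline. It is only an illustration under assumptions: `client` is assumed to expose `search` and `update` methods with keyword arguments like those of the elasticsearch-py 8.x client, `alias` is assumed to be the rollover alias that covers all backing indices and has a write index, and the function names are hypothetical.

```python
from typing import Any, Optional


def find_existing_index(client: Any, alias: str, doc_id: str) -> Optional[str]:
    """Return the name of the backing index that already holds doc_id, or None."""
    # An ids query against the alias searches every index behind it.
    resp = client.search(index=alias, query={"ids": {"values": [doc_id]}}, size=1)
    hits = resp["hits"]["hits"]
    return hits[0]["_index"] if hits else None


def upsert_routed(client: Any, alias: str, doc_id: str, doc: dict) -> str:
    """Update doc_id where it already lives; otherwise create it via the alias."""
    # Fall back to the alias for new documents (assumed to resolve to the
    # current write index).
    target = find_existing_index(client, alias, doc_id) or alias
    client.update(index=target, id=doc_id, doc=doc, doc_as_upsert=True)
    return target
```

The trade-off in this sketch is one extra search per event before each write, and a document that lives in an older index still disappears when ILM deletes that index.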
Any suggestion would be helpful.
Thanks