Logstash control records


(FREDERIC) #1

hi all,
I've got an issue with Logstash: how can I perform a check on specific indexes in Elasticsearch, so that I can be sure the record I am loading (in the pipeline) doesn't already exist in Elasticsearch?

thank you


(Magnus Bäck) #2

Querying ES for every event isn't very efficient. I suggest you instead set the document id yourself, based on one or more of the fields in each event, rather than having ES autogenerate that id for you. That way, if the same event turns up again the original document in ES gets updated automatically.
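A minimal sketch of that idea in the elasticsearch output plugin. The field name `personal_id` and the host/index values are placeholders, not something from the thread:

```
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "myindex"
    # Derive the document id from an event field instead of letting
    # Elasticsearch autogenerate it. If the same event is ingested
    # again, it overwrites the existing document rather than
    # creating a duplicate.
    document_id => "%{personal_id}"
  }
}
```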


(FREDERIC) #3

Yes, I use the personal id, but I configured indexes as NAME + YEAR + MONTH, so when I load an event it is sometimes duplicated, because it is already present in an earlier index.


(Magnus Bäck) #4

Are you really storing time-series data where timestamp-based indexes are relevant? If you explain the background more it'll be easier to suggest a good solution.


(FREDERIC) #5

I'm storing my indexes as name+year+month. When I upload an event, Logstash inserts it into the latest index without checking the previous indexes.


(Magnus Bäck) #6

Yes, I understand what you're doing. What I don't understand is why. It sounds like the data you're indexing isn't time-series data, because time-series data typically doesn't need to be updated after a few days like your data does. If the data isn't time-series data, I suggest you stop treating it as such, and the problem you currently have will go away.


(FREDERIC) #7

I'm doing it for storing data


(Magnus Bäck) #8

Yes, of course you're storing data. That doesn't answer the "why" question. Last chance: Why is it necessary to treat this data as time-series data when it doesn't seem to be time-series data?


(Magnus Bäck) #9

You should be able to use the elasticsearch filter, by the way.
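A sketch of how the elasticsearch filter could be used for that lookup. The field names, host, and index pattern are assumptions for illustration, not values from the thread:

```
filter {
  # Look the event's id up across the existing monthly indexes.
  # "personal_id" is a hypothetical id field in the event.
  elasticsearch {
    hosts  => ["localhost:9200"]
    index  => "name-*"
    query  => "personal_id:%{personal_id}"
    # Copy a field from any matching document into the event.
    fields => { "personal_id" => "existing_id" }
  }
  # If the lookup found a match, the record already exists: drop it.
  if [existing_id] {
    drop { }
  }
}
```

Note this runs one query per event, which is exactly the efficiency cost mentioned earlier, so the document_id approach is usually preferable.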


(system) #10