Keep only last record of each type

Hello all.

I have to keep two types of indices in Elasticsearch:

1- a full, regular index holding all records sent to it (logstash-YYYY.MM.DD)
2- a small, fast index holding only the last log message of each kind (lastaction-YYYY.MM)

I need some help setting up number 2 above.

Currently I'm using the fingerprint filter to generate a SHA1 hash of each event's
message field, and using this hash as the document_id for the elasticsearch output.

filter {
    if ("cloned" in [tags]) {
        uuid {
            # Tag the cloned event so the output below can route it.
            add_tag     => [ "lastlogin" ]
            overwrite   => true
            target      => "@uuid"
        }
        fingerprint {
            # Hashes the "message" field (the plugin's default source) into
            # the "fingerprint" field (the default target); setting a key
            # makes it a keyed hash (HMAC-SHA1).
            key     => "lastlogin"
            method  => "SHA1"
        }
    }
}

output {
    if ("lastlogin" in [tags]) {
        elasticsearch {
            # Same message => same hash => same _id, so re-indexing an
            # identical message overwrites the existing document instead
            # of creating a duplicate.
            document_id         => "%{fingerprint}"
            index               => "lastaction-%{+YYYY.MM}"
            sniffing            => true
            template_overwrite  => true
        }
    }
}

How would you guys do it?

I'm asking because, despite not seeing any duplicated records (I think this part is OK),
the index is bigger than the full/regular one.
And it should not be.

Also, I expected it would only update the timestamp if a newer record was inserted,
but in fact the @timestamp field is always updated, even if the incoming date is in the past.

Any ideas?

How would you guys do it?

That's what I'd do.

I'm asking because, despite not seeing any duplicated records (I think this part is OK),
the index is bigger than the full/regular one.
And it should not be.

It could be larger than expected unless the index is optimized (force-merged) to expunge deleted documents. I don't remember to what extent this takes place automatically; check the ES documentation. Keep in mind that ES treats an update as a delete followed by the insertion of a new document, and the deleted copies keep occupying disk space until their segments are merged.
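If deleted documents are indeed what's eating the space, a force merge along these lines should reclaim it (the endpoint is _forcemerge on ES 2.1+, _optimize on older versions; the host and index name here are just placeholders):

    curl -XPOST 'http://localhost:9200/lastaction-2016.07/_forcemerge?only_expunge_deletes=true'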

Also, I expected it would only update the timestamp if a newer record was inserted,
but in fact the @timestamp field is always updated, even if the incoming date is in the past.

I'm afraid I don't understand this part.

Sometimes I have to feed old logs to Elasticsearch.

Let's say I have the following line indexed:

@timestamp  July 8th 2016, 08:00:20.002
message     open: user edgar-allan^poe@bookwriters.com opened INBOX/Trash

and later I backfill a file from July 6th:

@timestamp  July 6th 2016, 01:00:10.001
message     open: user edgar-allan^poe@bookwriters.com opened INBOX/Trash

I would like to keep the most recent record based on @timestamp, but Logstash just replaces the existing record with the last one received.

How can I make it evaluate the @timestamp field, updating the record if the incoming event is more recent, but discarding it if it's older?

Logstash will update the document as a whole, so it seems very unlikely that it wouldn't update @timestamp too. Have you double-checked that the @timestamp field really has the correct contents when you're backfilling data?

Yes, the content is correct.

I am trying to mimic an SQL trigger that just updates the @timestamp field
(it should never go back in time):

CREATE TRIGGER dbo.trgAfterUpdate ON dbo.TblLastAction
AFTER UPDATE
AS
  -- Touch last_changed only on the rows that were actually updated;
  -- the join key ("id") is a placeholder for the table's primary key.
  UPDATE t
  SET t.last_changed = GETDATE()
  FROM dbo.TblLastAction t
  INNER JOIN Inserted i ON t.id = i.id
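
For what it's worth, the elasticsearch output's scripted-update options might let you reproduce that trigger logic on the ES side. This is an untested sketch: it assumes ES 5.x with Painless (on 2.x the script language would be Groovy), and it relies on Logstash serializing @timestamp as a fixed-width ISO8601 UTC string, so a plain string comparison orders timestamps correctly.

output {
    if ("lastlogin" in [tags]) {
        elasticsearch {
            index           => "lastaction-%{+YYYY.MM}"
            document_id     => "%{fingerprint}"
            action          => "update"
            # Run the script even when the document doesn't exist yet.
            scripted_upsert => true
            script_lang     => "painless"
            script_type     => "inline"
            # The event is exposed to the script as params.event
            # (the plugin's default script_var_name is "event").
            script          => "
                if (ctx._source['@timestamp'] == null ||
                    params.event['@timestamp'].compareTo(ctx._source['@timestamp']) > 0) {
                    // Incoming event is newer (or the doc is new): keep it.
                    ctx._source.putAll(params.event);
                } else {
                    // Incoming event is older: discard it ('noop' on newer ES).
                    ctx.op = 'none';
                }
            "
        }
    }
}

With something like that in place the stored @timestamp can only move forward, which matches the trigger's "never go back in time" behavior: backfilled events older than what's already indexed become no-ops.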