How to create unique id in logstash/elastic search for apache logs in a distributed server environment

How can I create a unique ID in Logstash/Elasticsearch for Apache logs in a distributed server environment, so that when I re-upload Apache logs, Logstash/ES will update the existing documents instead of creating duplicate records?

Hello,

My guess would be to use the fingerprint plugin of Logstash to generate a hash based on the input line. This value can then be used as the document_id. Finally, you would set the Logstash output action to update. Be aware though that I haven't tried this out myself.
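Untested, but a sketch of that idea might look like the following (the host, index name, and key are placeholders, and depending on the plugin version SHA1 may or may not require a `key`):

```conf
filter {
  # Hash the raw log line into a repeatable ID
  fingerprint {
    source => "message"
    target => "[@metadata][fingerprint]"
    method => "SHA1"
    key => "changeme"   # some plugin versions require a key for SHA* methods
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]   # placeholder
    index => "apache-logs"        # placeholder
    document_id => "%{[@metadata][fingerprint]}"
    action => "update"
    doc_as_upsert => true
  }
}
```

With `doc_as_upsert` the document is created on first sight and updated on re-upload, so replaying the same file should not create duplicates.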

Have a look at the following blogs on the topic:

If we use fingerprint:

The problem is with hard-hitting users. The same request line can appear in the log more than once (the same user making multiple identical requests from the same machine at the same time). Hashing those lines produces the same value, so valid entries are also treated as duplicates.

If we use uuid:

Re-uploading the log file creates duplicate entries.

If you use a UUID, you need to generate it at the source and then not reprocess the same data. If you calculate a hash and can have identical messages, make sure you include information that sets them apart when you perform the hash calculation, e.g. the file name and offset. Another option is to add a UUID to the data before it is first written to file.
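In Logstash that could look something like this (untested; it assumes the events carry `path` and `offset` fields, e.g. shipped by a forwarder such as Filebeat, since the file input itself does not add an offset):

```conf
filter {
  fingerprint {
    # Combine file name, offset and raw line so that identical
    # lines at different positions in the file hash differently
    concatenate_sources => true
    source => ["path", "offset", "message"]
    target => "[@metadata][fingerprint]"
    method => "SHA1"
    key => "changeme"   # some plugin versions require a key for SHA* methods
  }
}
```

The resulting `[@metadata][fingerprint]` can then be used as the `document_id` in the elasticsearch output.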

Hi Christian,

How do I do that in Logstash? Can you share a sample/example so I can understand more about the use cases you described?

Thanks

Did you read the blog posts I linked to? They should contain the information you need.

Hi Christian, thanks a lot for the blogs.

Can you please check : Logstash to parse only part of apache log file

I just want to know whether, using Logstash, we can parse only part of an Apache log file. More details are in that post.

Thanks

Without reading too much into your post, here is what I did for a unique ID.

I played around with the fingerprint filter and did manage to create a unique ID, but I didn't like that approach, so I created my own.

My data came from a JDBC connection as a number of rows, and combining three fields makes each row unique.

For example, where the projectname, systemtypeid and username combination will be unique forever:

filter {
  mutate {
    add_field => {
      "doc_id" => "%{projectname}%{systemtypeid}%{username}"
    }
  }
}

This gives me an exact duplicate of what I receive from the database via JDBC.

Later I was reading the same database table with JDBC again, but this time I wanted to keep each day's data: read the table once a day and save it, then read it again the next day and put it into ES as a separate set of documents. Each document represents the size of a project by username.

To do that, I created another document ID which is unique per day:

doc_id => "%{projectname}%{systemtypeid}%{username}%{+dd-MM-YYYY}"

and used this in the output section:
document_id => "%{doc_id}"
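For context, a complete output section along those lines might look like this (the host and index name are placeholders, not from my actual setup):

```conf
output {
  elasticsearch {
    hosts => ["localhost:9200"]   # placeholder
    index => "project-size"       # placeholder
    document_id => "%{doc_id}"    # doc_id built in the filter section above
  }
}
```

Because `doc_id` includes the date, replaying the same day's JDBC run overwrites that day's documents instead of duplicating them, while each new day gets its own set.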

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.