I cannot create document IDs for my imported data

Hello,

I'm new to using ELK and I'm looking for your help with a problem I've been facing lately.
To sum up, I'm trying to import data from a database table that doesn't have a primary key, using the jdbc input plugin. Since the import will run regularly, I need to generate a document_id in order to avoid duplicated and redundant data. I tried two approaches to solve this problem.
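For context, the original configs elide the jdbc settings behind a placeholder comment, so the following is only an illustrative sketch of a scheduled jdbc input; the driver, connection string, credentials, statement, and schedule here are all hypothetical, not values from my setup.

input {
  jdbc {
    jdbc_driver_library => "/path/to/driver.jar"              # placeholder
    jdbc_driver_class => "oracle.jdbc.OracleDriver"           # placeholder
    jdbc_connection_string => "jdbc:oracle:thin:@db:1521/XE"  # placeholder
    jdbc_user => "user"                                       # placeholder
    jdbc_password => "password"                               # placeholder
    statement => "SELECT * FROM my_table"                     # placeholder
    schedule => "*/5 * * * *"                                 # run every 5 minutes
  }
}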

The first one consists of:

Creating a new field called id and populating it with data coming from other attributes (table columns), so that it plays the role of a composite ID. In Elasticsearch terms, this means building a field by concatenating multiple dynamic fields.
Setting the newly created field as the document_id.

Below is the logstash config file that was written for this purpose:
input {
  jdbc {
    # Pipeline config between DB and ES
  }
}

filter {
  mutate { convert => { "PORT" => "integer" } }
  mutate { convert => { "PID" => "integer" } }
  mutate { convert => { "COUNTER" => "integer" } }
  mutate { convert => { "EXPORT_STATE" => "integer" } }

  date {
    locale => "eng"
    match => ["OPX2_DATE", "yyyy-MM-dd", "ISO8601"]
    target => "OPX2_DATE"
  }
  date {
    locale => "eng"
    match => ["PDATE", "yyyy-MM-dd", "ISO8601"]
    target => "PDATE"
  }

  mutate {
    # I used this format
    add_field => { "id" => "%{HOST} %{PORT} %{PID}" }
    # or this one too
    # add_field => ["id", "%{HOST} %{PORT} %{PID}"]
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "XXXXX"
    document_type => "XXXX"
    document_id => "%{id}"
  }
}

Only the first dynamic field is correctly resolved to a value; the others are not. The ID that we get looks like "ValueOfTheFirstDynamicFieldWhichIsHost %{PORT} %{PID}".
This results in fewer records being exported to Elasticsearch than expected.
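A plausible cause, judging by the lowercase field names that appear in the working config later in this thread: the jdbc input lowercases column names by default (lowercase_column_names defaults to true), so the event fields may actually be host, port, and pid, and uppercase sprintf references like %{PORT} would never resolve. A minimal sketch under that assumption:

filter {
  mutate {
    # assumes jdbc's default lowercase_column_names => true,
    # so the table columns arrive as lowercase event fields
    add_field => { "id" => "%{host} %{port} %{pid}" }
  }
}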

The second one consists of:

Creating a fingerprint field from all the attributes needed.
Setting this fingerprint as the document_id.

Below is the logstash config file that was written for this purpose:
input {
  jdbc {
    # Pipeline config between DB and ES
  }
}

filter {
  mutate { convert => { "PORT" => "integer" } }
  mutate { convert => { "PID" => "integer" } }
  mutate { convert => { "COUNTER" => "integer" } }
  mutate { convert => { "EXPORT_STATE" => "integer" } }

  date {
    locale => "eng"
    match => ["OPX2_DATE", "yyyy-MM-dd", "ISO8601"]
    target => "OPX2_DATE"
  }
  date {
    locale => "eng"
    match => ["PDATE", "yyyy-MM-dd", "ISO8601"]
    target => "PDATE"
  }

  fingerprint {
    key => "hashKey"
    source => ["%{HOST}", "%{PORT}", "%{PID}"]
    method => "SHA1"
    concatenate_sources => true
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "XXXXX"
    document_type => "XXXX"
    document_id => "%{fingerprint}"
  }
}

As a result, one and only one record ends up in Elasticsearch. Each time a new record comes in, it overwrites the previous one, presumably because the same fingerprint is generated for every record.

I'm looking forward to your support and ideas.

Best regards,

For a fingerprint filter that should be

source => ["HOST", "PORT", "PID"]

The source option takes field names, not sprintf references. With "%{HOST}" the filter looks for a field literally named %{HOST}, which doesn't exist, so every event hashes the same input, which would explain why every record gets the same fingerprint.

Hi, sorry, that was a typo. I used the format you mentioned and still got the same fingerprint for all records, so in the end only one record remains in Elasticsearch.
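(As a side note for anyone debugging a similar case: a temporary stdout output prints each event with all of its fields and their exact casing, which makes mismatched field names easy to spot.)

output {
  stdout { codec => rubydebug }  # print every event with all of its fields
}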

I managed to make it work with the fingerprint functionality; I just had to change the method to SHA256.
So the fingerprint configuration should look like this in the end:
fingerprint {
  source => ["host", "port", "pid"]
  concatenate_sources => true
  target => "[@metadata][fingerprint]"
  key => "Log analytics"
  method => "SHA256"
}
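Note that, compared with the earlier attempt, this version also uses plain lowercase field names and writes the hash to [@metadata][fingerprint]. Since the fingerprint now lives in metadata, the elasticsearch output has to reference it there as well; something like:

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "XXXXX"
    document_id => "%{[@metadata][fingerprint]}"
  }
}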
