Hello,
I'm new to ELK and I'm hoping for your help with a problem I've been facing lately.
To sum up, I'm trying to import data from a database table that doesn't have a primary key, using the jdbc input plugin. Since the import will run regularly, I need to generate a document_id in order to avoid duplicate and redundant data. To get around this problem I tried two approaches.
The first approach consists of:
- Creating a new field called id and populating it with data coming from other attributes (table columns), so it plays the role of a composite ID. In Elasticsearch terms, this means building a field by concatenating multiple dynamic fields.
- Setting the newly created field as the document_id.
Below is the logstash config file I used for this purpose:
input {
  jdbc {
    # Pipeline config between DB and ES
  }
}
filter {
  mutate { convert => { "PORT" => "integer" } }
  mutate { convert => { "PID" => "integer" } }
  mutate { convert => { "COUNTER" => "integer" } }
  mutate { convert => { "EXPORT_STATE" => "integer" } }
  date {
    locale => "eng"
    match => ["OPX2_DATE", "yyyy-MM-dd", "ISO8601"]
    target => "OPX2_DATE"
  }
  date {
    locale => "eng"
    match => ["PDATE", "yyyy-MM-dd", "ISO8601"]
    target => "PDATE"
  }
  mutate {
    add_field => {
      # I used this format
      "id" => "%{HOST} %{PORT} %{PID} "
    }
    # or this one too
    #add_field => ["id", "%{HOST} %{PORT} %{PID}"]
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "XXXXX"
    document_type => "XXXX"
    document_id => "%{id}"
  }
}
The first declared dynamic field is correctly converted to its value, but the others are not: the id we get has a value like "ValueOfTheFirstDynamicFieldWhichIsHost %{PORT} %{PID}". This results in fewer records being exported to Elasticsearch than expected.
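To make the symptom concrete, the stored IDs look roughly like this (the HOST/PORT/PID values here are made up purely for illustration):

Expected _id: "dbhost01 1521 4242 "
Actual _id:   "dbhost01 %{PORT} %{PID} "

Since only HOST is resolved, all rows coming from the same host collapse into a single document, which explains the missing records.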
The second approach consists of:
- Creating a fingerprint field computed from all the attributes needed.
- Setting this fingerprint as the document_id.
Below is the logstash config file I used for this purpose:
input {
  jdbc {
    # Pipeline config between DB and ES
  }
}
filter {
  mutate { convert => { "PORT" => "integer" } }
  mutate { convert => { "PID" => "integer" } }
  mutate { convert => { "COUNTER" => "integer" } }
  mutate { convert => { "EXPORT_STATE" => "integer" } }
  date {
    locale => "eng"
    match => ["OPX2_DATE", "yyyy-MM-dd", "ISO8601"]
    target => "OPX2_DATE"
  }
  date {
    locale => "eng"
    match => ["PDATE", "yyyy-MM-dd", "ISO8601"]
    target => "PDATE"
  }
  fingerprint {
    key => "hashKey"
    source => ["%{HOST}", "%{PORT}", "%{PID}"]
    method => "SHA1"
    concatenate_sources => true
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "XXXXX"
    document_type => "XXXX"
    document_id => "%{fingerprint}"
  }
}
As a result, one and only one record ends up in Elasticsearch: each time a new record comes in, it overwrites the previous one, presumably because the same fingerprint is generated for every record.
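For reference, my understanding from the fingerprint filter documentation is that source expects plain field names rather than %{} sprintf references, so I suspect a variant like the following sketch would hash the actual column values instead of the same literal strings, but I haven't been able to confirm it:

fingerprint {
  key => "hashKey"
  # assumption: source takes field names, not sprintf strings
  source => ["HOST", "PORT", "PID"]
  method => "SHA1"
  concatenate_sources => true
  # the default target is "fingerprint", which is what document_id references above
  target => "fingerprint"
}

Is that the right way to use the filter, or is something else going on?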
I'm looking forward to your support and your ideas.
Best regards,