There are two independent data sources gathered by Logstash into two indexes: index1 and index2.
Each source has its own Logstash pipeline.
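A minimal sketch of how the two pipelines might be declared in pipelines.yml (the pipeline ids and config paths here are assumptions):

# pipelines.yml - each source runs as an independent Logstash pipeline
- pipeline.id: index1
  path.config: "/etc/logstash/conf.d/index1.conf"
- pipeline.id: index2
  path.config: "/etc/logstash/conf.d/index2.conf"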
Index1 is the simpler one.
Logstash crawls CSV files; the filter section uses the csv, translate, dissect and fingerprint filters.
The fingerprint is calculated from two fields, with
concatenate_sources => true
In the output, Logstash has two interesting options:
doc_as_upsert => true
document_id => "%{fingerprint}"
The output also defines the ingest pipeline that the documents are sent to.
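Putting it together, a minimal sketch of what such an index1 pipeline could look like (the paths, column and field names, dictionary, and ingest pipeline name are assumptions):

# index1.conf - a sketch, not the actual configuration
input {
  file {
    path => "/data/index1/*.csv"
    mode => "read"
  }
}
filter {
  csv {
    separator => ";"
    columns => ["number", "service", "company_name", "raw_location"]   # assumed column names
  }
  translate {
    source => "service"
    target => "service_name"
    dictionary_path => "/etc/logstash/service_map.yml"                 # assumed lookup table
  }
  dissect {
    mapping => { "raw_location" => "%{commune_id}/%{region_iso_code}" } # assumed pattern
  }
  fingerprint {
    method => "SHA256"
    source => ["number", "service"]          # the two source fields are assumptions
    concatenate_sources => true
    target => "fingerprint"
  }
}
output {
  elasticsearch {
    hosts => ["https://localhost:9200"]
    index => "index1"
    action => "update"                       # assumed, since doc_as_upsert applies to updates
    doc_as_upsert => true
    document_id => "%{fingerprint}"
    pipeline => "index1-ingest"              # assumed ingest pipeline name
  }
}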
The ingest pipeline has a few processors: date, grok, date.
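A minimal sketch of such an ingest pipeline; the processor fields, formats and patterns are assumptions:

PUT _ingest/pipeline/index1-ingest
{
  "processors": [
    { "date": { "field": "tmp.created", "formats": ["yyyy-MM-dd HH:mm:ss"] } },
    { "grok": { "field": "tmp.raw", "patterns": ["%{NUMBER:index1.number}"] } },
    { "date": { "field": "tmp.updated", "formats": ["ISO8601"], "target_field": "updated_at" } }
  ]
}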
The enrich policy (_enrich/policy) is executed three times per day.
"match_field" : "index1.number",
"enrich_fields" : [
"geo.region_iso_code_old",
"index1.commune.id",
"index1.company.id",
"index1.company.name",
"index1.service"
]
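A sketch of how such a policy could be defined and refreshed; the source index name is an assumption, and the three executions per day would be triggered externally (e.g. a cron job calling _execute):

PUT /_enrich/policy/index1
{
  "match": {
    "indices": "index1",
    "match_field": "index1.number",
    "enrich_fields": [
      "geo.region_iso_code_old",
      "index1.commune.id",
      "index1.company.id",
      "index1.company.name",
      "index1.service"
    ]
  }
}

POST /_enrich/policy/index1/_execute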
The CSV file is created once per day (around 00:00).
The image above shows that ~98% of the records are updated.
Index2 is more complex.
Logstash crawls CSV files; the filter section uses the csv, dissect and mutate filters, and the documents are sent to an ingest pipeline.
That ingest pipeline calls many sub-pipelines, and one of them contains an enrich processor which depends on index1 (a sketch of the overall layout follows the snippet below):
"enrich" : {
"tag" : "index1 b",
"ignore_missing" : true,
"policy_name" : "index1",
"field" : "tmp.enrich_pl.value",
"target_field" : "tmp.enrich_pl.b",
"max_matches" : "1"
}
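A sketch of how the main pipeline and the sub-pipeline containing that enrich processor could be wired together; the pipeline names are assumptions, and the enrich block is the one shown above:

PUT _ingest/pipeline/index2-main
{
  "processors": [
    { "pipeline": { "name": "index2-sub-a" } },
    { "pipeline": { "name": "index2-sub-b" } }
  ]
}

PUT _ingest/pipeline/index2-sub-b
{
  "processors": [
    {
      "enrich": {
        "tag": "index1 b",
        "ignore_missing": true,
        "policy_name": "index1",
        "field": "tmp.enrich_pl.value",
        "target_field": "tmp.enrich_pl.b",
        "max_matches": 1
      }
    }
  ]
}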
Final effect