I have created two Logstash pipelines that pull data from two tables in an Oracle DB and index the docs into Index A and Index B respectively.
The requirement is to update all docs in Index B that match a given attribute value (term filter) whenever that attribute changes in Index A.
I am trying to trigger one-to-many bulk updates to docs in Index B using _update_by_query and the HTTP output plugin in Logstash, i.e. when a doc is updated in Index A, I want to update multiple matching docs in Index B.
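For context, each of the two ingestion pipelines looks roughly like this (connection settings, table, and column names are simplified placeholders, not my actual config):

input {
  jdbc {
    jdbc_driver_library => "/path/to/ojdbc8.jar"
    jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
    jdbc_connection_string => "jdbc:oracle:thin:@//db-host:1521/ORCL"
    jdbc_user => "${DB_USER}"
    jdbc_password => "${DB_PASSWORD}"
    # Poll on a schedule, picking up only rows changed since the last run
    statement => "SELECT * FROM table_a WHERE last_updated > :sql_last_value"
    use_column_value => true
    tracking_column => "last_updated"
    tracking_column_type => "timestamp"
    schedule => "*/5 * * * *"
  }
}
output {
  elasticsearch {
    hosts => ["${elasticsearch_hosts}"]
    index => "index_A"
    document_id => "%{id}"
  }
}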
Approach #1:
Here is what I have done in the Index A pipeline to achieve this:
filter {
  # Build the bool/term query that selects the docs to update in Index B
  ruby {
    code => "
      boolArr = [
        {
          'term' => {
            'rootAttr.childAttr1.keyword' => {
              'value' => event.get('[rootAttr][childAttr1]')
            }
          }
        },
        {
          'term' => {
            'rootAttr.childAttr2.division.keyword' => {
              'value' => event.get('[rootAttr][childAttr2]')
            }
          }
        }
      ]
      event.set('[query][bool][must]', boolArr)
      event.set('[script][params][newRootAttr]', event.get('rootAttr'))
    "
  }
  # Painless script that overwrites rootAttr on every matching doc
  mutate {
    add_field => {
      "[script][lang]" => "painless"
      "[script][source]" => "ctx._source.rootAttr = params.newRootAttr"
    }
  }
  # Keep only [query] and [script]: _update_by_query rejects unknown root-level
  # keys, so the source columns and Logstash metadata (@timestamp, @version)
  # must not end up in the request body
  prune {
    whitelist_names => ["^query$", "^script$"]
  }
}
output {
  # POST the event (now just query + script) as the _update_by_query body
  http {
    url => "${elasticsearch_hosts}/index_B/_update_by_query?conflicts=proceed"
    http_method => "post"
    format => "json"
  }
}
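For reference, the body that reaches _update_by_query should then look roughly like this (the values are illustrative placeholders):

POST index_B/_update_by_query?conflicts=proceed
{
  "query": {
    "bool": {
      "must": [
        { "term": { "rootAttr.childAttr1.keyword": { "value": "<childAttr1 value>" } } },
        { "term": { "rootAttr.childAttr2.division.keyword": { "value": "<childAttr2 value>" } } }
      ]
    }
  },
  "script": {
    "lang": "painless",
    "source": "ctx._source.rootAttr = params.newRootAttr",
    "params": {
      "newRootAttr": {
        "childAttr1": "<childAttr1 value>",
        "childAttr2": "<childAttr2 value>"
      }
    }
  }
}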
Approach #2:
Pull data from the DB using the greater of the two tables' tracking-column values and re-index the affected docs in Index B (a sketch of the input follows below).
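A minimal sketch of the input for this approach, assuming a join between the two tables and a timestamp tracking column named last_updated (all table/column names are placeholders):

input {
  jdbc {
    # Re-select every table B row whose own row OR its related table A row
    # changed, tracking on the greater of the two timestamps
    statement => "SELECT b.*, a.root_attr,
                         GREATEST(a.last_updated, b.last_updated) AS greatest_ts
                  FROM table_b b
                  JOIN table_a a ON b.parent_id = a.id
                  WHERE GREATEST(a.last_updated, b.last_updated) > :sql_last_value"
    use_column_value => true
    tracking_column => "greatest_ts"
    tracking_column_type => "timestamp"
    schedule => "*/5 * * * *"
  }
}

The output would be the same elasticsearch output as the regular Index B pipeline, with document_id set to table B's primary key so the docs are overwritten in place.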
I want to understand which of the above two approaches is better from a performance and error-handling perspective.
Thanks,
Mo