Logstash Performance comparison _update_by_query v/s pulling data from DB via SQL

MChat · March 26, 2020, 11:13pm

I have created 2 logstash pipelines to index docs in Index A and Index B. I am pulling data from 2 tables in Oracle DB and indexing it in Index A & B respectively.

The requirement is to update all docs in Index B for a given attribute value term/filter, if the attribute changes in Index A

I am trying to trigger one to many bulk updates to docs in Index B using _update_by_query & HTTP output plugin in logstash i.e. if a doc updates in Index A , I want to update multiple docs in Index B.

Approach# 1
Here is what i have done to achieve this i Index A:

          filter {
             ruby {
    		code => "		
    			boolArr = [
    				{
              			'term'=> {
    						'rootAttr.childAttr1.keyword' => {
    							'value' => event.get('[rootAttr][childAttr1]')
    						}
    					}
    				},
    				{
              			'term'=> {
    						'rootAttr.childAttr2.division.keyword' => {
    							'value' => event.get('[rootAttr][childAttr2]')
    						}
    					}
    				}
    			]			
    			      		
          		event.set('[query][bool][must]', boolArr )
          		event.set('[script][params][newRootAttr]', event.get('rootAttr') )
          		     		      		      		
    			"
    	}   
      mutate {
        add_field => {
          "[script][lang]" => "painless"
          "[script][source]" => "ctx._source.rootAttr = params.newRootAttr"
        
        }
      }
    }
    output {

      http {
        url => "${elasticsearch_hosts}/index_B/_update_by_query?conflicts=proceed"
        http_method => "post"
        format => "json"
      }
    }

Approach# 2:
Pull data from DB based on greatest value of tracking column from both the tables & re-index docs in Index B

I want to understand what is the best approach out of above 2 approaches from performance and error handling perspective.

Thanks,
Mo

system · April 23, 2020, 11:13pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Update by query in Logstash Logstash painless	1	983	January 14, 2022
Updating Elasticsearch index with SQL data Logstash	1	336	November 15, 2018
How to update ES_index data from sql server using log-stash Automatically Logstash	4	467	October 17, 2019
Update by query with Logstash http output message Logstash painless	1	305	March 9, 2023
Use case jdbc and logstash updates Logstash	4	394	May 9, 2018

Logstash Performance comparison _update_by_query v/s pulling data from DB via SQL

Related topics