Combining multiple documents based on ID

I' m not sure if I got what you are looking for logstash to do. But going by the topic discussion I'm replying what I understand.

I do the following way to pull data in my ES index and structure it for convenience so that searches can be done in a simpler way.

input {
jdbc {
jdbc_driver_library => "path to the jdbc driver for the DB you are using. in my case it was sql"
jdbc_driver_class => "Java::com.mysql.cj.jdbc.Driver"
jdbc_connection_string => "jdbc:mysql:db connection string"
jdbc_user => "dbusername"
jdbc_password => "dbpassword"
schedule => "* * * * *"
tags => "manufacturer_product_location_map"
statement => "SELECT product_location_id, location_id, manufacturer_brand_id, product_id,
IF (map_product_location.modified_date >= map_product_manufacturer_brand.modified_date, map_product_location.modified_date, map_product_manufacturer_brand.modified_date) modified_date
FROM rapp_product.map_product_location INNER JOIN rapp_product.map_product_manufacturer_brand
ON (map_product_location.product_manufacturer_brand_id = map_product_manufacturer_brand.product_manufacturer_brand_id)
where (( map_product_location.modified_date is NULL || map_product_manufacturer_brand.modified_date is NULL)
|| (map_product_location.modified_date > :sql_last_value || map_product_manufacturer_brand.modified_date > :sql_last_value))"
last_run_metadata_path => "/logstash_ret/.mnfctr_prod_locn_map_retailer_ls_jdbc_last_run"
}
}
output {
if "manufacturer_product_location_map" in [tags] {
elasticsearch {
hosts => ["localhost:9200"]
index => "es_rapp_product"
document_type => "manufacturer_product_location_map"
document_id => "%{product_location_id}"
}
}
}

So in the statement in jdbc section in input part of the logstash conf file we have the freedom to query the DB and the output of this is indexed into the ES in the "index" under the type document_type and unique identifier is the document_id as you can see in the output segment.

In ES it looks like below

{
"_index": "es_rapp_product",
"_type": "manufacturer_product_location_map",
"_id": "14",
"_score": 1,
"_source": {
"product_location_id": 14,
"location_id": 20,
"manufacturer_brand_id": 1,
"product_id": 14,
"modified_date": null,
"@version": "1",
"@timestamp": "2017-11-01T13:48:00.282Z",
"tags": [
"manufacturer_product_location_map"
]
}
},
{
"_index": "es_rapp_product",
"_type": "manufacturer_product_location_map",
"_id": "19",
"_score": 1,
"_source": {
"product_location_id": 19,
"location_id": 20,
"manufacturer_brand_id": 1,
"product_id": 19,
"modified_date": null,
"@version": "1",
"@timestamp": "2017-11-01T13:48:00.282Z",
"tags": [
"manufacturer_product_location_map"
]
}
},
{
"_index": "es_rapp_product",
"_type": "manufacturer_product_location_map",
"_id": "22",
"_score": 1,
"_source": {
"product_location_id": 22,
"location_id": 20,
"manufacturer_brand_id": 1,
"product_id": 22,
"modified_date": null,
"@version": "1",
"@timestamp": "2017-11-01T13:48:00.283Z",
"tags": [
"manufacturer_product_location_map"
]
}
},

Hope this helps.

Hello Deepti_Jain,

Your JDBC configuration looks like my configuration. There is not problem pulling data from SQL Server to Elasticsearch, it works fine.

Here my need is to combine all the documents into "one document" in the Elasticsearch index according to a column value. For example, with your data, do you know how to produce one document in the Elasticsearch depending on your "location_id": 20 value? I see you're producing one document for every new row in the SQL Server.

I need to group all the documents coming from SQL Server into one document containing "location_id": 20 value in it only.

1 Like

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.