Oracle 11g Synchronization with logstash

Abdullah_Alaiwat · January 15, 2017, 10:29am

Hello,

I'm using the logstash 5.1.1 to get the data from Oracle DB 11g to elastic search, the connection and data fetching is working fine but the problem is with the synchronization between the DB and Elasticsearch, every time the job will run the statement and the data will be duplicated into elastic, how can i make it synchronized? to take the new inserted, updated and remove deleted records dynamically while the job is running?

here is the config file for the JDBC connection:

input {
jdbc {
jdbc_driver_library => "pat of (ojdbc6.jar)"
jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
jdbc_connection_string => "jdbc:oracle:thin:@DB_IP:Port:DB_ServiceName"
jdbc_user => "db_schema_name"
jdbc_password => "db_schema_password"
jdbc_validate_connection => true
statement => "SELECT * FROM table_1"
}
}

output {
elasticsearch {
action => "index"
hosts => "localhost:9200"
index => "Index_1"
document_type => "record"
workers => 1
}
}

Thanks.

magnusbaeck · January 15, 2017, 10:40am

This topic from a few days ago is relevant to your problem:

You should also read the jdbc input documentation, specifically what's said about the sql_last_value parameter.

Abdullah_Alaiwat · January 15, 2017, 11:49am

Hi,

I have checked the mentioned URL, but i am not getting the point how to configure the document_id parameter in the input and output, can you give me an example on it using the same JDBC connection?

Abdullah_Alaiwat · January 15, 2017, 12:22pm

in addition, when i add the following to the output i have only one record:
document_id => "%{field_name (the primary key field)}"

magnusbaeck · January 16, 2017, 6:34am

I have checked the mentioned URL, but i am not getting the point how to configure the document_id parameter in the input and output, can you give me an example on it using the same JDBC connection?

You're on the right track. Since your data already has a primary key you don't need the fingerprint filter and you can just reference the existing field containing the primary key.

in addition, when i add the following to the output i have only one record:

What id does that document have? What does the document look like?

Abdullah_Alaiwat · January 16, 2017, 7:54am

its an Oracle table, contains a string primary key (varchar2)

magnusbaeck · January 16, 2017, 8:17am

What id does that document have? What does the document look like?

Abdullah_Alaiwat · January 16, 2017, 10:17am

how can i identify the ID which is used by the document? the document will show all records from a statement in JDBC with the timestamp of each one

magnusbaeck · January 16, 2017, 10:48am

how can i identify the ID which is used by the document?

In Kibana this is displayed as the _id field.

the document will show all records from a statement in JDBC with the timestamp of each one

I'm not sure what you mean. Please copy/paste an example document from the JSON tab in Kibana.

Abdullah_Alaiwat · January 16, 2017, 10:49am

{
  "_index": "bcc_customers",
  "_type": "record",
  "_id": "%{customer_NO}",
  "_score": null,
  "_source": {
    "company_type": null,
    "notes": "²ßáï",
    "reg_no": null,
    "birth_date": "1956-03-15T22:00:00.000Z",
    "phone2": null,
    "channel": null,
    "phone_work": null,
    "employee_no": null,
    "cam_lan_id": null,
    "bcc_cty_id": 1,
    "type": "bcc_customers",
    "in_id": null,
    "manager_pass_id": null,
    "modification_date": null,
    "passport_expiry_date": null,
    "employer": null,
    "fax": null,
    "customer_address": null,
    "customer_no": 9018,
    "id_number": "AA0684791",
    "crg_prf_id": 1,
    "use_parent_address": 1,
    "vat": null,
    "creation_date": "2005-12-19T22:00:00.000Z",
    "valid_qs_status": 4,
    "created_by": "EKHACHATRYAN",
    "tags": [],
    "father_name_2": "èáõµ»ÝÇ",
    "father_name": "Ruben",
    "phone": "010348901",
    "old_prof_no_mig": null,
    "corporate_name_2": null,
    "gender": 1,
    "family_name_2": "¼³ù³ñÛ³Ý",
    "scale": null,
    "eduacation": null,
    "first_name_2": "²ßáï",
    "crg_cnt_code": "AM",
    "issue_date": null,
    "brs_ret_code": "PROVIS",
    "is_forensics_blacklisted": null,
    "monthlysalary": null,
    "activity_type": null,
    "@version": "1",
    "corporate_name": null,
    "id_type": 1,
    "forensics_status": 4,
    "receive_market_msgs": null,
    "crg_tit_id": 1,
    "first_name": "Ashot",
    "email": null,
    "issued_by": null,
    "contact_person": null,
    "no_of_employees": null,
    "bcc_cst_no": null,
    "employment": null,
    "vet_status": 4,
    "in_cus_acc_id": null,
    "bcc_cst_level": 1,
    "manager_name": null,
    "marital_status": 2,
    "bcc_cct_id": 1,
    "company_tax_code": null,
    "@timestamp": "2017-01-16T10:23:29.273Z",
    "modified_by": null,
    "sub_info_no_reg": null,
    "family_name": "Zakaryan",
    "age": null
  },
  "fields": {
    "birth_date": [
      -435376800000
    ],
    "creation_date": [
      1135029600000
    ],
    "@timestamp": [
      1484562209273
    ]
  },
  "sort": [
    1484562209273
  ]
}

Abdullah_Alaiwat · January 16, 2017, 10:51am

the _ID field appears like '%{customer_NO}' (with the text not the number)

magnusbaeck · January 16, 2017, 10:51am

Please format things like this as preformatted text.

"_id": "%{customer_NO}",

The document doesn't contain a customer_NO field. Use customer_no instead.

Abdullah_Alaiwat · January 16, 2017, 10:55am

now its "_id": "%customer_no", but i still facing the same problem, i have ran the statement:

select * from table1 where rownum < 10

but i have only 3 records in Kibana (for two items - one is duplicated).

magnusbaeck · January 16, 2017, 10:57am

Show the documents. Format them as preformatted text.

Abdullah_Alaiwat · January 16, 2017, 12:18pm

{
  "_index": "bcc_customers",
  "_type": "record",
  "_id": "%customer_no",
  "_score": null,
  "_source": 
    {
      "company_type": null,
      "notes": "23082003",
      "reg_no": null,
      "birth_date": "1974-03-10T22:00:00.000Z",
      "phone2": null,
      "channel": null,
      "phone_work": null,
      "employee_no": null,
      "cam_lan_id": null,
      "bcc_cty_id": 1,
      "type": "bcc_customers",
      "in_id": null,
      "manager_pass_id": null,
      "modification_date": "2012-12-26T08:44:35.000Z",
      "passport_expiry_date": null,
      "employer": null,
      "fax": null,
      "customer_address": null,
      "customer_no": 9024,
      "id_number": "126",
      "crg_prf_id": null,
      "use_parent_address": 1,
      "vat": null,
      "creation_date": "2005-12-19T22:00:00.000Z",
      "valid_qs_status": 7,
      "created_by": "KBALEKJIAN",
      "tags": [],
      "father_name_2": "úÑ³Ý»ëë",
      "father_name": "Ohaness",
      "phone": "010249930",
      "old_prof_no_mig": null,
      "corporate_name_2": null,
      "gender": 1,
      "family_name_2": "´³É»ùçÛ³Ý",
      "scale": null,
      "eduacation": null,
      "first_name_2": "Î³ñÝÇ·",
      "crg_cnt_code": "AM",
      "issue_date": null,
      "brs_ret_code": "PROVIS",
      "is_forensics_blacklisted": null,
      "monthlysalary": null,
      "activity_type": null,
      "@version": "1",
      "corporate_name": null,
      "id_type": 2,
      "forensics_status": 7,
      "receive_market_msgs": null,
      "crg_tit_id": 1,
      "first_name": "Karnig",
      "email": null,
      "issued_by": null,
      "contact_person": null,
      "no_of_employees": null,
      "bcc_cst_no": null,
      "employment": null,
      "vet_status": 6,
      "in_cus_acc_id": null,
      "bcc_cst_level": 1,
      "manager_name": null,
      "marital_status": 2,
      "bcc_cct_id": 6,
      "company_tax_code": null,
      "@timestamp": "2017-01-16T10:59:24.815Z",
      "modified_by": "SYSADMIN",
      "sub_info_no_reg": null,
      "family_name": "Balekjian",
      "age": null
    },
    "fields": 
    {
      "creation_date": [1135029600000],
      "@timestamp": [1484564364815],
      "birth_date": [132184800000],
      "modification_date": [1356511475000]
    },
    "sort": [1484564364815]
}

magnusbaeck · January 16, 2017, 12:33pm

"id": "%customerno",

Where are the curly braces?

When you're not posting your JSON as preformatted text (using the </> toolbar button) parts of the JSON text disappears, making it ambiguous and possibly leading to misunderstandings. I don't have time for misunderstandings so unless you start formatting the text as requested I won't spend more time in this thread.

Abdullah_Alaiwat · January 16, 2017, 12:37pm

Sorry for that, i am still beginner in JSON at all, hope this will get you what you need exactly:

{
  "_index": "bcc_customers",
  "_type": "record",
  "_id": "9026",
  "_score": null,
  "_source": {
    "company_type": null,
    "notes": "409100",
    "reg_no": null,
    "birth_date": "1962-04-16T22:00:00.000Z",
    "phone2": null,
    "channel": null,
    "phone_work": null,
    "employee_no": null,
    "cam_lan_id": null,
    "bcc_cty_id": 1,
    "type": "bcc_customers",
    "in_id": null,
    "manager_pass_id": null,
    "modification_date": null,
    "passport_expiry_date": null,
    "employer": null,
    "fax": null,
    "customer_address": null,
    "customer_no": 9026,
    "id_number": "AA0338434",
    "crg_prf_id": 1,
    "use_parent_address": 1,
    "vat": null,
    "creation_date": "2005-12-19T22:00:00.000Z",
    "valid_qs_status": 7,
    "created_by": "KHACHATRYAN",
    "tags": [],
    "father_name_2": "ê»ñÛáÅ³ÛÇ",
    "father_name": "Seryozha",
    "phone": "010 554000",
    "old_prof_no_mig": null,
    "corporate_name_2": null,
    "gender": 1,
    "family_name_2": "Ø³ÝáõÏÛ³Ý",
    "scale": null,
    "eduacation": null,
    "first_name_2": "ÐáíÑ³ÝÝë»ë",
    "crg_cnt_code": "AM",
    "issue_date": null,
    "brs_ret_code": "PROVIS",
    "is_forensics_blacklisted": null,
    "monthlysalary": null,
    "activity_type": null,
    "@version": "1",
    "corporate_name": null,
    "id_type": 1,
    "forensics_status": 7,
    "receive_market_msgs": null,
    "crg_tit_id": 1,
    "first_name": "Hovhannes",
    "email": null,
    "issued_by": null,
    "contact_person": null,
    "no_of_employees": null,
    "bcc_cst_no": null,
    "employment": null,
    "vet_status": 6,
    "in_cus_acc_id": null,
    "bcc_cst_level": 1,
    "manager_name": null,
    "marital_status": 2,
    "bcc_cct_id": 1,
    "company_tax_code": null,
    "@timestamp": "2017-01-16T12:38:17.201Z",
    "modified_by": null,
    "sub_info_no_reg": null,
    "family_name": "Manukyan",
    "age": null
  },
  "fields": {
    "creation_date": [
      1135029600000
    ],
    "@timestamp": [
      1484570297201
    ],
    "birth_date": [
      -243309600000
    ]
  },
  "sort": [
    1484570297201
  ]
}

Abdullah_Alaiwat · January 16, 2017, 12:41pm

now its working, for the insert and the update it seems to be OK, but when i need to delete a record from DB it will not be deleted from the Kibana.

magnusbaeck · January 16, 2017, 1:06pm

Yeah, deletions are harder. I don't know if there's a better way than reindexing everything from the beginning, unless you can somehow capture the deletions as events and funnel them to Logstash.

Abdullah_Alaiwat · January 16, 2017, 1:09pm

reindexing everything from the beginning will cause a huge performance problem specially that our DB is very huge, any other suggestions? we are not able to capture the deletions as events neither as we cant use any procedures or objects in the statement part in the JDBC connection.

Topic		Replies	Views
Oracle to Elasticsearch Elasticsearch	10	2300	July 6, 2017
Logstash and databse Logstash	16	5705	July 6, 2017
Logstash JDBC document_id not reading latest Logstash	5	1742	February 14, 2018
Logstash JDBC Input table data not in sync with Kibana Index Logstash	9	2560	November 22, 2018
Delete elasticsearch document with logstash-jdbc-input Logstash	13	16561	July 6, 2017

Oracle 11g Synchronization with logstash

Related topics