Oracle 11g Synchronization with logstash


(Abdullah Alaiwat) #1

Hello,

I'm using the logstash 5.1.1 to get the data from Oracle DB 11g to elastic search, the connection and data fetching is working fine but the problem is with the synchronization between the DB and Elasticsearch, every time the job will run the statement and the data will be duplicated into elastic, how can i make it synchronized? to take the new inserted, updated and remove deleted records dynamically while the job is running?

here is the config file for the JDBC connection:

input {
jdbc {
jdbc_driver_library => "pat of (ojdbc6.jar)"
jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
jdbc_connection_string => "jdbc:oracle:thin:@DB_IP:Port:DB_ServiceName"
jdbc_user => "db_schema_name"
jdbc_password => "db_schema_password"
jdbc_validate_connection => true
statement => "SELECT * FROM table_1"
}
}

output {
elasticsearch {
action => "index"
hosts => "localhost:9200"
index => "Index_1"
document_type => "record"
workers => 1
}
}

Thanks.


(Magnus Bäck) #2

This topic from a few days ago is relevant to your problem:

You should also read the jdbc input documentation, specifically what's said about the sql_last_value parameter.


(Abdullah Alaiwat) #3

Hi,

I have checked the mentioned URL, but i am not getting the point how to configure the document_id parameter in the input and output, can you give me an example on it using the same JDBC connection?


(Abdullah Alaiwat) #4

in addition, when i add the following to the output i have only one record:
document_id => "%{field_name (the primary key field)}"


(Magnus Bäck) #5

I have checked the mentioned URL, but i am not getting the point how to configure the document_id parameter in the input and output, can you give me an example on it using the same JDBC connection?

You're on the right track. Since your data already has a primary key you don't need the fingerprint filter and you can just reference the existing field containing the primary key.

in addition, when i add the following to the output i have only one record:

What id does that document have? What does the document look like?


(Abdullah Alaiwat) #6

its an Oracle table, contains a string primary key (varchar2)


(Magnus Bäck) #7

What id does that document have? What does the document look like?


(Abdullah Alaiwat) #8

how can i identify the ID which is used by the document? the document will show all records from a statement in JDBC with the timestamp of each one


(Magnus Bäck) #9

how can i identify the ID which is used by the document?

In Kibana this is displayed as the _id field.

the document will show all records from a statement in JDBC with the timestamp of each one

I'm not sure what you mean. Please copy/paste an example document from the JSON tab in Kibana.


(Abdullah Alaiwat) #10
{
  "_index": "bcc_customers",
  "_type": "record",
  "_id": "%{customer_NO}",
  "_score": null,
  "_source": {
    "company_type": null,
    "notes": "²ßáï",
    "reg_no": null,
    "birth_date": "1956-03-15T22:00:00.000Z",
    "phone2": null,
    "channel": null,
    "phone_work": null,
    "employee_no": null,
    "cam_lan_id": null,
    "bcc_cty_id": 1,
    "type": "bcc_customers",
    "in_id": null,
    "manager_pass_id": null,
    "modification_date": null,
    "passport_expiry_date": null,
    "employer": null,
    "fax": null,
    "customer_address": null,
    "customer_no": 9018,
    "id_number": "AA0684791",
    "crg_prf_id": 1,
    "use_parent_address": 1,
    "vat": null,
    "creation_date": "2005-12-19T22:00:00.000Z",
    "valid_qs_status": 4,
    "created_by": "EKHACHATRYAN",
    "tags": [],
    "father_name_2": "èáõµ»ÝÇ",
    "father_name": "Ruben",
    "phone": "010348901",
    "old_prof_no_mig": null,
    "corporate_name_2": null,
    "gender": 1,
    "family_name_2": "¼³ù³ñÛ³Ý",
    "scale": null,
    "eduacation": null,
    "first_name_2": "²ßáï",
    "crg_cnt_code": "AM",
    "issue_date": null,
    "brs_ret_code": "PROVIS",
    "is_forensics_blacklisted": null,
    "monthlysalary": null,
    "activity_type": null,
    "@version": "1",
    "corporate_name": null,
    "id_type": 1,
    "forensics_status": 4,
    "receive_market_msgs": null,
    "crg_tit_id": 1,
    "first_name": "Ashot",
    "email": null,
    "issued_by": null,
    "contact_person": null,
    "no_of_employees": null,
    "bcc_cst_no": null,
    "employment": null,
    "vet_status": 4,
    "in_cus_acc_id": null,
    "bcc_cst_level": 1,
    "manager_name": null,
    "marital_status": 2,
    "bcc_cct_id": 1,
    "company_tax_code": null,
    "@timestamp": "2017-01-16T10:23:29.273Z",
    "modified_by": null,
    "sub_info_no_reg": null,
    "family_name": "Zakaryan",
    "age": null
  },
  "fields": {
    "birth_date": [
      -435376800000
    ],
    "creation_date": [
      1135029600000
    ],
    "@timestamp": [
      1484562209273
    ]
  },
  "sort": [
    1484562209273
  ]
}

(Abdullah Alaiwat) #11

the _ID field appears like '%{customer_NO}' (with the text not the number)


(Magnus Bäck) #12

Please format things like this as preformatted text.

"_id": "%{customer_NO}",

The document doesn't contain a customer_NO field. Use customer_no instead.


(Abdullah Alaiwat) #13

now its "_id": "%customer_no", but i still facing the same problem, i have ran the statement:

select * from table1 where rownum < 10

but i have only 3 records in Kibana (for two items - one is duplicated).


(Magnus Bäck) #14

Show the documents. Format them as preformatted text.


(Abdullah Alaiwat) #15
{
  "_index": "bcc_customers",
  "_type": "record",
  "_id": "%customer_no",
  "_score": null,
  "_source": 
    {
      "company_type": null,
      "notes": "23082003",
      "reg_no": null,
      "birth_date": "1974-03-10T22:00:00.000Z",
      "phone2": null,
      "channel": null,
      "phone_work": null,
      "employee_no": null,
      "cam_lan_id": null,
      "bcc_cty_id": 1,
      "type": "bcc_customers",
      "in_id": null,
      "manager_pass_id": null,
      "modification_date": "2012-12-26T08:44:35.000Z",
      "passport_expiry_date": null,
      "employer": null,
      "fax": null,
      "customer_address": null,
      "customer_no": 9024,
      "id_number": "126",
      "crg_prf_id": null,
      "use_parent_address": 1,
      "vat": null,
      "creation_date": "2005-12-19T22:00:00.000Z",
      "valid_qs_status": 7,
      "created_by": "KBALEKJIAN",
      "tags": [],
      "father_name_2": "úѳݻëë",
      "father_name": "Ohaness",
      "phone": "010249930",
      "old_prof_no_mig": null,
      "corporate_name_2": null,
      "gender": 1,
      "family_name_2": "´³É»ùçÛ³Ý",
      "scale": null,
      "eduacation": null,
      "first_name_2": "γñÝÇ·",
      "crg_cnt_code": "AM",
      "issue_date": null,
      "brs_ret_code": "PROVIS",
      "is_forensics_blacklisted": null,
      "monthlysalary": null,
      "activity_type": null,
      "@version": "1",
      "corporate_name": null,
      "id_type": 2,
      "forensics_status": 7,
      "receive_market_msgs": null,
      "crg_tit_id": 1,
      "first_name": "Karnig",
      "email": null,
      "issued_by": null,
      "contact_person": null,
      "no_of_employees": null,
      "bcc_cst_no": null,
      "employment": null,
      "vet_status": 6,
      "in_cus_acc_id": null,
      "bcc_cst_level": 1,
      "manager_name": null,
      "marital_status": 2,
      "bcc_cct_id": 6,
      "company_tax_code": null,
      "@timestamp": "2017-01-16T10:59:24.815Z",
      "modified_by": "SYSADMIN",
      "sub_info_no_reg": null,
      "family_name": "Balekjian",
      "age": null
    },
    "fields": 
    {
      "creation_date": [1135029600000],
      "@timestamp": [1484564364815],
      "birth_date": [132184800000],
      "modification_date": [1356511475000]
    },
    "sort": [1484564364815]
}

(Magnus Bäck) #16

"id": "%customerno",

Where are the curly braces?

When you're not posting your JSON as preformatted text (using the </> toolbar button) parts of the JSON text disappears, making it ambiguous and possibly leading to misunderstandings. I don't have time for misunderstandings so unless you start formatting the text as requested I won't spend more time in this thread.


(Abdullah Alaiwat) #17

Sorry for that, i am still beginner in JSON at all, hope this will get you what you need exactly:

{
  "_index": "bcc_customers",
  "_type": "record",
  "_id": "9026",
  "_score": null,
  "_source": {
    "company_type": null,
    "notes": "409100",
    "reg_no": null,
    "birth_date": "1962-04-16T22:00:00.000Z",
    "phone2": null,
    "channel": null,
    "phone_work": null,
    "employee_no": null,
    "cam_lan_id": null,
    "bcc_cty_id": 1,
    "type": "bcc_customers",
    "in_id": null,
    "manager_pass_id": null,
    "modification_date": null,
    "passport_expiry_date": null,
    "employer": null,
    "fax": null,
    "customer_address": null,
    "customer_no": 9026,
    "id_number": "AA0338434",
    "crg_prf_id": 1,
    "use_parent_address": 1,
    "vat": null,
    "creation_date": "2005-12-19T22:00:00.000Z",
    "valid_qs_status": 7,
    "created_by": "KHACHATRYAN",
    "tags": [],
    "father_name_2": "ê»ñÛáųÛÇ",
    "father_name": "Seryozha",
    "phone": "010 554000",
    "old_prof_no_mig": null,
    "corporate_name_2": null,
    "gender": 1,
    "family_name_2": "سÝáõÏÛ³Ý",
    "scale": null,
    "eduacation": null,
    "first_name_2": "ÐáíѳÝÝë»ë",
    "crg_cnt_code": "AM",
    "issue_date": null,
    "brs_ret_code": "PROVIS",
    "is_forensics_blacklisted": null,
    "monthlysalary": null,
    "activity_type": null,
    "@version": "1",
    "corporate_name": null,
    "id_type": 1,
    "forensics_status": 7,
    "receive_market_msgs": null,
    "crg_tit_id": 1,
    "first_name": "Hovhannes",
    "email": null,
    "issued_by": null,
    "contact_person": null,
    "no_of_employees": null,
    "bcc_cst_no": null,
    "employment": null,
    "vet_status": 6,
    "in_cus_acc_id": null,
    "bcc_cst_level": 1,
    "manager_name": null,
    "marital_status": 2,
    "bcc_cct_id": 1,
    "company_tax_code": null,
    "@timestamp": "2017-01-16T12:38:17.201Z",
    "modified_by": null,
    "sub_info_no_reg": null,
    "family_name": "Manukyan",
    "age": null
  },
  "fields": {
    "creation_date": [
      1135029600000
    ],
    "@timestamp": [
      1484570297201
    ],
    "birth_date": [
      -243309600000
    ]
  },
  "sort": [
    1484570297201
  ]
}

(Abdullah Alaiwat) #18

now its working, for the insert and the update it seems to be OK, but when i need to delete a record from DB it will not be deleted from the Kibana.


(Magnus Bäck) #19

Yeah, deletions are harder. I don't know if there's a better way than reindexing everything from the beginning, unless you can somehow capture the deletions as events and funnel them to Logstash.


(Abdullah Alaiwat) #20

reindexing everything from the beginning will cause a huge performance problem specially that our DB is very huge, any other suggestions? we are not able to capture the deletions as events neither as we cant use any procedures or objects in the statement part in the JDBC connection.