With the option document_id => "%{id}", existing entries in Elasticsearch with the same ID are overwritten when I update data in MySQL,
but when I delete data from MySQL, those ES entries are not removed.
How can I remove the old data in ES that has already been deleted in MySQL?
Thank you!
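For reference, that option lives on the elasticsearch output; a minimal sketch of such a configuration (host and index names here are placeholders, not taken from this thread):

```conf
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "myindex"
    # route each MySQL row to a fixed document ID so updates overwrite
    document_id => "%{id}"
  }
}
```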
I have one question about what you said above.
When I use SQL like the above,
"where tablename.update_time > :sql_last_value"
the following error appeared:
Exception when executing JDBC query {:exception=>#<Sequel::DatabaseError: Java::ComMysqlJdbcExceptionsJdbc4::MySQLSyntaxErrorException: Unknown column 'cc.update_time' in 'where clause'>, :level=>:warn}
Java::ComMysqlJdbcExceptionsJdbc4::MySQLSyntaxErrorException: Unknown column 'cc.update_time' in 'where clause': SELECT count(*) AS count FROM (select
*
cc is my table name, and the update_time column does not exist. Can you help me?
Thanks!
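For what it's worth, that error just means MySQL cannot find an update_time column on cc; "tablename" in the earlier snippet was a placeholder for your own table and column. A sketch of one way to add such a column (the column name and defaults are assumptions, not from this thread):

```sql
-- hypothetical: give cc a last-modified column that MySQL
-- updates automatically on every write to the row
ALTER TABLE cc
  ADD COLUMN update_time TIMESTAMP NOT NULL
  DEFAULT CURRENT_TIMESTAMP
  ON UPDATE CURRENT_TIMESTAMP;
```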
First, as you know, in MySQL we just use DELETE FROM to remove data we no longer need.
Second, we want to sync the delete information. As you say, "add a 'status' column": in MySQL,
add a status column to indicate whether each row is still valid, like this:
mysql> select * from cc;
+----+--------------------+--------+
| id | name | status |
+----+--------------------+--------+
| 1 | laoyang360 | ok |
| 4 | intok | ok |
| 5 | jdbc_test_update08 | ok |
| 7 | test7 | ok |
| 8 | test008 | ok |
| 9 | test9 | ok |
| 10 | test10 | ok |
+----+--------------------+--------+
7 rows in set (0.00 sec)
mysql>
mysql>
mysql> update cc set status = 'deleted' where id = 10;
Query OK, 1 row affected (0.02 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> select * from cc;
+----+--------------------+---------+
| id | name | status |
+----+--------------------+---------+
| 1 | laoyang360 | ok |
| 4 | intok | ok |
| 5 | jdbc_test_update08 | ok |
| 7 | test7 | ok |
| 8 | test008 | ok |
| 9 | test9 | ok |
| 10 | test10 | deleted |
+----+--------------------+---------+
7 rows in set (0.00 sec)
2. In Elasticsearch, revise the data status; I think revising it directly in Elasticsearch is very difficult.
You can add a timestamp column; whenever you modify the status, the timestamp value will be updated. Using the parameter "tablename.timestamp > :sql_last_value", Logstash will only update the entries whose timestamp has changed.
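A sketch of a jdbc input following that advice (the connection string, credentials, driver path, schedule, and column name are all placeholder assumptions):

```conf
input {
  jdbc {
    jdbc_connection_string => "jdbc:mysql://localhost:3306/mydb"
    jdbc_user => "user"
    jdbc_password => "password"
    jdbc_driver_library => "/path/to/mysql-connector-java.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    # run every minute; only rows changed since the last run are fetched
    schedule => "* * * * *"
    statement => "SELECT * FROM cc WHERE cc.update_time > :sql_last_value"
  }
}
```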
Thanks. I have already added a timestamp column named modified_at, like this:
mysql> select * from cc;
+----+--------------------+---------+---------------------+
| id | name | status | modified_at |
+----+--------------------+---------+---------------------+
| 10 | test10 | deleted | 0000-00-00 00:00:00 |
| 11 | test1111 | ok | 2016-06-23 04:10:00 |
| 12 | test012 | ok | 2016-06-23 04:21:56 |
+----+--------------------+---------+---------------------+
and the SQL is:
[root@5b9dbaaa148a logstash_jdbc_test]# cat jdbc.sql
select
*
from
cc
where cc.modified_at > :sql_last_value
and the result is:
{"id":11,"name":"test1111","status":"ok","modified_at":"2016-06-23T03:10:00.000Z","@version":"1","@timestamp":"2016-06-23T04:10:00.185Z","type":"cc_type"}
{"id":12,"name":"test012","status":"ok","modified_at":"2016-06-23T03:21:56.000Z","@version":"1","@timestamp":"2016-06-23T04:10:00.187Z","type":"cc_type"}
I get that you need to flag/log the delete operation, and I have that done via triggers in MySQL.
But how can I instruct the logstash-jdbc-input plugin to delete the given document IDs?
It seems like a simple task for the plugin, no?
Thanks
Chris
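One pattern that may help here (a sketch, assuming the "status" flag column discussed above, with placeholder host and index names): the jdbc input itself only reads rows, but the elasticsearch output supports action => "delete", so you can branch on the flag in the output section:

```conf
output {
  if [status] == "deleted" {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "myindex"
      document_id => "%{id}"
      # remove the document instead of indexing it
      action => "delete"
    }
  } else {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "myindex"
      document_id => "%{id}"
    }
  }
}
```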