What code should I place into logstash.conf to automatically delete records with certain unique IDs from a certain ES index?

Hi,

I have an ES index that is populated with data by Logstash using the JDBC input plugin. The data source is SQL table1.
I also have a SQL table2 that stores the list of IDs that were deleted from SQL table1.
Now I want to tell Logstash that within each "sync" session (I mean a sync between SQL table1 and the ES index) it should get the list of IDs from table2, then find the documents with these IDs in the ES index and delete them there.

What code should I place into logstash.conf to automatically delete records from a certain ES index as described above?

Logstash processes filters in the configured order, but apart from that it has no notion of sequenced actions.

@magnusbaeck
Thanks for the quick reply. In that case, is there any other way to delete documents from the index based on a source list of IDs?
Maybe this way:

  • Create an index2 that will be synced with the SQL table2 that stores the deleted IDs
  • Perform the deletion at the ES level. For example, somehow automatically get the list of IDs from index2 and automatically delete the corresponding documents from index1.
    Is this possible? How would I configure it?

I don't see how your idea would solve the sequencing problem. At this point I don't have any good suggestions.

@magnusbaeck
Sorry, I made a mistake in the table numbering, which may have led to an incorrect understanding of what I am trying to do. I have corrected my first post above.
So, I want Logstash to do the following:

  • get the list of IDs from SQL table2
  • find the documents with those IDs in index1
  • delete the found documents from index1

From my point of view, sequencing is not important here. I have two parallel transfers:

  1. Regular incremental (using sql_last_value) data transfer from SQL table1 to index1
  2. Regular deletion of data from index1 by list of IDs supplied from SQL table2

If Logstash tries to delete a document with an ID that does not exist in index1 yet, that is not a problem. It will be able to delete it successfully in one of the next sync sessions, because the list of deleted IDs is captured on a non-incremental basis, i.e. always "select * from table2".
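
To illustrate, transfer #1 is essentially the standard incremental JDBC setup. A rough sketch — the connection settings, driver, and column names below are placeholders, not my exact config:

    input {
      jdbc {
        # Placeholder connection settings; the real ones differ.
        jdbc_connection_string => "jdbc:mysql://localhost:3306/mydb"
        jdbc_user => "user"
        jdbc_password => "password"
        jdbc_driver_library => "/path/to/jdbc-driver.jar"
        jdbc_driver_class => "com.mysql.jdbc.Driver"
        # Incremental read: only rows with an id above the last seen value.
        statement => "SELECT * FROM table1 WHERE id > :sql_last_value"
        use_column_value => true
        tracking_column => "id"
        schedule => "*/5 * * * *"
      }
    }

    output {
      elasticsearch {
        hosts => ["localhost:9200"]
        index => "index1"
        document_id => "%{id}"
      }
    }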

Given the clarifications above, do I have a chance to configure Logstash to reach my goal?

Oh. In that case it sounds like all you need is a jdbc input and an elasticsearch output with action => "delete". If the id obtained from the SQL table isn't identical to the id of the ES document you need an elasticsearch filter as well. What kind of volumes are we talking about? Looking up every single event in ES is of course not very efficient.
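
Untested, but roughly like this — the connection settings, driver, and schedule below are placeholders you'd adapt to your setup:

    input {
      jdbc {
        # Placeholder connection settings; adjust for your database.
        jdbc_connection_string => "jdbc:mysql://localhost:3306/mydb"
        jdbc_user => "user"
        jdbc_password => "password"
        jdbc_driver_library => "/path/to/jdbc-driver.jar"
        jdbc_driver_class => "com.mysql.jdbc.Driver"
        # Non-incremental read of the deleted-IDs table, as described above.
        statement => "SELECT id FROM table2"
        schedule => "*/5 * * * *"
      }
    }

    output {
      elasticsearch {
        hosts => ["localhost:9200"]
        index => "index1"
        # Delete the document whose _id equals the id column from table2.
        action => "delete"
        document_id => "%{id}"
      }
    }

If the SQL id didn't map directly to the ES document id, you'd add an elasticsearch filter between the input and output to look the document up first; since the IDs turn out to be identical here (see below), that step is omitted.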

@magnusbaeck
Sorry for the late answer. The IDs in SQL table1, SQL table2, and ES index1 are identical. In the SQL tables they are of type bigint, and in index1 they are mapped as "type": "long".
As for the deletion volumes:

  • SQL table1 will store roughly 100 million rows
  • ES index1 will store the same amount as SQL table1, i.e. roughly 100 million documents
  • SQL table2 will usually store about 100k IDs, and once every half-year it can hold 10 million IDs

So, the delete operation will usually require Logstash to get about 100k IDs from SQL table2, find them among the 100 million documents in ES index1, and delete the found documents from ES index1. As for the half-yearly peak, I understand that the Logstash transfer will take much more time than usual, and I'm OK with that.
