Delete docs via JDBC importer

Gang · March 24, 2017, 10:56am

Hi everyone. Could anyone advise how to delete a document through the JDBC importer? I use it to load documents from the database. When an object in the database is marked as deleted, it should no longer show up in search results. I could certainly add a condition to the search, but that would be an additional filter running on every query. So I'm looking for a way to delete such documents. Can you help?

jprante · March 24, 2017, 12:56pm

Note that this is not really Elasticsearch related; you could also open an issue at my project page: http://github.com/jprante/elasticsearch-jdbc

In JDBC importer, you can use the pseudo-column name _optype and set it to delete.

Example:

select ..., "delete" as _optype from ...
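To make the shape of that select concrete, here is a minimal sketch that runs an equivalent query against an in-memory SQLite database and prints the rows the importer would receive. The `products` table and its `id` and `deleted` columns are hypothetical; `_id` is the importer's control column for the document ID. The string literal uses single quotes here because some databases treat double quotes as identifier quoting.

```python
import sqlite3


def build_delete_rows(conn):
    """Return one row per soft-deleted record, tagged with _optype = 'delete'."""
    sql = """
        SELECT id AS _id, 'delete' AS _optype
        FROM products          -- hypothetical table
        WHERE deleted = 1      -- hypothetical soft-delete flag
        ORDER BY id
    """
    cur = conn.execute(sql)
    columns = [d[0] for d in cur.description]
    return [dict(zip(columns, row)) for row in cur.fetchall()]


# Small in-memory data set, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, deleted INTEGER)"
)
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [(1, "kept", 0), (2, "gone", 1), (3, "also gone", 1)],
)

rows = build_delete_rows(conn)
for row in rows:
    print(row)
```

Only the rows flagged as deleted are selected, so each one maps to a delete action for the document with the matching _id.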

I do not recommend this in general because it may have dramatic effects on index segment organization, and because deleted documents leave markers in the segments, it will usually require extra segment compaction from time to time via the forcemerge operation.
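For reference, a minimal sketch of how such a forcemerge call could be prepared from Python using only the standard library. The host `http://localhost:9200` and the index name `products` are assumptions; `_forcemerge` with `only_expunge_deletes` is the Elasticsearch REST endpoint for merging away segments that contain deleted documents. The request is only built here, not sent.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def forcemerge_request(base_url, index):
    """Build (but do not send) a POST request that expunges deleted docs."""
    params = urlencode({"only_expunge_deletes": "true"})
    url = "{}/{}/_forcemerge?{}".format(
        base_url.rstrip("/"), quote(index, safe=""), params
    )
    return Request(url, method="POST")


# Hypothetical host and index name.
req = forcemerge_request("http://localhost:9200", "products")
print(req.get_method(), req.full_url)
```

Against a running cluster, the request could be sent with `urllib.request.urlopen(req)`.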

If there are many deleted documents, reindexing or timestamp-based index organization is much better.

Search filters are highly optimized. Using a search filter is not slower, and it is more convenient.
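A minimal sketch of the search-filter alternative, assuming the soft-delete flag is indexed as a boolean field named `deleted` (a hypothetical name). It wraps any user query in a `bool` query whose `filter` clause excludes deleted documents; filter clauses do not contribute to scoring.

```python
import json


def search_body(user_query, deleted_field="deleted"):
    """Wrap a query so that documents flagged as deleted are filtered out."""
    return {
        "query": {
            "bool": {
                "must": [user_query],
                # Non-scoring clause: keep only documents not marked deleted.
                "filter": [{"term": {deleted_field: False}}],
            }
        }
    }


# Hypothetical user query against a hypothetical "name" field.
body = search_body({"match": {"name": "laptop"}})
payload = json.dumps(body)
print(payload)
```

The resulting JSON can be sent as the body of a regular `_search` request.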

Gang · March 24, 2017, 1:05pm

Thanks for the answer. I'll keep the query filter in mind as a last resort. But I have few such documents, so I'm not afraid of fragmentation. For me, the bigger problem is these documents appearing in the search results.

Gang · March 29, 2017, 11:00am

Hello again! Can you please tell me where I can read about other options for _optype and how they work?

By the way, last night I ran into the problem addressed by "fix the number of threads continue to grow problem" by cxfly (Pull Request #944, jprante/elasticsearch-jdbc on GitHub). It was great to have a patch for it, but don't you plan to update the release? It seems pretty dated by now.

jprante · March 29, 2017, 10:31pm

The pull request is just a small workaround. There is a 5.x branch of JDBC importer with completely rewritten code, which is half ready and not feature complete.

You are correct, the release state for 2.x is pretty dated. I can only spend time on my leisure projects now and then. Currently I do not plan to do any more 2.x based releases; instead I want to focus on 5.x. Maybe there will be one last bug fix release.

jprante · March 29, 2017, 10:42pm

I did not document it. The reason is that mixing _optype values in ES 2.x bulk actions does not always work well: you may unintentionally lose data in the ES index, so I really do not recommend using it. If JDBC importer were a tool that loses data, I could be held responsible for the data loss, and that would be dramatic. For the same reason, to protect existing data, I also tried to block SQL delete statements, for example by making it difficult to use anything other than a read-only DB connection.
