RDBMS vs ES for list data

PandKing · December 11, 2018, 1:21pm

Hi,

I need to audit billions of filenames from 2 sources that "should" contain the same number of entries and the same filenames. I originally thought of Elasticsearch, but I'm beginning to think this job would be better suited to an RDBMS.

There are very few fields but the list is very large, so an RDBMS with proper indices may be more efficient.
At the moment I'm only concerned with checking that the filenames exist in both datasets and reporting any that are missing.

Love to hear some thoughts.
Thanks

nik9000 · December 11, 2018, 1:43pm

If I had two lists and wanted to find the differences between them, I'd sort them and then walk them in parallel, comparing entries as I go. This is fairly common in RDBMSes, where they'll call it a merge join. I'd reach for PostgreSQL personally. Or write a script. But whatever RDBMS you are comfortable with would do the job. I think MySQL will, though I'm not 100% sure on that one. Years ago it didn't have the best query planning capabilities, but I expect it is much better now.
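For the "write a script" route, the sort-then-walk idea could look roughly like the sketch below. It is only an illustration, not anything from the thread: the function name `diff_sorted` and the `only_in_a` / `only_in_b` labels are made up, and it assumes both inputs are already sorted in the same order that Python uses to compare strings. With billions of names the sorting would have to happen outside of memory first, and the inputs would be streamed from files one line at a time.

```python
from typing import Iterable, Iterator, Tuple


def diff_sorted(a: Iterable[str], b: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (label, name) for every name that appears in only one input.

    Assumption: both inputs are sorted ascending using the same ordering
    as Python's string comparison. Only one item per input is held in
    memory at a time, so the inputs can be arbitrarily large iterators.
    """
    it_a, it_b = iter(a), iter(b)
    x = next(it_a, None)
    y = next(it_b, None)

    # Advance whichever side is "behind"; equal names advance both sides.
    while x is not None and y is not None:
        if x == y:
            x = next(it_a, None)
            y = next(it_b, None)
        elif x < y:
            yield ("only_in_a", x)
            x = next(it_a, None)
        else:
            yield ("only_in_b", y)
            y = next(it_b, None)

    # Whatever is left on either side has no counterpart on the other.
    while x is not None:
        yield ("only_in_a", x)
        x = next(it_a, None)
    while y is not None:
        yield ("only_in_b", y)
        y = next(it_b, None)


if __name__ == "__main__":
    source_a = ["a.txt", "b.txt", "d.txt"]
    source_b = ["b.txt", "c.txt", "d.txt", "e.txt"]
    for label, name in diff_sorted(source_a, source_b):
        print(label, name)
```

Note that a name repeated more times in one list than in the other is reported as missing from the other side, which may or may not be what the audit wants.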

