RDBMS vs ES for list data


I need to audit a list of billions of filenames from 2 sources that "should" contain the same number of files and the same filenames. I originally thought of Elasticsearch, but I'm beginning to think this job would be better suited to an RDBMS.

There are only a few fields but the list is very large, so an RDBMS with proper indices may be more efficient.
At the moment I'm only concerned with checking that each filename exists in both datasets and reporting on any that are missing.

Love to hear some thoughts.

If I had two lists and wanted to find the differences between them, I'd sort both and then walk them in parallel, comparing entries as I go. This is fairly common in RDBMSes, where they'll call it a merge join. I'd reach for PostgreSQL personally, or write a script, but whatever RDBMS you are comfortable with should do the job. I think MySQL would too, though I'm not 100% sure on that one. Years ago it didn't have the best query planning capabilities, but I expect it is much better now.
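
If you go the script route, the sort-then-walk idea looks roughly like this in Python. This is only a minimal sketch: the function name `diff_sorted` and the labels `only_in_a` / `only_in_b` are made up for illustration, and it assumes both inputs are already sorted in the same order Python uses to compare strings (for example with an external sort such as `LC_ALL=C sort`) and contain no duplicates.

```python
from typing import Iterable, Iterator, Tuple


def diff_sorted(a: Iterable[str], b: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Walk two sorted, duplicate-free streams in parallel and yield
    ("only_in_a", name) or ("only_in_b", name) for every mismatch."""
    it_a, it_b = iter(a), iter(b)
    x = next(it_a, None)
    y = next(it_b, None)

    # Advance whichever side holds the smaller name; equal names match.
    while x is not None and y is not None:
        if x == y:
            x = next(it_a, None)
            y = next(it_b, None)
        elif x < y:
            yield ("only_in_a", x)
            x = next(it_a, None)
        else:
            yield ("only_in_b", y)
            y = next(it_b, None)

    # Whatever is left on either side has no counterpart on the other.
    while x is not None:
        yield ("only_in_a", x)
        x = next(it_a, None)
    while y is not None:
        yield ("only_in_b", y)
        y = next(it_b, None)


if __name__ == "__main__":
    source_a = ["a.txt", "b.txt", "d.txt"]
    source_b = ["b.txt", "c.txt", "d.txt", "e.txt"]
    for side, name in diff_sorted(source_a, source_b):
        print(side, name)
```

Because it only ever holds one name from each side in memory, the same function should work on open file handles (one filename per line, trailing newlines stripped) as well as on lists, so the expensive part is the sort rather than the comparison.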
