Please suggest the right approach for this scenario.
I have a DB2 table called "part" that stores the details of all individuals along with their address fields (columns such as street_address_1, street_address_2, Zip_code, state, nation). I now need to find the individuals who live at the same address, on the assumption that they are members of the same household.
The problem is that the address is free text, and the addresses people have entered are not in the same format: one person entered "ABC Apartment , Flat 303, 12 , New York City", whereas another entered the identical address as "ABC Appt. , Flat 303, 12 , NYC ".
My logic needs to treat both as the same address, so that these two individuals are counted as belonging to the same household.
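To make the requirement concrete, here is a minimal sketch in plain Python of the kind of matching I mean. The `ABBREVIATIONS` table and the `normalize_address` helper are hypothetical names invented for illustration, not part of any library, and a real abbreviation table would need to be far larger.

```python
import re

# Hypothetical abbreviation table (an assumption for this sketch);
# real data would need many more entries.
ABBREVIATIONS = {
    "appt": "apartment",
    "apt": "apartment",
    "nyc": "new york city",
}


def normalize_address(raw: str) -> str:
    """Lowercase the text, drop punctuation, and expand known abbreviations."""
    tokens = re.findall(r"[a-z0-9]+", raw.lower())
    expanded = [ABBREVIATIONS.get(tok, tok) for tok in tokens]
    return " ".join(expanded)


a = normalize_address("ABC Apartment , Flat 303, 12 , New York City")
b = normalize_address("ABC Appt. , Flat 303, 12 , NYC ")
print(a == b)  # True
```

Both strings normalize to the same value here only because the two abbreviations happen to be in the table; misspellings would still need some form of fuzzy matching.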
The DB2 "part" table holds around 10M records. I am considering Spark + Scala with the Soundex API. I think I could also use Elasticsearch with a group by plus fuzzy matching, but I don't know how to do that when the data lives in a DB2 table, as I have never worked with Elasticsearch.
With the Spark-based approach I would also have to dump the 10M records into a Hadoop environment and compare them one by one using the Soundex-encoded value, which I think would be very time-consuming and not a smart way to approach this. As far as I know, Spark also has no direct support for fuzzy matching.
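For reference, the Soundex step I have in mind looks roughly like the sketch below. It is plain Python rather than Spark, the `soundex` and `address_key` helpers are hypothetical names written for this post, and the three sample addresses are made up. Encoding each record once and grouping on the resulting key would at least avoid comparing every pair of records.

```python
import re
from collections import defaultdict

# Standard American Soundex digit assignments.
SOUNDEX_CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        SOUNDEX_CODES[ch] = digit


def soundex(word: str) -> str:
    """Return the 4-character Soundex code of a word ('' if it has no letters)."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = SOUNDEX_CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "hw":
            continue  # h and w do not separate letters with the same code
        code = SOUNDEX_CODES.get(c, "")
        if code and code != prev:
            result += code
        prev = code  # vowels reset prev, so a repeated code after a vowel counts
    return (result + "000")[:4]


def address_key(address: str) -> str:
    """Hypothetical key: numbers kept as-is, words replaced by their Soundex code."""
    tokens = re.findall(r"[A-Za-z]+|\d+", address)
    return " ".join(t if t.isdigit() else soundex(t) for t in tokens)


# Made-up sample rows: (id, address).
records = [(1, "12 Main Street"), (2, "12 Mane Street"), (3, "14 Main Street")]

groups = defaultdict(list)
for record_id, address in records:
    groups[address_key(address)].append(record_id)

print(dict(groups))  # {'12 M500 S363': [1, 2], '14 M500 S363': [3]}
```

One caveat with this sketch: `soundex("Apartment")` gives `A163` while `soundex("Appt")` gives `A130`, so Soundex on its own would not match the two example addresses above, and abbreviations would still have to be handled separately.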
Can anyone recommend the right approach for this scenario, along with the steps to follow?