I'm trying to solve this problem:
Goal
To build a system that supports a highly concurrent & scalable URL matching algorithm.
I'm looking for tools, technologies, patterns & strategies.
Task
Imagine you have an InputURL:
www.awesomesite.com/product?color=pink&size=small&utm_campaign=google
In some data storage / structure / mechanism we might store the PossibleURLs:
record 1: www.awesomesite.com/product?size=small&color=red
record 2: www.awesomesite.com/product?size=small
record 3: www.awesomesite.com/product?size=small&color=blue
record 4: www.awesomesite.com/product?size=medium&color=blue
record 5: www.awesomesite.com/product?size=large&color=blue
record 6: www.awesomesite.com/product?size=small&country=us
record 7: www.awesomesite.com/product?size=small&color=pink
... millions more variations.
I want to build a system that can take the InputURL and, within a few ms, return all the PossibleURLs where the InputURL contains all of the PossibleURL's params.
In the scenario above, because the InputURL contains "size=small" & "color=pink", these records would be matched:
record 2: www.awesomesite.com/product?size=small
record 6: www.awesomesite.com/product?size=small&country=us
record 7: www.awesomesite.com/product?size=small&color=pink
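To make the rule concrete, here is a minimal brute-force sketch of it in Python, assuming a match means the host and path are identical and every key=value pair of the PossibleURL also appears in the InputURL. The helper names `split_url` and `matches` are made up for illustration, and a linear scan like this is only a statement of the rule, not the scalable system I'm after. Read strictly, this rule returns records 2 and 7; record 6 carries country=us, which the InputURL lacks, so whether it counts depends on how the rule is meant to treat params missing from the InputURL.

```python
from urllib.parse import urlsplit, parse_qsl


def split_url(url):
    """Return (host + path, set of (key, value) query params) for a scheme-less URL."""
    # The example URLs have no scheme, so prepend "//" so urlsplit recognises the host.
    parts = urlsplit("//" + url)
    return parts.netloc + parts.path, set(parse_qsl(parts.query))


def matches(input_url, possible_url):
    """True when both URLs share host + path and every param of
    possible_url also appears (same key and value) in input_url."""
    in_base, in_params = split_url(input_url)
    p_base, p_params = split_url(possible_url)
    return in_base == p_base and p_params <= in_params  # subset test


input_url = "www.awesomesite.com/product?color=pink&size=small&utm_campaign=google"

possible_urls = [
    "www.awesomesite.com/product?size=small&color=red",    # record 1
    "www.awesomesite.com/product?size=small",              # record 2
    "www.awesomesite.com/product?size=small&color=blue",   # record 3
    "www.awesomesite.com/product?size=medium&color=blue",  # record 4
    "www.awesomesite.com/product?size=large&color=blue",   # record 5
    "www.awesomesite.com/product?size=small&country=us",   # record 6
    "www.awesomesite.com/product?size=small&color=pink",   # record 7
]

# Brute-force scan: O(number of records) per request, for illustration only.
matched = [u for u in possible_urls if matches(input_url, u)]
```

The order of params in the query string does not matter here because each URL's params are compared as a set.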
Constraints
High Scalability: Requests will come in at rates of thousands to hundreds of thousands per second.
Speed: The matching needs to be done in < 10 ms.
Huge Record Size: The number of PossibleURLs will grow into the millions. However, they will not all be from the same domain.
A friend suggested that it could be done using Elasticsearch. Can I use Elasticsearch to handle the matching logic and Elastic.co to handle the infrastructure?