Unique URL and query param patching


(Mark Evans) #1

I'm trying to solve this problem:

Goal

To build a system that supports a highly concurrent & scalable URL matching algorithm.

I'm looking for tools, technologies, patterns & strategies.

Task

Imagine you have an InputURL:

www.awesomesite.com/product?color=pink&size=small&utm_campaign=google

In some data storage, structure, or mechanism, we might store the PossibleURLs:

record 1: www.awesomesite.com/product?size=small&color=red
record 2: www.awesomesite.com/product?size=small
record 3: www.awesomesite.com/product?size=small&color=blue
record 4: www.awesomesite.com/product?size=medium&color=blue
record 5: www.awesomesite.com/product?size=large&color=blue
record 6: www.awesomesite.com/product?size=small&country=us
record 7: www.awesomesite.com/product?size=small&color=pink
... millions more variations.

I want to build a system that can take the InputURL and, within a few ms, return all the PossibleURLs where the InputURL contains all of the PossibleURL params.

In the scenario above, because the InputURL contains "size=small" & "color=pink", these records would have been matched:

record 2: www.awesomesite.com/product?size=small
record 6: www.awesomesite.com/product?size=small&country=us
record 7: www.awesomesite.com/product?size=small&color=pink
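
To make the matching rule concrete, here is a minimal in-memory sketch in Python. It assumes that a PossibleURL matches when its host and path equal those of the InputURL and its query params are a subset of the InputURL's params. The names `parse`, `build_index`, and `match` are illustrative only, and this says nothing about how the rule would perform at the scale described.

```python
from urllib.parse import urlsplit, parse_qsl

def parse(url):
    """Split a scheme-less URL into (host + path, set of (key, value) params)."""
    # The leading "//" makes urlsplit treat the first segment as the host.
    parts = urlsplit("//" + url)
    return parts.netloc + parts.path, frozenset(parse_qsl(parts.query))

def build_index(possible_urls):
    """Group PossibleURLs by host + path so a lookup only scans one bucket."""
    index = {}
    for url in possible_urls:
        key, params = parse(url)
        index.setdefault(key, []).append((url, params))
    return index

def match(index, input_url):
    """Return PossibleURLs whose params are all present in the input URL."""
    key, input_params = parse(input_url)
    return [url for url, params in index.get(key, []) if params <= input_params]

POSSIBLE_URLS = [
    "www.awesomesite.com/product?size=small&color=red",
    "www.awesomesite.com/product?size=small",
    "www.awesomesite.com/product?size=small&color=blue",
    "www.awesomesite.com/product?size=medium&color=blue",
    "www.awesomesite.com/product?size=large&color=blue",
    "www.awesomesite.com/product?size=small&country=us",
    "www.awesomesite.com/product?size=small&color=pink",
]
INPUT_URL = "www.awesomesite.com/product?color=pink&size=small&utm_campaign=google"

matches = match(build_index(POSSIBLE_URLS), INPUT_URL)
```

Under this strict subset reading, `matches` holds records 2 and 7. Record 6 requires country=us, which the InputURL does not carry, so it would only match under a looser rule than the one stated.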

Constraints

High Scalability: Requests will come in at rates of thousands to hundreds of thousands per second.
Speed: The matching needs to be done in < 10 ms.
Huge Record Size: The number of PossibleURLs will grow to millions. However, they will not always be from the same domain.

A friend suggested that it could be done using Elasticsearch. Can I use Elasticsearch to handle the logic and Elastic.co to handle the infrastructure?
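
One possible way to express the subset rule in Elasticsearch is sketched below in Python, purely as an assumption and not something confirmed in this thread. It assumes a version of Elasticsearch that provides the `terms_set` query, and an index where each PossibleURL is stored with hypothetical fields `host_path` (keyword), `params` (keyword array of "key=value" strings), and `required_matches` (the number of params, stored as a number). The helpers `to_document` and `to_query` are illustrative names, and whether this meets the < 10 ms target would have to be measured.

```python
from urllib.parse import urlsplit, parse_qsl

def split_url(url):
    """Return (host + path, sorted list of "key=value" strings)."""
    parts = urlsplit("//" + url)  # "//" makes urlsplit parse the host
    params = sorted(f"{k}={v}" for k, v in parse_qsl(parts.query))
    return parts.netloc + parts.path, params

def to_document(possible_url):
    """Document to index for one PossibleURL (field names are hypothetical)."""
    host_path, params = split_url(possible_url)
    return {
        "host_path": host_path,
        "params": params,
        # A document matches only if this many of its params appear in the query.
        "required_matches": len(params),
    }

def to_query(input_url):
    """Search body: same host + path, and every stored param is in the input."""
    host_path, params = split_url(input_url)
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"host_path": host_path}},
                    {
                        "terms_set": {
                            "params": {
                                "terms": params,
                                "minimum_should_match_field": "required_matches",
                            }
                        }
                    },
                ]
            }
        }
    }

doc = to_document("www.awesomesite.com/product?size=small&color=pink")
body = to_query(
    "www.awesomesite.com/product?color=pink&size=small&utm_campaign=google"
)
```

The idea is that a stored URL with two params needs both of them to appear among the input's terms, which is the subset condition. The dictionaries are only the request and document bodies; sending them to a cluster is left out.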


(Mark Walkom) #2

Is this specific to a Found cluster you have, or just a general ES question?


(Mark Evans) #3

I'm looking to sign up for Found, if it's the right solution for the job. I
believe it could be, but was hoping you guys could give me a little insight
as to whether or not Elasticsearch could be the tool and your platform
could facilitate the requirements I have.
From what I can tell, it's possible to do what I'm thinking with
Elasticsearch, and we will probably need a decent-size cluster for the level of
traffic and URLs we have coming through our platform.

If it would help to jump on a call to answer questions and clarify our
needs let me know.

Thanks so much for your help,


(Mark Walkom) #4

OK, let me move this to the Elasticsearch category and we can discuss the core question first :)


(Mark Evans) #5

Sounds good.

