Hi,
I'm creating a (somewhat tricky) white-label matchmaking website (with Catalyst), and have decided to use ElasticSearch. I am very new to ES, so I would appreciate it if someone could tell me whether what I am planning is reasonable, and what I should watch out for (mostly regarding performance).
Basically, I have users, and users have attributes, e.g. age, location, weight, hair color, "do you like cats?", whatever. (About 1/3 of all attributes are searchable.) All attributes are numeric(!), but they are very dynamic, so they are configured via a CSV file (so that clients can define them in Excel) and stored as serialized JSON in a MySQL BLOB column. When a user is fetched, his/her data is deserialized into Perl objects.
I plan to use ES when users search for other users based on their preferences. Whenever a user modifies his/her data, I plan to push the (searchable) attributes to ES. When a user issues a search, I just search for user IDs in ES, and then fetch the user data from my local MySQL by those user IDs.
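To make the flow above concrete, here is a minimal sketch of the two pure-data steps: picking the searchable attributes to push to ES, and pulling user IDs out of a search response before going to MySQL. The attribute names and the sample response are made up for illustration, and using the user ID as the ES document `_id` is an assumption, not something I have settled on.

```python
def searchable_doc(attributes, searchable):
    """Keep only the attributes that should be pushed to ES."""
    return {name: value for name, value in attributes.items() if name in searchable}


def user_ids_from_hits(response):
    """Collect user IDs from an ES search response.

    Assumes each document was indexed with the numeric user ID as its _id;
    ES returns _id as a string, so it is converted back to int here.
    """
    return [int(hit["_id"]) for hit in response["hits"]["hits"]]


# Hypothetical user record, as deserialized from the JSON BLOB.
user = {"age": 31, "hair_color": 2, "likes_cats": 1, "internal_flag": 7}
doc = searchable_doc(user, {"age", "hair_color", "likes_cats"})

# Hypothetical (trimmed) search response, shaped like what ES returns.
response = {"hits": {"total": 2, "hits": [{"_id": "17"}, {"_id": "42"}]}}
ids = user_ids_from_hits(response)
print(doc, ids)
```

The resulting list of IDs would then go into a single `WHERE userid IN (...)` lookup on the MySQL side.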
I plan to store no more than 500k records in ES, each having fewer than 100 fields. Most likely the number of searches issued would be no more than 20/s, and the number of updates would be (much) less than 1/s. My experience with Sphinx shows these are low numbers, so I guess ES would easily serve my queries on a basic 4-core server, too. Is that correct?
I did a very basic performance test on my i3 desktop with 50k records, and what I got was about 500 index operations/s, 1000 gets/s (by ID) and 500 searches/s (based on binary equality). Do those numbers look OK, or am I doing something wrong? (I am using the ElasticSearch module from CPAN. Actually, I'm guessing the HTTP transport takes more time than it takes ES to fetch the data.)
What do I have to watch out for (again, mostly performance-wise)? Most of the time I will be issuing queries like "a==2 and b==3 and c==1|2|3 and 1<d<10".
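For reference, a rough sketch of how such a query could be expressed as an ES request body. It assumes a reasonably recent ES version, where exact-match conditions go into the `filter` clause of a `bool` query; older releases spell this differently. I am reading `c==1|2|3` as "c is one of 1, 2 or 3", and the helper name `build_query` is just for illustration.

```python
import json


def build_query(equals, any_of, ranges):
    """Build an ES search body from simple numeric conditions.

    equals: {field: value}        -> term filter
    any_of: {field: [values]}     -> terms filter (matches any listed value)
    ranges: {field: (low, high)}  -> range filter, exclusive on both ends
    """
    filters = []
    for field, value in equals.items():
        filters.append({"term": {field: value}})
    for field, values in any_of.items():
        filters.append({"terms": {field: list(values)}})
    for field, (low, high) in ranges.items():
        filters.append({"range": {field: {"gt": low, "lt": high}}})
    # All filter clauses must match (logical AND); no relevance scoring is done.
    # "_source": False asks ES to return only hit metadata such as _id.
    return {"query": {"bool": {"filter": filters}}, "_source": False}


# a==2 and b==3 and c==1|2|3 and 1<d<10
body = build_query({"a": 2, "b": 3}, {"c": [1, 2, 3]}, {"d": (1, 10)})
print(json.dumps(body, indent=2))
```

Since only the IDs are needed, filter clauses seem like a better fit than scored queries here, and ES can cache them.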
Thank you,
- Fagzal