Klout Search Powered by ElasticSearch, Play Framework, Scala and Akka


(felipera) #1

Hi everyone,

This week I released Klout's search using ElasticSearch and wrote a blog
post about it:
http://geeks.aretotally.in/klout-search-elasticsearch-play-framework-scala-akka

I hope you like it!

Thank you,
Felipe [http://twitter.com/_felipera]


(phobos182) #2

The article is interesting, but light on details. How many documents? What is being indexed / searched? How many shards / servers / etc...?


(felipera) #3

Hi Jeremy,

A little under 150 million documents. We are indexing topics (not a big
dataset), most of the dataset.

10 servers, moving to 15 soon; 20 shards; a little over 1 TB of indexed
data, but that number should increase soon.

On the website we are not doing very complicated searches at this time. On
our internal tool for Klout Perks we have pretty complicated searches based
on a bunch of criteria, such as different values generated by our algorithms
for each user, topics, geo location, etc. We also do some data exports
using scrolling searches, mgets, etc. We should be adding more criteria to
the public website sometime soon as well.
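
As a rough illustration of the kind of request described above, here is a
minimal sketch that builds a search body combining a score range, a topic
and a geo-distance filter, plus a multi-get body. It is not Klout's actual
code: the field names (score, topics, location), the index name and the
helper functions are all assumptions, and it uses the current Elasticsearch
query DSL, which may differ from the syntax available when this was posted.

```python
import json


def build_criteria_query(min_score, topic, lat, lon, radius_km):
    """Build a bool query body: score range + topic term + geo filter.

    All field names here are hypothetical placeholders.
    """
    return {
        "query": {
            "bool": {
                "must": [
                    {"range": {"score": {"gte": min_score}}},
                    {"term": {"topics": topic}},
                ],
                "filter": [
                    {
                        "geo_distance": {
                            "distance": f"{radius_km}km",
                            "location": {"lat": lat, "lon": lon},
                        }
                    }
                ],
            }
        }
    }


def build_mget_body(index, ids):
    """Build a multi-get (_mget) body that fetches several documents by id."""
    return {"docs": [{"_index": index, "_id": doc_id} for doc_id in ids]}


if __name__ == "__main__":
    # These bodies would be sent to the _search and _mget endpoints.
    print(json.dumps(build_criteria_query(50, "scala", 37.77, -122.41, 25), indent=2))
    print(json.dumps(build_mget_body("users", ["1", "2"]), indent=2))
```

For an export, the same search body could be combined with the scroll API so
that results are pulled in batches rather than in one large response.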

The internal tool has been using ElasticSearch for a while; it powers most
of the eligibility checks for Klout Perks, but the public-facing search was
only released recently.

Hope that clarifies some of your questions.

Thank you,
Felipe


(Diptamay) #4

Hi Felipe

Thanks for sharing. I am curious as to why you chose Play! over Node.js,
which is driving your main site. I am also a bit confused by your article
about how Play! and Akka fit into the whole equation. ES already does
scatter-gather searching, so I am wondering how Akka actors are helping out.
Wouldn't Node.js and ES have been good enough?

Thanks
Diptamay


(felipera) #5

Play/Scala/Akka were already powering a couple of other things at Klout,
like the Perks backend and admin interfaces, so it was an easy choice. Play
over Node.js is also a matter of expertise on the different teams we
have. The Play/Scala applications also need to do some other things, like
querying data from HBase, so just using the native JVM-friendly drivers
makes a lot of sense.

Initially I wanted to have the REST api running on ES as a plugin, similar
to http://www.elasticsearch.org/tutorials/2011/09/14/creating-pluggable-rest-endpoints.html.
But I found it to be unstable under load; to be honest, I am not sure
whether the problem was with my configuration. Since I had already deployed
a few applications with Play querying ES through the native Java API, I gave
that a shot and it fixed all the issues I was having.

Does that answer your question?


(felipera) #6

Little typo there: we are indexing topics (not a big dataset) and user
profiles (most of the dataset).


(Diptamay) #7

Yep. Got it. Thanks for the quick reply.

So is this live now? Or I guess I don't have enough Klout to see my perks
yet :). Was trying to see an actual business use case in action.

-Diptamay


(felipera) #8

Regarding ES use at Klout, only the user and topic search is open to the public right now. The rest is used internally; we'll be improving the public search soon.

Cheers!
Felipe

