I am working on a project where we will be modeling 1-n relations with
potentially hundreds of thousands of endpoints for a single
relationship.
We are currently storing the data in Riak, and this seems to work
fine. But the relations are of course a problem.
Riak links are implemented in a way that limits the number of links on
a single object to a few thousand, so I figured I could use Riak Search
to search for relationship documents of the form "user_id: ...,
follower_id: ...".
However, it seems that Riak Search scales pretty badly when you get
hundreds of thousands of hits. Example: counting the hits when there
are 100,000 of them takes 10 times as long as counting the hits when
there are 10,000. Not too surprising, perhaps, but it makes the count
take several seconds for big relations.
That makes me wonder: can I do this with elasticsearch, or some other
Lucene-based engine?
The question I need answered is this: is counting the hits for a
search with hundreds of thousands of hits a costly operation in
elasticsearch? Does it scale linearly with the number of hits, as it
does in Riak Search?
The more docs a query matches, the slower things get.
The more hits you actually pull from the results (e.g. for displaying
or highlighting purposes), the slower things get.
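To make the two costs in this reply concrete, here is a toy sketch in Python. It is not how elasticsearch or Lucene is implemented; `ToyIndex` and its counters are hypothetical names invented for illustration. It only models the idea that counting walks every matching doc, while loading stored documents is paid only for the hits you actually pull.

```python
from collections import defaultdict


class ToyIndex:
    """Toy inverted index (illustrative only, not a real search engine)."""

    def __init__(self):
        self.postings = defaultdict(list)  # user_id -> list of matching doc ids
        self.stored = {}                   # doc id -> full relationship document
        self.docs_visited = 0              # matching docs walked so far
        self.docs_loaded = 0               # stored documents fetched so far

    def add(self, doc_id, user_id, follower_id):
        self.postings[user_id].append(doc_id)
        self.stored[doc_id] = {"user_id": user_id, "follower_id": follower_id}

    def count(self, user_id):
        """Count matches by walking them: work grows with the number of hits."""
        total = 0
        for _ in self.postings.get(user_id, []):
            self.docs_visited += 1
            total += 1
        return total

    def search(self, user_id, start=0, size=10):
        """Return (total, page); only the requested page is loaded."""
        matches = self.postings.get(user_id, [])
        self.docs_visited += len(matches)
        page = matches[start:start + size]
        self.docs_loaded += len(page)
        return len(matches), [self.stored[doc_id] for doc_id in page]
```

In this model a relation with ten times as many followers costs ten times as many visits to count, which matches the linear behaviour described in the question, while `docs_loaded` stays at the page size however large the relation is.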
If I don't pull all the hits, but just paginate through the first few and
get a count of the total, I am basically doing what I do when I google
something with millions of hits. I wonder how Google does it so fast...
On Feb 24, 2012 2:33 AM, "Otis Gospodnetic" otis.gospodnetic@gmail.com
wrote:
Hi Martin,
The more docs a query matches the slower things get.
The more hits you actually pull from the results (e.g. for displaying
or highlighting purposes) the slower things get.
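The "paginate through the first bit and count" idea could be sketched as two request builders. This is a sketch under stated assumptions, not a tested client: the index name `relations` is hypothetical, the `user_id` field comes from the document shape in the question, and the body layout follows recent elasticsearch versions (the `_count` and `_search` endpoints, with `from` and `size` for paging). Older releases may expect a different body, so check the documentation for the version in use.

```python
import json


def count_request(index, user_id):
    """Build path and JSON body for a count-only request (no hits returned)."""
    body = {"query": {"term": {"user_id": user_id}}}
    return f"/{index}/_count", json.dumps(body)


def page_request(index, user_id, page=0, page_size=20):
    """Build path and JSON body for one page of hits.

    The search response is expected to report the total hit count
    alongside the page, so a separate count call may not be needed.
    """
    body = {
        "query": {"term": {"user_id": user_id}},
        "from": page * page_size,  # offset of the first hit on this page
        "size": page_size,         # number of hits to pull
    }
    return f"/{index}/_search", json.dumps(body)


# Hypothetical usage: third page of followers for user "u42".
path, body = page_request("relations", "u42", page=2, page_size=20)
```

Keeping `size` small limits how many hits are pulled, which is the second cost named in the reply; the first cost, the number of docs the query matches, is unchanged by paging.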